

Algorithms and Data 19: Working Around NP-Completeness
Professor Kevin Gold

Two Approaches to Dealing With NP-Complete Problems
1. Special cases. Maybe you never really see the toughest cases, and your worst case need not be exponential.
2. Approximation algorithms. If a close solution is good enough (as in, within a constant factor of optimal), sometimes you can find one in polynomial time.


2-SAT is in P
• With just 2 variables in each clause, we can think of each clause as two implications: (x v y) becomes ¬x ⇒ y and ¬y ⇒ x.
• We can create a graph with nodes corresponding to the variables and their negations, and edges corresponding to the implications.
• The formula is unsatisfiable iff there is both a path from v to ¬v and a path from ¬v to v for some variable v: "if x, then not x; if not x, then x" is a contradiction.
• So a linear-time search per variable solves this problem.

2-SAT Graph Example
• (x v y) ^ (¬x v ¬y) ^ (¬x v y) ^ (x v ¬y)
• Notice the paths from x to ¬x and back, and from y to ¬y and back: each variable setting implies a contradiction, so this formula is not satisfiable.
[Figure: the implication graph on the nodes x, y, ¬x, ¬y.]
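To make the search concrete, here is a minimal Python sketch of the implication-graph test described above. The function name, the encoding of literals as signed integers, and the plain BFS reachability check per variable are my own illustration choices, not part of the slides.

   from collections import defaultdict, deque

   def two_sat_satisfiable(n_vars, clauses):
       # Literals: +i means x_i, -i means not-x_i; each clause (a, b) means (a v b)
       # and contributes the implications not-a => b and not-b => a.
       graph = defaultdict(list)
       for a, b in clauses:
           graph[-a].append(b)
           graph[-b].append(a)

       def reaches(src, dst):       # BFS: is dst reachable from src?
           seen, queue = {src}, deque([src])
           while queue:
               u = queue.popleft()
               if u == dst:
                   return True
               for w in graph[u]:
                   if w not in seen:
                       seen.add(w)
                       queue.append(w)
           return False

       # Unsatisfiable iff some v reaches not-v AND not-v reaches v.
       for v in range(1, n_vars + 1):
           if reaches(v, -v) and reaches(-v, v):
               return False
       return True

   # The example formula (x v y) ^ (¬x v ¬y) ^ (¬x v y) ^ (x v ¬y) is unsatisfiable:
   print(two_sat_satisfiable(2, [(1, 2), (-1, -2), (-1, 2), (1, -2)]))   # False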

Independent Set on Trees
• Dynamic programming can find maximum independent sets on trees in polynomial time. Begin at the leaves of the tree, which get a score of 1.
• The score of a parent is max {sum of children's scores, 1 + sum of grandchildren's scores}.
• We can work backwards at the end to determine what the set actually is.
[Figure: an example tree with the DP score written at each node; the leaves score 1 and the root scores 8, the size of the maximum independent set.]
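A runnable Python sketch of this recurrence, assuming the tree is given as a dictionary mapping each node to its list of children (my own representation), with memoization added so each node is scored only once:

   def max_independent_set_size(children, root):
       # score(v) = max(sum of children's scores, 1 + sum of grandchildren's scores);
       # leaves score 1. Memoized so every node is scored exactly once.
       memo = {}
       def score(v):
           if v in memo:
               return memo[v]
           kids = children.get(v, [])
           if not kids:
               memo[v] = 1
           else:
               exclude_v = sum(score(c) for c in kids)
               include_v = 1 + sum(score(g) for c in kids for g in children.get(c, []))
               memo[v] = max(exclude_v, include_v)
           return memo[v]
       return score(root)

   # A path a-b-c-d rooted at a: the maximum independent set has size 2 (e.g., {a, c}).
   print(max_independent_set_size({"a": ["b"], "b": ["c"], "c": ["d"]}, "a"))   # 2

Working backwards to recover the set itself would just revisit each node and check which of the two options won.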

Why Is Independent Set on Trees Easier?
• What allows the dynamic programming approach to work here is an ability to "commit" to the clearly best solution for a node and move on.
• As soon as we add any cycles to the graph, it's no longer clear when we can commit to an optimal small solution.
[Figure: the same example with cycles added; every node's score is now uncertain (8?, 5?, 3?, 2?, 1?, ...).]

Approximation Algorithms
• Approximation algorithms attempt to produce solutions within some constant factor of optimal.
• They are inherently solving the "optimization problems" associated with NP-complete problems.
• Independent Set decision problem: is there an independent set of size k in this graph?
• Independent Set optimization problem: find the largest independent set in the graph.

Minimum Vertex Cover Approximation Algorithm
• Vertex Cover optimization problem: find the smallest set of vertices that touches every edge of the graph.
• The 2-approximation algorithm: while there are edges not adjacent to marked vertices, pick such an edge and put both of its vertices in the cover.
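The algorithm fits in a few lines of Python; this is a minimal sketch, where the edge-list input format and the function name are my own choices:

   def vertex_cover_2approx(edges):
       # While some edge has neither endpoint in the cover, add both endpoints.
       cover = set()
       for u, v in edges:
           if u not in cover and v not in cover:
               cover.add(u)
               cover.add(v)
       return cover

   # On the path 1-2-3-4 this returns {1, 2, 3, 4}; the optimum {2, 3} has size 2,
   # so this run is exactly twice optimal.
   print(vertex_cover_2approx([(1, 2), (2, 3), (3, 4)]))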


Minimum Vertex Cover Approximation Algorithm
• None of the edges chosen share a vertex, so the optimal solution must contain at least as many vertices as there were edges chosen.
• We picked twice as many vertices as that, so our cover is <= 2·OPT.
• This is surprisingly close to the best known approximation (and was the best known for some time).
• Sometimes straightforward strategies yield respectable bounds.

Approximation Algorithms and Reductions Don't Mix
• Recall that we know finding a vertex cover of size k lets us find an independent set of size n-k: the independent set is all the nodes not in the cover.
• So, can we use a 2-approximation algorithm for vertex cover as a good approximation algorithm for independent set?
• The reduction "works" but isn't necessarily close to optimal.
• Suppose the optimal vertex cover and the optimal independent set are both of size |V|/2. Returning 2·OPT nodes returns all the vertices, and therefore an empty independent set. That's not within any good factor of optimal.

Traveling Salesman Problem
• Traveling Salesman Decision Problem (NP-complete): given a weighted graph, is there a simple cycle that visits all nodes and has weight <= D? Exercise: give a reduction from Hamiltonian Cycle.
• Traveling Salesman Optimization Problem: find the shortest simple cycle that visits all vertices.
• This is actually solvable with a polynomial number of calls to the decision version, as with other NP-complete problems.

Euclidean Traveling Salesman Problem
• The Euclidean Traveling Salesman Problem looks for short tours on graphs that represent Euclidean distances between points in the plane. (Euclidean distance is the distance metric you're used to, sqrt(Δx² + Δy²).)
• These points therefore obey the triangle inequality: going from a to b to c is never faster than going directly from a to c.
• On graphs like this, we can get within a factor of 2 of the optimal in polynomial time.
[Figure: side lengths 3, 1, 2: triangle inequality holds; side lengths 3, 2, 6: triangle inequality doesn't hold.]

Euclidean Traveling Salesman Approximation
1. Calculate the minimum spanning tree for the graph, using Kruskal's or Prim's algorithm.
2. Create a pseudo-cycle that follows every edge twice around the minimum spanning tree.
3. "Relax" the tour by skipping vertices already touched and heading to the next vertex.
http://www.cs.yale.edu/homes/aspnes/pinewiki/ApproximationAlgorithms.html

Euclidean Traveling Salesman Approximation
• The optimal tour must be at least as large as the minimum spanning tree, because it contains a spanning tree.
• Because of the triangle inequality (a shortcut is never longer than the two-step path it replaces), our tour is no larger than twice the minimum spanning tree's size.
• So our tour is 2-optimal.
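Here is a sketch of the three steps in Python, assuming the input is a list of (x, y) points. Prim's algorithm builds the spanning tree, and the "relaxed" doubled tour is just a preorder walk of that tree; the function and variable names are mine.

   import math

   def euclidean_tsp_2approx(points):
       n = len(points)
       dist = lambda i, j: math.dist(points[i], points[j])

       # Step 1: minimum spanning tree via Prim's algorithm, recording parents.
       parent, best = {}, {j: (dist(0, j), 0) for j in range(1, n)}
       while best:
           j = min(best, key=lambda k: best[k][0])
           parent[j] = best[j][1]
           del best[j]
           for k in best:
               d = dist(j, k)
               if d < best[k][0]:
                   best[k] = (d, j)

       # Steps 2 and 3: the doubled tour with shortcuts is a preorder walk of the tree.
       children = {i: [] for i in range(n)}
       for j, p in parent.items():
           children[p].append(j)
       tour, stack = [], [0]
       while stack:
           v = stack.pop()
           tour.append(v)
           stack.extend(reversed(children[v]))

       length = sum(dist(tour[i], tour[(i + 1) % n]) for i in range(n))
       return tour, length

   # Four corners of a unit square: the tour found here has length 4, which is optimal.
   print(euclidean_tsp_2approx([(0, 0), (0, 1), (1, 1), (1, 0)]))   # ([0, 1, 2, 3], 4.0)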
Machine Scheduling
• Given a set of jobs with running times t1, t2, ..., tn and k machines that can each run one job at a time, is there a way to schedule all jobs to finish before deadline D?
• Even the two-machine case is NP-complete, by reduction from Subset Sum.

Subset Sum Reduction
• Let S be the sum of all the elements in the Subset Sum instance and T be the target to be summed to.
• Create jobs with times equal to the set of numbers to be summed, plus one additional job that takes time 2T-S.
• This set of jobs can be finished before deadline T iff there is a subset of the original elements summing to T.
[Figure: on two machines with deadline T, one machine runs the elements summing to T; the other runs the leftovers (summing to S-T) plus the 2T-S job, forcing it to balance at exactly T.]

An Optimization Version and a Greedy Algorithm
• An optimization version of machine scheduling is to achieve the earliest possible time by which all N jobs finish.
• A greedy algorithm:
   Initialize machine finishing times Fj to 0
   For i = 1 to N
      Assign job i to the machine Mj with the minimum finishing time
      Update finishing time of Mj: Fj += ti
[Figure: greedily scheduling jobs 6, 3, 5, 4, 4 on machines M1 and M2 gives completion time 12.]
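A short Python sketch of this greedy rule, using a heap of machine finishing times (the heap and the function name are my own implementation choices):

   import heapq

   def greedy_schedule(job_times, k):
       # Assign each job, in the order given, to the machine that finishes earliest.
       heap = [(0, m) for m in range(k)]          # (finishing time Fj, machine id)
       machines = [[] for _ in range(k)]
       for t in job_times:
           finish, m = heapq.heappop(heap)        # least-loaded machine
           machines[m].append(t)
           heapq.heappush(heap, (finish + t, m))
       makespan = max(f for f, _ in heap)
       return makespan, machines

   # The example above: jobs 6, 3, 5, 4, 4 on 2 machines finish at time 12.
   print(greedy_schedule([6, 3, 5, 4, 4], 2))   # (12, [[6, 4], [3, 5, 4]])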
Thinking About How Close This Could Be To Optimal
[Figure: six unit jobs and one job of length 6 on two machines. Optimal: the length-6 job alone on one machine (finish time 6). Greedy with the 6 arriving last: three 1s on each machine, then the 6 (finish time 9).]
• The greedy algorithm behaves reasonably until it gets a big element that it should have "planned for."
• If the algorithm continues, it'll reduce this imbalance. And if the big element is too big, even the optimal can't do anything about it.

Two Lower Bounds
• To figure out how close to optimal the approximation algorithm is, we need some lower bounds on the optimal performance.
• Lower bound #1: the optimal finishing time is at least S/k, where S is the sum of all job times and k is the number of machines. There must be some machine that does at least this much work, or all the work wouldn't get done.
• Lower bound #2: the optimal finishing time is at least as big as the biggest job time.

Proving the Greedy Assignment is 2-Optimal
• Consider the machine Mi that works the longest in our greedy algorithm's schedule, with load Ti.
• It was still the least loaded before we added the final job to it (because of our greedy strategy), meaning that if the last job took time tj, the other machines were all already loaded to at least Ti - tj.
• So the sum S of all loads across all k machines was at least k(Ti - tj). So (Ti - tj) <= S/k.
• That was our lower bound #1 on OPT: (Ti - tj) <= S/k <= OPT. And tj <= OPT by lower bound #2. So Ti <= 2·OPT. (QED)
Thinking About the Worst Case Can Lead to an Improvement
• If the bad stuff happens when a big job comes last, why not balance the big jobs first?
• Approximation Algorithm #2: same as the first, but we first sort the jobs from largest to smallest time, scheduling large jobs first. Still greedily schedule to the least-used machine.
• Now, when we think about the time taken by the final job, it will turn out to be smaller than OPT; we can show it's no bigger than 1/2 OPT.

Sorted Greedy Machine Scheduling Is Within 3/2 of OPT
• If we have fewer jobs than the number of machines k, we schedule one job per machine and we're optimal. So assume more than k jobs for the rest of this.
• The optimal deadline is at least 2·tk+1, where tk+1 is the (k+1)th biggest job: some machine must be working two of the first k+1 jobs, and those jobs are all at least as big as tk+1.
• So the proof of 3/2-optimality goes like the 2-optimality argument, only the final element tj <= tk+1 <= 1/2 OPT (it's sorted, so tj <= tk+1).
• So (Ti - tj) <= S/k <= OPT here implies Ti <= 3/2 OPT. (QED)
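A sketch of Approximation Algorithm #2, reusing greedy_schedule from the earlier example: sort the jobs largest-first before handing them to the same greedy rule.

   def sorted_greedy_schedule(job_times, k):
       # Largest jobs first, then greedy assignment to the least-loaded machine.
       return greedy_schedule(sorted(job_times, reverse=True), k)

   # Six unit jobs plus one job of length 6, on 2 machines:
   print(greedy_schedule([1, 1, 1, 1, 1, 1, 6], 2)[0])          # 9  (the 6 arrived last)
   print(sorted_greedy_schedule([1, 1, 1, 1, 1, 1, 6], 2)[0])   # 6  (optimal here)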
Set Cover
• Set Cover is the NP-complete problem: given a set U and subsets S1 ... Sn of U, are there k of them whose union forms the target set U?
• E.g. U = {1, 2, 3, 4}, S1 = {1, 2}, S2 = {2, 3}, S3 = {3, 4}: YES for k=2, NO for k=1.
• NP-completeness can be shown by reducing from Vertex Cover. If we had a Set Cover solver, let U be a set with one element eij for each edge in the graph, and let set Si contain the edges eij that are adjacent to vertex i. If there is a set cover, the vertices corresponding to the chosen sets are a vertex cover.
• Also in NP, because clearly we can check the union in polynomial time.
Weighted Set Cover
• Suppose there is a cost wi associated with each set, and we want to minimize the total cost of covering the set.
• Metaphor/application: you're buying compilations of games that are awkwardly bundled, and want to spend the minimum to purchase them all.
• A greedy algorithm is the following:
   R = U
   while |R| > 0   // uncovered elements remain
      add the set Si that minimizes the cost-per-uncovered-element wi/|Si ∩ R|
      R = R - Si
   Return the selected sets
• The argument for being close to optimal will be that element-by-element, we "paid a good price" compared to optimal.
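Here is the greedy rule as runnable Python; the dictionary-based representation of the sets and weights is my own choice, and the sketch assumes the sets can actually cover the universe.

   def greedy_weighted_set_cover(universe, sets, weights):
       # Repeatedly buy the set with the best cost per newly covered element, wi/|Si ∩ R|.
       R = set(universe)                      # uncovered elements remaining
       chosen = []
       while R:
           best = min((s for s in sets if sets[s] & R),
                      key=lambda s: weights[s] / len(sets[s] & R))
           chosen.append(best)
           R -= sets[best]
       return chosen

   # The example from the Set Cover slide, with every set costing 1:
   sets = {"S1": {1, 2}, "S2": {2, 3}, "S3": {3, 4}}
   print(greedy_weighted_set_cover({1, 2, 3, 4}, sets, {"S1": 1, "S2": 1, "S3": 1}))   # ['S1', 'S3']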


A Bound on Greedy Weighted Set Cover
• Claim: greedy weighted set cover spends at most log(M) times the optimal, where M is the maximum set size.
• This proof will basically work by showing that for every set with cost wi, our algorithm paid no more than log(M)·wi to cover all the elements in that set.
• So whatever sets OPT picked, our algorithm paid within a factor of log M of those costs wi.

A Bound on Greedy Weighted Set Cover
• Let the cost paid for element s, cs, be the weight wi of the set that covered it, divided by the number of elements of that set still uncovered at the time.
• E.g., if wi = 4, Si = {a, b, c}, and b is already covered, then cs is 2 for a and c.
• The total amount spent by the algorithm is the sum of these per-element costs cs: summed over each element in the final cover, they equal the Σwi paid.
• For any given set Sk, order its elements 1…d in the order they were actually added to our cover.
• The cost actually paid for element j is no more than wk/(d-j+1): the weight of Sk divided among the remaining elements in the set.
• The cost may have been less than this if we actually "bought" the element with a different set, but it's no more than this.


A Bound on Greedy Weighted Set Cover
• Recall: the cost actually paid for element j of set Sk is no more than wk/(d-j+1), the weight of Sk divided among the remaining elements in the set (d = |Sk|).
• So the total cost paid for the elements of Sk is ≤ Σj wk/(d-j+1) = wk (1/d + 1/(d-1) + … + 1).
• The sum in parentheses is the harmonic sum of 1/i up to d, which is Θ(log d), so the cost paid for Sk's elements is at most on the order of wk log |Sk|.
• So for each set Si* in the optimal cover, we paid no more than log(|Si*|)·wi* for its elements, and thus no more than Σ log(|Si*|)·wi* ≤ (log M)·OPT overall.



Some Hard Problems are Approximable, Others Not
• Though NP-complete problems are "equally hard" from the point of view of getting exact answers, they differ greatly in how approximable they are.
• For some, any approximation guarantee would itself be NP-hard to achieve.
• For others, we can get within a factor of 1+ε of optimal for arbitrary ε: sometimes it's possible to get arbitrarily close to optimal in polynomial time on NP-complete problems!
• Problems with pseudo-polynomial algorithms are often easily approximated, because we can employ rounding.

Recall Our Original Knapsack Algorithm
• Suppose weight limit W = 4. Items:
   i=1: Sandwich, wi = 1, vi = 5
   i=2: TV, wi = 4, vi = 10
   i=3: Water bottle, wi = 2, vi = 7
• The table of best values, indexed by items i (rows) and weight limit w (columns):

   Items i \ w    0    1    2    3    4
   i=0            0    0    0    0    0
   i=1            0    5    5    5    5
   i=2            0    5    5    5   10
   i=3            0    5    7   12   12

• The answer is the bottom-right entry, 12.
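For reference, a small Python sketch of this original table-building algorithm (the function name is mine); it reproduces the bottom-right value of 12 for the items above.

   def knapsack_max_value(weights, values, W):
       # best[i][w] = largest value achievable with the first i items and weight limit w.
       n = len(values)
       best = [[0] * (W + 1) for _ in range(n + 1)]
       for i in range(1, n + 1):
           wi, vi = weights[i - 1], values[i - 1]
           for w in range(W + 1):
               best[i][w] = best[i - 1][w]                                  # skip item i
               if w >= wi:
                   best[i][w] = max(best[i][w], best[i - 1][w - wi] + vi)   # or take it
       return best[n][W]

   # Sandwich (w=1, v=5), TV (w=4, v=10), water bottle (w=2, v=7), weight limit 4:
   print(knapsack_max_value([1, 4, 2], [5, 10, 7], 4))   # 12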

A Slightly Different Knapsack Algorithm
• If we can round the numbers involved in Knapsack, we can make much smaller tables that only have entries for the possible rounded sums.
• E.g., if the numbers are 238947, 106917, 369400, ..., round (up) to 300000, 200000, 400000, ... Instead of millions of table entries, we only need table entries for 100000, 200000, 300000, ... Or we could just record these as 1, 2, 3, ...
• But we probably shouldn't round weights, since that could lead to impossible solutions.
• So we'll round values and redesign the algorithm to make a table of values.


A Slightly Different Knapsack Algorithm
• Before we start, we can round the values to ceiling(vi/b) for some b.
• Key idea of the dynamic programming recurrence: bestWeight[i,v] will hold the smallest weight that achieves value v using only the first i items.
• Either the current item isn't in the optimal solution, and bestWeight[i,v] = bestWeight[i-1,v]...
• ...or it is, and bestWeight[i,v] = bestWeight[i-1,v-vi] + wi. Take the min of these for each table entry.
• In the end, find the biggest v whose weight doesn't go over the limit.

A Slightly Different Knapsack Algorithm
• Suppose weight limit W = 4 and rounding factor b = 4. Items:
   i=1: Sandwich, wi = 1, vi = ceiling(5/4) = 2
   i=2: TV, wi = 4, vi = ceiling(10/4) = 3
   i=3: Water bottle, wi = 2, vi = ceiling(7/4) = 2
[Figure: the bestWeight table indexed by items i = 0…3 and rounded value v/b = 0…7; the answer is the largest value column whose weight entry is still under the weight limit.]

A Slightly Different Knapsack Algorithm
For i = 0 to N
   w[i, 0] = 0
For i = 1 to N
   for v = 1 to (v1 + ... + vi, the largest value achievable so far)
      if v > (v1 + ... + vi-1)            // item i must be included to reach v
         w[i, v] = wi + w[i-1, max(0, v - vi)]
      else
         w[i, v] = min(w[i-1, v], wi + w[i-1, max(0, v - vi)])
Return the maximum v such that w[N, v] <= weight limit W
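A runnable Python sketch of this table: it treats values that aren't achievable yet as infinite weight, which plays the same role as the if/else test in the pseudocode, and it also walks the table backwards to recover which items were chosen (the traceback and the function name are my additions).

   def min_weight_knapsack(weights, values, W):
       # best[i][v] = smallest weight achieving total value exactly v with the first i items.
       n, total = len(values), sum(values)
       INF = float("inf")
       best = [[0] + [INF] * total]                 # row i = 0: only value 0 is achievable
       for wi, vi in zip(weights, values):
           prev = best[-1]
           row = prev[:]                            # option: item not in the solution
           for v in range(vi, total + 1):           # option: item in the solution
               row[v] = min(row[v], prev[v - vi] + wi)
           best.append(row)
       v = max(v for v in range(total + 1) if best[n][v] <= W)
       value, chosen = v, []
       for i in range(n, 0, -1):                    # work backwards to find the set
           if best[i][v] != best[i - 1][v]:
               chosen.append(i - 1)
               v -= values[i - 1]
       return value, chosen

   # The running example, W = 4, items (w, v) = (1, 5), (4, 10), (2, 7):
   print(min_weight_knapsack([1, 4, 2], [5, 10, 7], 4))   # (12, [2, 0])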
A Polynomial Time (1+ε) Approximation
• To get a Knapsack solution within a factor of (1+ε) of optimal, in polynomial time:
   - Let b = (ε/(2N)) · maxi vi, which sets the maximum rounded item value to 2N/ε. This will make our running time no longer dependent on the values vi.
   - Use the preceding Knapsack algorithm with values ceiling(vi/b).
   - Return the set of items maximizing the (rounded) value.
• The running time is at most proportional to the number of entries in the table: (N items) × (sum of all rounded values) ≤ N · N · (2N/ε) = 2N³/ε. This is polynomial if we consider ε to be a constant.
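The (1+ε) scheme itself is then a thin wrapper around that DP; this sketch reuses min_weight_knapsack from the previous example and reports the true (unrounded) value of the chosen items.

   import math

   def knapsack_fptas(weights, values, W, eps):
       n = len(values)
       b = (eps / (2 * n)) * max(values)            # max rounded value becomes about 2N/eps
       rounded = [math.ceil(v / b) for v in values]
       _, chosen = min_weight_knapsack(weights, rounded, W)
       return chosen, sum(values[i] for i in chosen)

   # The running example with eps = 0.5 still picks the sandwich and water bottle (value 12).
   print(knapsack_fptas([1, 4, 2], [5, 10, 7], W=4, eps=0.5))   # ([2, 0], 12)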
Proving the (1+ε) Approximation
• Let SOPT be the optimal sum of values, let S' be the sum of our "rounded" values (divide by b, take the ceiling, multiply by b), and let S be our real (not-rounded) sum.
• S' must be at least as good as what rounding applied to the values in SOPT would have produced (call this SOPT'); we know the algorithm chooses the best "rounded" result.
• And S' is no larger than b added to each real value, or S + Nb.
• Thus SOPT ≤ SOPT' ≤ S' ≤ S + Nb, so SOPT - Nb ≤ S: we're off by no more than Nb from optimal.
• We want to show Nb ≤ εS.

Proving the (1+ε) Approximation
• From the previous slide: SOPT ≤ SOPT' ≤ S' ≤ S + Nb.
• The biggest item has value 2Nb/ε just by our definition of b: b = (ε/(2N)) · maxi vi.
• The solution we chose must have been at least as good as the value of the single biggest item (which is the same after rounding): S' ≥ 2Nb/ε.
• S ≥ S' - Nb, so S ≥ 2Nb/ε - Nb = Nb(2/ε - 1).
• Rearranging, Nb ≤ S/(2/ε - 1) = εS/(2 - ε) ≤ εS.
• So SOPT ≤ S + Nb ≤ (1 + ε)S. (QED)

Summary: Special Cases and Approximation Algorithms
• Not all is lost when facing NP-complete problems.
• It's possible your cases aren't the "hard" cases of the NP-complete problem.
• Approximation algorithms can sometimes even get arbitrarily close to the optimal solutions.
• Often, greedy approaches, rounding, or other simple fixes can actually do as well as the best known approximation algorithms.