Algorithms and Data 19: Working Around NP-Completeness
Professor Kevin Gold
Two Approaches to Dealing With NP-Complete Problems
• 1. Special cases. Maybe you never really see the toughest cases, and your worst case need not be exponential.
• 2. Approximation algorithms. If a close solution is good enough — as in, within a constant factor of optimal — sometimes you can do that in polynomial time.
2-SAT is in P
• With just 2 variables in each clause, we can think of each clause as two implications: (x v y) becomes ¬x ⇒ y and ¬y ⇒ x.
• Can create a graph with nodes corresponding to variables and their negations, and edges corresponding to the implications.
• The formula is unsatisfiable iff there is both a path from v to ¬v and a path from ¬v to v for some variable v: "If x, then not x. If not x, then x." Contradiction.
• So a linear-time search per variable solves this problem.
2-SAT Graph Example
(x v y) ^ (¬x v ¬y) ^ (¬x v y) ^ (x v ¬y)
• Notice the paths from x to ¬x and back, and from y to ¬y and back… so each variable setting implies a contradiction, and this formula is not satisfiable.
(Figure: implication graph on the nodes x, y, ¬x, ¬y.)
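A minimal Python sketch of the implication-graph test (my own encoding: literals are nonzero integers, with -k standing for ¬xk, and the function name is illustrative). It checks reachability from each literal to its negation; a truly linear-time solver would use strongly connected components instead of a search per variable.

from collections import defaultdict

def two_sat_satisfiable(clauses, num_vars):
    """clauses: list of pairs (a, b) of literals; -k means NOT x_k."""
    graph = defaultdict(list)
    for a, b in clauses:
        # (a v b) becomes (not a => b) and (not b => a)
        graph[-a].append(b)
        graph[-b].append(a)

    def reachable(src, dst):
        stack, seen = [src], {src}
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return False

    # Unsatisfiable iff some variable implies its own negation in both directions.
    for x in range(1, num_vars + 1):
        if reachable(x, -x) and reachable(-x, x):
            return False
    return True

# The example formula above, (x v y)(¬x v ¬y)(¬x v y)(x v ¬y), is unsatisfiable.
print(two_sat_satisfiable([(1, 2), (-1, -2), (-1, 2), (1, -2)], 2))  # False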
Independent Set on Trees
• Dynamic programming can find maximum independent sets on trees in polynomial time. Begin at the leaves of the tree, which get a score of 1.
• Score of a parent is max {sum of children, 1 + sum of grandchildren}.
• Can work backwards at the end to determine what the set is.
(Figure: example tree with the computed score at each node.)
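A small Python sketch of the tree DP, assuming the tree is given as an adjacency list rooted at node 0 (names like with_u and without_u are mine). It uses the equivalent "take this node or skip it" formulation of the children/grandchildren rule above.

import sys

def max_independent_set_tree(adj, root=0):
    """adj: adjacency list of an undirected tree. Returns the size of a
    maximum independent set via a with/without-this-node DP."""
    sys.setrecursionlimit(10000)

    def dfs(u, parent):
        with_u, without_u = 1, 0            # score if u is taken / skipped
        for v in adj[u]:
            if v != parent:
                w, wo = dfs(v, u)
                with_u += wo                # if u is taken, children must be skipped
                without_u += max(w, wo)     # otherwise children choose freely
        return with_u, without_u

    return max(dfs(root, -1))

# A path on 4 vertices: the maximum independent set has size 2.
print(max_independent_set_tree([[1], [0, 2], [1, 3], [2]]))  # 2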
Why Is Independent Set on Trees Easier?
• What allows the dynamic programming approach to work here is an ability to "commit" to the clearly best solution for a node and move on.
• As soon as we add any cycles to the graph, it's no longer clear when we can commit to an optimal small solution.
(Figure: once the graph contains cycles, the per-node scores all become uncertain: 5?, 8?, 1?, …)
Approximation Algorithms
• Approximation algorithms attempt to produce solutions within some constant factor of optimal.
• They are inherently solving the "optimization problems" associated with NP-complete problems.
• Independent Set decision problem: Is there an independent set of size k in this graph?
• Independent Set optimization problem: Find the largest independent set in the graph.
Minimum Vertex Cover Approximation Algorithm
• Vertex Cover optimization problem: Find the smallest set of vertices that touches every graph edge.
• The 2-approximation algorithm: while there are edges not adjacent to marked vertices, pick such an edge and put both vertices in the cover.
Minimum Vertex Cover Approximation Algorithm
• None of the edges chosen share a vertex, so the optimal solution must contain at least as many vertices as there were edges chosen.
• We picked twice as many vertices as that, so we are ≤ 2·OPT.
• This is surprisingly close to the best known approximation (and was best for some time).
• Sometimes straightforward strategies yield respectable bounds.
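A sketch of the 2-approximation in Python (edge-list input is my assumption): scan the edges, and whenever an edge has neither endpoint marked, mark both.

def vertex_cover_2approx(edges):
    """Pick any uncovered edge and add both endpoints; repeat.
    The chosen edges form a matching, so |cover| <= 2 * OPT."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover

# A star plus one extra edge; the optimal cover has size 2, ours has size 4.
print(vertex_cover_2approx([(0, 1), (0, 2), (0, 3), (2, 3)]))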
Approximation Algorithms and Reductions Don't Mix
• Recall that we know finding a vertex cover of size k lets us find an independent set of size n-k: the independent set is all the nodes not in the cover.
• So, can we use a 2-approximation algorithm for vertex cover as a good approximation algorithm for independent set?
• The reduction "works" but isn't necessarily close to optimal.
• Suppose the optimal vertex cover and the optimal independent set are both of size |V|/2.
• Returning 2·OPT nodes returns all the vertices, and an empty independent set. That's not within any good factor of optimal.
Traveling Salesman Problem
• Traveling Salesman Decision Problem (NP-complete): Given a weighted graph, is there a simple cycle that visits all nodes and has weight ≤ D?
• Exercise: give a reduction from Hamiltonian Cycle.
• Traveling Salesman Optimization Problem: find the shortest simple cycle that visits all vertices.
• This is actually solvable with a polynomial number of calls to the decision version, as with other NP-complete problems.
Euclidean Traveling Salesman Problem
• The Euclidean Traveling Salesman Problem looks for short tours on graphs that represent Euclidean distances between points in the plane.
• (Euclidean distance is the distance metric you're used to, sqrt(Δx² + Δy²).)
• These points therefore obey the triangle inequality: going from a to b to c is never faster than going directly from a to c.
• On graphs like this, we can get within a factor of 2 of the optimal in polynomial time.
(Figure: a triangle with side lengths 1, 2, 3 satisfies the triangle inequality; side lengths 2, 3, 6 do not.)
Euclidean Traveling Salesman Approximation
1. Calculate the minimum spanning tree for the graph, using Kruskal's or Prim's algorithm.
2. Create a pseudo-cycle that follows every edge twice around the minimum spanning tree.
3. "Relax" the tour by skipping vertices already touched and heading to the next vertex.
http://www.cs.yale.edu/homes/aspnes/pinewiki/ApproximationAlgorithms.html
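A rough Python sketch of these three steps, assuming the input is a symmetric distance matrix satisfying the triangle inequality; a preorder walk of the MST plays the role of the doubled-and-shortcut tour.

import heapq

def tsp_2approx(dist):
    """dist: symmetric matrix of metric distances. Returns (tour, length).
    Builds an MST with Prim's algorithm, then shortcuts a walk around it."""
    n = len(dist)
    in_tree = [False] * n
    children = [[] for _ in range(n)]
    pq = [(0, 0, -1)]                       # (edge weight, vertex, parent)
    while pq:
        w, u, parent = heapq.heappop(pq)
        if in_tree[u]:
            continue
        in_tree[u] = True
        if parent >= 0:
            children[parent].append(u)
        for v in range(n):
            if not in_tree[v]:
                heapq.heappush(pq, (dist[u][v], v, u))

    # Preorder walk of the MST = "follow every edge twice, then skip repeats".
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))

    length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return tour, length

# Four corners of a unit square (approximate Euclidean distances); optimal tour length is 4.
d = [[0, 1, 1.41, 1], [1, 0, 1, 1.41], [1.41, 1, 0, 1], [1, 1.41, 1, 0]]
print(tsp_2approx(d))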
Euclidean Traveling Salesman Approximation
• The optimal tour must be at least as large as the minimum spanning tree, because the tour minus any one edge is itself a spanning tree.
• Because of the triangle inequality (shortcuts are never larger than two-step paths), our tour is no larger than twice the minimum spanning tree's weight.
• So our tour is 2-optimal.
Machine Scheduling
• Given a set of jobs with running times t1, t2, ..., tn and k machines that can run one job at a time, is there a way to schedule all jobs to finish before deadline D?
• Even the two-machine case is NP-complete, by reduction from Subset-Sum.
Subset Sum Reduction
• Let S be the sum of all the elements in the Subset-Sum instance and T be the target to be summed to.
• Create jobs with times equal to the set of numbers to be summed, plus one additional job that takes time 2T-S.
• This set of jobs can be finished before deadline T iff there is a subset of the original elements summing to T.
(Figure: two machines with deadline T. One machine runs the elements summing to the target T; the other runs the leftovers, summing to S-T, plus the extra job of length 2T-S, which forces the schedule to balance.)
An Optimization Version and a Greedy Algorithm
• An optimization version of machine scheduling is to achieve the earliest possible time by which all N jobs finish.
• A greedy algorithm:
    Initialize machine finishing times Fj to 0
    For i = 1 to N
        Assign job i to the machine Mj with the minimum finishing time
        Update finishing time of Mj: Fj += ti
(Figure: greedily scheduling jobs 6, 3, 5, 4, 4 on machines M1 and M2 gives completion time 12.)
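A sketch of this greedy rule in Python, using a heap keyed on machine finishing times (the function name and return format are my own).

import heapq

def greedy_schedule(times, k):
    """Assign each job, in the given order, to the machine that currently
    finishes earliest. Returns (makespan, per-machine job lists)."""
    heap = [(0, j) for j in range(k)]        # (current finish time, machine index)
    assignment = [[] for _ in range(k)]
    for t in times:
        finish, j = heapq.heappop(heap)
        assignment[j].append(t)
        heapq.heappush(heap, (finish + t, j))
    return max(f for f, _ in heap), assignment

# The example above: jobs 6, 3, 5, 4, 4 on 2 machines finish at time 12.
print(greedy_schedule([6, 3, 5, 4, 4], 2))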
Thinking About How Close This Could Be To Optimal
(Figure: two machines, six length-1 jobs, and one length-6 job. The optimal schedule puts the 6 by itself; greedy with the 6 arriving last stacks it on an already-loaded machine.)
• The greedy algorithm behaves reasonably until it gets a big element that it should have "planned for".
• If the algorithm continues, it'll reduce this imbalance.
• And if the big element is too big, even the optimal can't do anything about it.
Two Lower Bounds
• To figure out how close to optimal the approximation algorithm is, we need some lower bounds on performance.
• Lower bound #1: The optimal finishing time is at least S/k, where S is the sum of all job times and k is the number of machines. There must be some machine that does at least this much work, or all the work wouldn't get done.
• Lower bound #2: The optimal finishing time is at least as big as the biggest job time.
Proving the Greedy Assignment is 2-Optimal
• Consider the machine Mi that works the longest in our greedy algorithm's schedule, with load Ti.
• It was still the least loaded before we added the final job to it (because of our greedy strategy), meaning that if that last job took time tj, the other machines were all already loaded to at least Ti - tj.
• So the sum S of all loads across all k machines was at least k(Ti - tj). So (Ti - tj) ≤ S/k.
• That was our lower bound #1 on OPT: (Ti - tj) ≤ S/k ≤ OPT. And tj ≤ OPT by lower bound #2. So Ti ≤ 2·OPT. (QED)
Thinking About the Worst Case Can Lead to an Improvement
• If the bad stuff happens when a big job comes last, why not balance the big jobs first?
• Approximation Algorithm #2: Same as the first, but we first sort the jobs from largest to smallest time, scheduling large jobs first. Still greedily schedule to the least-used machine. (See the sketch below.)
• Now, when we think about the time taken by the final job, that will turn out to be smaller than OPT — we can show it's no bigger than 1/2 OPT.
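A sketch of Approximation Algorithm #2, reusing the greedy_schedule function from the earlier sketch: sort the jobs in decreasing order first, then schedule greedily as before.

def sorted_greedy_schedule(times, k):
    # Longest-jobs-first: sort from largest to smallest, then greedily
    # assign each job to the least-loaded machine, exactly as before.
    return greedy_schedule(sorted(times, reverse=True), k)

# Same example as before; the result stays within 3/2 of the optimal makespan of 11.
print(sorted_greedy_schedule([6, 3, 5, 4, 4], 2))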
Sorted Greedy Machine Scheduling Is Within 3/2 of OPT
• If we have no more jobs than machines k, we schedule one job per machine and we're optimal. So assume more than k jobs for the rest of this argument.
• The optimal deadline is at least 2·t(k+1), where t(k+1) is the (k+1)th biggest job: some machine must be working on two of the first k+1 jobs, and those jobs are all at least as big as t(k+1).
• So the proof of 3/2-optimality goes like the 2-optimality argument, only the final element satisfies tj ≤ t(k+1) ≤ 1/2 OPT (the jobs are sorted, so tj ≤ t(k+1)).
• So (Ti - tj) ≤ S/k ≤ OPT here implies Ti ≤ 3/2 OPT. (QED)
Set Cover
• Set Cover is the NP-complete problem: given a set U and subsets S1 ... Sn of U, are there k of them whose union forms the target set U?
• E.g. U = {1, 2, 3, 4}, S1 = {1, 2}, S2 = {2, 3}, S3 = {3, 4}: YES for k=2, NO for k=1.
• Also in NP because clearly we can check the union in poly time.
• NP-completeness can be shown by reducing from Vertex Cover. If we had a Set Cover solver, let U be a set with one element eij for each edge in the graph, and let set Si contain the edges eij that are adjacent to vertex i.
• If there is a set cover, the vertices corresponding to the chosen sets are a vertex cover.
Weighted Set Cover
• Suppose there is a cost wi associated with each set, and we want to minimize the total cost of covering the set.
• Metaphor/application: You're buying compilations of games that are awkwardly bundled, and want to spend the minimum to purchase them all.
• A greedy algorithm is the following:
    R = U
    while |R| > 0  // uncovered elements remain
        add the set Si that minimizes the cost-per-uncovered-element wi/|Si ∩ R|
        R = R – Si
    Return the selected sets
• The argument for being close to optimal will be that element-by-element, we "paid a good price" compared to optimal.
• Claim: Greedy weighted set cover spends at most log(M) times the optimal, where M is the maximum set size.
• This proof will basically work by showing that for every set with cost wi, our algorithm paid no more than log(M)·wi to cover all the elements in that set.
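A Python sketch of this greedy rule (the input format is my assumption: a universe, a list of sets, and a parallel list of weights).

def greedy_weighted_set_cover(universe, sets, weights):
    """Repeatedly pick the set minimizing cost per newly covered element."""
    remaining = set(universe)
    chosen, total_cost = [], 0
    while remaining:
        best_i, best_ratio = None, None
        for i, s in enumerate(sets):
            newly = len(s & remaining)
            if newly == 0:
                continue
            ratio = weights[i] / newly
            if best_ratio is None or ratio < best_ratio:
                best_i, best_ratio = i, ratio
        if best_i is None:                   # nothing covers the rest: no cover exists
            return None
        chosen.append(best_i)
        total_cost += weights[best_i]
        remaining -= sets[best_i]
    return chosen, total_cost

# The earlier example with unit weights: picks indices 0 and 2 (S1 and S3).
U = {1, 2, 3, 4}
S = [{1, 2}, {2, 3}, {3, 4}]
print(greedy_weighted_set_cover(U, S, [1, 1, 1]))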
A Bound on Greedy Weighted Set Cover
• So whatever sets OPT picked, our algorithm paid within a factor of log M of those costs wi.
A Bound on Greedy Weighted Set Cover
• Let the cost paid for element s, cs, be the weight wi of the set that covered it, divided by the number of uncovered elements at the time.
• E.g., if wi = 4, Si = {a, b, c}, and b is already covered, then cs is 2 for each of a and c.
• The total amount spent by the algorithm is the sum of these per-element costs cs: summed over every element in the final cover, they equal the Σwi paid.
• For any given set Sk, order its elements 1…d (d = |Sk|) in the order they were actually added to our cover.
• The cost actually paid for element j is no more than wk/(d-j+1): the weight of Sk divided among the elements of Sk still uncovered at that point.
• The cost may have been less than this if we actually "bought" the element with a different set, but it's no more than this.
A Bound on Greedy Weighted Set Cover
• So the total cost charged to the elements of Sk is ≤ Σj wk/(d-j+1) = wk (1/d + 1/(d-1) + … + 1).
• The sum in parentheses is the harmonic sum of 1/i up to d, which is Θ(log d), so the cost charged to Sk is at most about wk log |Sk|.
• So for each set Si* in the optimal cover, we paid no more than log(|Si*|)·wi* for its elements, or no more than Σ log(|Si*|)·wi* ≤ (log M)·OPT overall.
Some Hard Problems are Approximable, Others Not
• Though NP-complete problems are "equally hard" from the point of view of getting exact answers, they differ greatly in how approximable they are.
• For some, approximating to within any constant factor would itself be NP-hard.
• For others, we can get within a factor of 1+ε of optimal for arbitrary ε: sometimes it's possible to get arbitrarily close to optimal in polynomial time on NP-complete problems!
• Problems with pseudo-polynomial algorithms are often easily approximated, because we can employ rounding.
Recall Our Original Knapsack Algorithm
Suppose weight limit W = 4. Items:
i=1: Sandwich, wi = 1, vi = 5
i=2: TV, wi = 4, vi = 10
i=3: Water bottle, wi = 2, vi = 7
(Table: the usual DP table of best value for items i = 0…3 and weight limits w = 0…4; the best value with all items and W = 4 is 12, from the sandwich and water bottle.)
A Slightly Different Knapsack Algorithm
• If we can round the numbers involved in Knapsack, we can make much smaller tables that only have entries for the possible rounded sums.
• E.g., if the values are 238947, 106917, 369400, … round them up to 300000, 200000, 400000, …
• Instead of millions of table entries, table entries for just 100000, 200000, 300000, …
• Or we could just record these as 1, 2, 3, …
• But we probably shouldn't round weights, since that could lead to impossible solutions.
• So we'll round values and redesign the algorithm to make a table of values.
• Before we start, we can round each value to ceiling(vi/b) for some b.
A Slightly Different Knapsack Algorithm
• Key idea of the dynamic programming recurrence: bestWeight[i,v] will hold the smallest weight that achieves value v using only the first i items.
• Either the current item isn't in the optimal solution, and bestWeight[i,v] = bestWeight[i-1,v]…
• …or it is, and bestWeight[i,v] = bestWeight[i-1, v-vi] + wi.
• Take the min of these for each table entry.
• In the end, find the biggest v whose best weight doesn't go over the limit.
A Slightly Different Knapsack Algorithm
Suppose weight limit W = 4, rounding factor b = 4. Items:
i=1: Sandwich, wi = 1, vi = ceiling(5/4) = 2
i=2: TV, wi = 4, vi = ceiling(10/4) = 3
i=3: Water bottle, wi = 2, vi = ceiling(7/4) = 2
(Table: bestWeight[i, v] for items i = 0…3 and rounded values v = 0…7. The largest value whose best weight stays under the limit is v = 4, achieved at weight 3 by the sandwich and water bottle.)
A Slightly Different Knapsack Algorithm
For i = 0 to N
    w[i, 0] = 0
For i = 1 to N
    for v = 1 to (sum of rounded values of items 1..i)
        if v > (sum of rounded values of items 1..i-1)
            w[i, v] = wi + w[i-1, max(0, v - vi)]   // item i must be included
        else
            w[i, v] = min(w[i-1, v], wi + w[i-1, max(0, v - vi)])
Return the maximum v such that w[N, v] <= weight limit W
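A Python sketch of this table, following the pseudocode above (the name min_weight_knapsack is mine); it returns the best achievable value rather than the chosen item set.

def min_weight_knapsack(values, weights, W):
    """best[i][v] = smallest weight achieving value v using the first i items.
    Returns the largest value whose best weight fits under the limit W."""
    n = len(values)
    INF = float("inf")
    total = sum(values)
    best = [[INF] * (total + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(1, n + 1):
        vi, wi = values[i - 1], weights[i - 1]
        for v in range(total + 1):
            skip = best[i - 1][v]                          # item i not used
            take = best[i - 1][max(0, v - vi)] + wi        # item i used
            # The INF sentinel makes the "item i must be included" case fall out automatically.
            best[i][v] = min(skip, take)
    return max(v for v in range(total + 1) if best[n][v] <= W)

# The rounded items above: values 2, 3, 2, weights 1, 4, 2, limit W = 4.
print(min_weight_knapsack([2, 3, 2], [1, 4, 2], 4))  # 4 (sandwich + water bottle)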
A Polynomial Time (1+ε) Approximation
• To get a Knapsack solution within a factor of (1+ε) of optimal, in polynomial time:
• Let b = (ε/(2N)) · maxi vi, which sets the maximum rounded item value to 2N/ε. This will make our running time no longer dependent on the values vi.
• Use the preceding Knapsack algorithm with values ceiling(vi/b).
• Return the set of items maximizing the value.
• The running time is at most proportional to the number of entries in the table, (N items × sum of all rounded values) ≤ N · N · (2N/ε) = 2N³/ε.
• This is polynomial if we consider ε to be a constant.
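A sketch of the whole (1+ε) scheme, assuming the min_weight_knapsack function from the previous sketch; it reports the best rounded value, and recovering the actual item set would require tracing back through the table.

import math

def knapsack_fptas(values, weights, W, eps):
    """(1+eps)-approximation: round values with b = (eps/(2n)) * max value,
    then run the value-indexed DP on the rounded values."""
    n = len(values)
    b = (eps / (2 * n)) * max(values)
    rounded = [math.ceil(v / b) for v in values]
    return min_weight_knapsack(rounded, weights, W)   # best rounded value

# The original values 5, 10, 7 with weights 1, 4, 2, W = 4, and eps = 0.8.
print(knapsack_fptas([5, 10, 7], [1, 4, 2], 4, 0.8))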
Proving the (1+ε) Approximation
• Let SOPT be the optimal sum of values, let S' be the sum of our "rounded" values (divide by b, take the ceiling, multiply by b), and let S be our real (not-rounded) sum.
• S' must be at least as good as what rounding applied to the values in SOPT would have produced (call this SOPT'); we know the algorithm chooses the best "rounded" result.
• And S' is no larger than S with b added to each real value, i.e., S' ≤ S + Nb.
• Thus SOPT ≤ SOPT' ≤ S' ≤ S + Nb, so SOPT - Nb ≤ S and we're off by no more than Nb from optimal. We want to show Nb ≤ εS.
Proving the (1+ε) Approximation
• From the previous slide: SOPT ≤ SOPT' ≤ S' ≤ S + Nb.
• The biggest item has value 2Nb/ε just by our definition of b: b = (ε/(2N)) · maxi vi.
• The solution we chose must have been at least as good as the value of the single biggest item (which is the same after rounding): S' ≥ 2Nb/ε.
• S ≥ S' - Nb, so S ≥ 2Nb/ε - Nb = Nb(2/ε - 1).
• Rearranging, Nb ≤ S/(2/ε - 1) = εS/(2 - ε) ≤ εS.
• SOPT ≤ S + Nb ≤ (1+ε)S. (QED)
Summary: Special Cases and Approximation Algorithms
• Not all is lost when facing NP-complete problems.
• Often, greedy approaches, rounding, or other simple fixes can actually do as well as the best known approximation algorithms.
• It's possible your cases aren't the "hard" cases of the NP-complete problem.
• Approximation algorithms can sometimes even get arbitrarily close to the optimal solutions.