Algorithms and Data
18: NP-Completeness
Professor Kevin Gold
NP-Completeness Overview
• Some problems are thought to have no polynomial-time algorithms to solve them
• Recognizing when you are dealing with such a problem can save wasted effort trying to come up with a polynomial-time optimal algorithm
• Instead, your goal in these cases should become developing an approximation algorithm that delivers a good-enough answer in polynomial time
• There are a few different categories of hard problem, but a famous one is the NP-complete problems; not the hardest problems around, but seemingly common
A Hard Problem: Independent Set
• INDEPENDENT-SET: Given graph G and input k, does graph G have k vertices such that no two are joined by an edge?
• Application: If vertices are people and edges represent “can’t work together,” find k people who can work together
• Checking the neighbors of all possible sets of k vertices takes time O(|V|^(k+1)) … exponential in k, one of the inputs (a brute-force sketch follows below)
[Figure: a small example graph on vertices a, b, c, d]
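A minimal brute-force sketch of the check just described (not from the slides; the adjacency encoding and function names are illustrative): try every k-subset of vertices and test all pairs for edges.

from itertools import combinations

def has_independent_set(vertices, edges, k):
    # Brute force: roughly |V|^k subsets, with O(k^2) pair checks for each one.
    edge_set = {frozenset(e) for e in edges}
    for subset in combinations(vertices, k):
        if all(frozenset(pair) not in edge_set for pair in combinations(subset, 2)):
            return True
    return False

# Example: a square a-b-c-d with diagonal b-d; {a, c} works, but no 3 vertices do.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("b", "d")]
print(has_independent_set(V, E, 2))  # True
print(has_independent_set(V, E, 3))  # False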
Independent Set is in NP
• No known way to solve Independent Set in polynomial time, but a solution can be checked in polynomial time.
• The solution is called a certificate, and in this case is just the list of vertices in the independent set (a checker is sketched below).
• If a yes-or-no decision problem has a certificate that can be used to prove a “yes” in polynomial time, it is in the class of problems called NP.
• Being in NP doesn’t necessarily mean a problem is hard! Any decision problem you can solve without a certificate in polynomial time is also in NP. But NP-complete problems, a class of notoriously hard problems, are all in NP.
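By contrast, checking a certificate is fast. A sketch of a polynomial-time certificate checker for Independent Set, assuming the same edge-list encoding as above (illustrative, not from the slides):

def check_independent_set_certificate(vertices, edges, k, certificate):
    # The certificate is just the claimed list of vertices; checking it is O(|E| + |V|).
    cert = set(certificate)
    if len(cert) < k or not cert <= set(vertices):
        return False
    # No edge may join two certificate vertices.
    return all(not (u in cert and v in cert) for u, v in edges)

V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("b", "d")]
print(check_independent_set_certificate(V, E, 2, ["a", "c"]))  # True
print(check_independent_set_certificate(V, E, 2, ["a", "b"]))  # False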
P = NP?
• If a yes-or-no decision problem can be solved in polynomial time, it is in the class of problems called P (for “polynomial”)
• If a yes-or-no decision problem can have certificates (solutions or proofs of “yes”) checked in polynomial time — or if it doesn’t need them — it is in the class NP.
• If P = NP, every decision problem that is polynomial-time to check would also be polynomial time to solve. (Let that sink in a moment)
• So far, nobody has been able to prove that P ≠ NP — or in other words, that there exist problems that are quick to check but hard to solve
• But it seems unlikely that solving would always be as easy as checking!
NP: Nondeterministic Polynomial Time
• NP doesn’t stand for “not polynomial” but “nondeterministic polynomial.”
• “Nondeterministic” here refers to algorithms that can try many possibilities simultaneously
• Try all certificates/solutions in parallel — if one checks out, say “yes”
• Equivalent to the NP definition we gave earlier: these are problems that could be solved in polynomial time if all possible solutions could be tried simultaneously
What Makes Us Think Independent Set is Hard?
• Independent Set belongs to a category of problems called NP-complete. They share these features:
  - They are in the class NP (polynomial time to check)
  - They are “NP-hard” — if you could solve one of them in polynomial time, you could solve anything in NP in polynomial time (and P = NP)
• Many people have tried to come up with such polynomial-time algorithms and failed; you probably won’t have better luck
How Do I Know My Problem is NP-Complete?
• Requirement 1: It’s polynomial time to check (in NP). If this isn’t true, it might be even harder than NP-complete.
• Requirement 2: There is a reduction from an existing NP-complete problem that shows if you could solve your problem in polynomial time, you could solve some other NP-complete problem in polynomial time.
• If there’s some way to transform an NP-complete problem into your problem, your problem must be just as hard or harder to solve.
• A polynomial-time reduction shows how a black box that solves problem A in polynomial time could be used to make an algorithm that solves problem B in polynomial time.
What’s a Reduction?
• If an A solver can be used to solve B, “B reduces to A.” This can be written “B ≤P A” (B is no harder than A)
• Exponentiation reduces to multiplication: if I want to raise A to the power B, I can achieve that through a polynomial number of multiplications (sketch below)
• Deciding whether a graph is bipartite reduces to finding a shortest path: we saw how breadth-first-search can be used to decide whether a graph is bipartite
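A sketch of the exponentiation example (my own illustration, not from the slides): repeated squaring computes A to the power B with O(log B) calls to a multiplication “black box,” here passed in as a plain function.

def power_via_multiply(a, b, multiply=lambda x, y: x * y):
    # Repeated squaring: O(log b) multiplications, each performed by the black box.
    result, base = 1, a
    while b > 0:
        if b % 2 == 1:
            result = multiply(result, base)
        base = multiply(base, base)
        b //= 2
    return result

print(power_via_multiply(3, 13))  # 1594323, i.e. 3**13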
Clique
• Clique: Given graph G and input k, does G contain a set of k vertices with all possible edges between them?
• In NP: Given a set of vertices we claim to be a k-clique, we can check the k(k-1)/2 edges in polynomial time.
[Figure: a 4-clique, and a graph containing a 4-clique]
Reducing Independent Set to Clique
• Reduction from Independent Set: If we had a polynomial time clique solver, we could use it to solve an Independent Set problem in polynomial time.
• Take the graph G we want to find an independent set in, and create graph G’ that has the complement of the edges (edge {a,b} in G’ iff there was no edge {a,b} in G). This construction takes only polynomial time (at most O(|V|^2)).
• There is a clique in G’ iff there was an independent set in G. So return YES for the independent set problem iff our clique solver says YES on G’. (A code sketch of this transformation follows below.)
[Diagram: the I-SET? instance is transformed into a CLIQUE? instance; a YES from the Clique solver becomes a YES for I-SET]
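A sketch of this reduction in code. The clique_solver parameter stands in for the hypothetical polynomial-time black box; the brute-force stand-in below exists only so the example runs and is certainly not polynomial time.

from itertools import combinations

def solve_independent_set_with_clique(vertices, edges, k, clique_solver):
    # Build G': an edge wherever G had none, then ask the black box about a k-clique.
    edge_set = {frozenset(e) for e in edges}
    complement = [(u, v) for u, v in combinations(vertices, 2)
                  if frozenset((u, v)) not in edge_set]
    return clique_solver(vertices, complement, k)

def brute_force_clique(vertices, edges, k):
    # Demonstration stand-in for the black box (exponential time).
    edge_set = {frozenset(e) for e in edges}
    return any(all(frozenset(p) in edge_set for p in combinations(s, 2))
               for s in combinations(vertices, k))

V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("b", "d")]
print(solve_independent_set_with_clique(V, E, 2, brute_force_clique))  # True: {a, c}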
We Just Showed Clique is NP-Complete
• We argued Clique is in NP, then we argued that it could be used to solve another NP-complete problem, Independent Set, in polynomial time.
• These are the requirements to show a problem is NP-complete.
• This means that a polynomial time algorithm for Clique is just as unlikely as a polynomial time solution to Independent Set.
• After all, if we had one, we would have a polynomial-time algorithm for Independent Set too.
But We Didn’t Show How to Solve Clique in Polynomial Time!
• When showing a problem is NP-complete, we just say “if we could solve this in polynomial time, then we could solve this other hard problem”
• In reductions, you imagine you have a black box that solves your problem in polynomial time. You prove hardness by showing that such a black box could be used to solve a problem known to be NP-complete in polynomial time.
• By showing some unlikely consequences that would result from your problem being solvable in polynomial time — showing an NP-complete problem would be solvable and therefore P=NP — you show your problem is hard (“NP-hard”).
• If you show that a solver for a known NP-complete problem could solve your problem, you have proven nothing. We know an NP-complete problem solver would be powerful. It could solve any NP problem.
• You must demonstrate the “usefulness” of solving your problem if you want your problem to be let into the NP-complete club.
Get That Direction Right
• In a reduction, a known hard problem is transformed into the problem you want to prove hard
• We knew I-Set was hard, we wanted to show Clique was hard, so we showed how a Clique solver could solve I-Set.
Try a Reduction: Vertex Cover
• Vertex Cover is the problem of determining whether a graph has k vertices such that all edges in the graph are adjacent to one of these vertices.
• I claim there is a reduction from independent set: we could use a vertex cover solver to find an independent set. What should the inputs to our vertex cover solver be, to find an i-set of size k? (One possible answer is sketched below.)
[Figure: example graph in which the green nodes are a vertex cover of size 2]
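One standard answer, sketched under the usual complement relationship (an assumption on my part, not stated on the slide): S is an independent set of size k exactly when V minus S is a vertex cover of size |V| - k, because an edge avoids S iff it touches V minus S. The vertex_cover_solver parameter stands in for the hypothetical black box.

def solve_independent_set_with_vertex_cover(vertices, edges, k, vertex_cover_solver):
    # Ask the (hypothetical) vertex cover black box about the same graph, target |V| - k.
    return vertex_cover_solver(vertices, edges, len(vertices) - k)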
SAT
• Satisfiability, or SAT for short, is the problem of asking whether a boolean formula has any assignment of TRUE or FALSE to its variables that will make it true.
• v = OR, ^ = AND, ¬ = NOT
  (x1 v x2 v x3) ^ ¬x2 ^ ¬x3 => YES (x1=T, x2=F, x3=F)
  (¬x1 v x2) ^ x1 ^ ¬x2 => NO
• This is in NP because given an assignment like (x1=T, x2=F, x3=F) it’s polynomial time to check that it works.
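A sketch of that check in code (my own encoding, not from the slides): a clause is a list of (variable, is_positive) pairs, and the formula is the AND of its clauses.

def check_assignment(clauses, assignment):
    # Polynomial (linear) in the size of the formula: every literal is looked at once.
    return all(any(assignment[var] == positive for var, positive in clause)
               for clause in clauses)

# (x1 v x2 v x3) ^ ¬x2 ^ ¬x3  with  x1=T, x2=F, x3=F
formula = [[("x1", True), ("x2", True), ("x3", True)], [("x2", False)], [("x3", False)]]
print(check_assignment(formula, {"x1": True, "x2": False, "x3": False}))  # True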
SAT is NP-Complete in a Special Way
• The Cook-Levin Theorem proves that if a problem is in NP, it can be thought of as a giant satisfiability problem. Polynomial-time SAT could solve anything in NP in polynomial time.
• Unlike other NP-complete problems, we don’t need to reduce from any other problem to show SAT is NP-complete.
• The reasoning for other problems being NP-complete can then be traced back to SAT.
• “If I could quickly solve Clique, I could quickly solve 3-SAT. If I could quickly solve 3-SAT, I could quickly solve SAT. If I could quickly solve SAT, I could quickly solve anything in NP.”
3-SAT is as Hard as SAT
• 3-SAT: a special case of SAT restricted to boolean formulas that are ANDs of clauses containing three literals and two ORs
  - literal: a variable term along with optional negation: ¬x1, x3
  - (¬x1 v x2 v x1) ^ (x1 v x2 v x3) ^ …
• This turns out to be NP-complete, itself just as hard as SAT
  - In NP: still polynomial time to check
  - NP-hard: We can solve arbitrary SAT problems this way (see the reduction on the next slides)
Reduction of SAT to 3-SAT
• We can transform any SAT problem into a conjunction of clauses, many of which will not have 3 literals
• For clauses with 2 literals x1 and x2, add a dummy variable y: (x1 v x2 v y) ^ (x1 v x2 v ¬y)
• If a clause had 1 literal x, do something similar with dummy variables y1 and y2:
  (x v ¬y1 v ¬y2) ^ (x v y1 v ¬y2) ^ (x v ¬y1 v y2) ^ (x v y1 v y2) … only x can satisfy this set of clauses
Reduction of SAT to 3-SAT, Cont’d
• For clauses with more than 3 literals x1 … xk, create a chain: (x1 v x2 v y1) ^ (¬y1 v x3 v y2) ^ (¬y2 v x4 v y3) ^ … ^ (¬yk-3 v xk-1 v xk)
• Read yi as “at least one of the xj is true in the clauses to follow”
• Read ¬yi v xi+2 v yi+1 as an implication, yi ⇒ xi+2 v yi+1: “Either x1, x2, or a later x is true; if I said a later x was true, then either x3 is true or a still later x is true…”
• This chain is satisfiable exactly when the original clause is, so the transformed formula is satisfiable iff the original was. (A code sketch of the whole SAT-to-3-SAT transformation follows.)
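A sketch combining this slide and the previous one, using the same (variable, is_positive) literal encoding as earlier; the dummy-variable naming is illustrative.

def to_three_literal_clauses(clause, fresh):
    # `fresh` is a generator of unused dummy variable names.
    lits = list(clause)
    if len(lits) == 3:
        return [lits]
    if len(lits) == 2:
        y = next(fresh)
        return [lits + [(y, True)], lits + [(y, False)]]
    if len(lits) == 1:
        x, y1, y2 = lits[0], next(fresh), next(fresh)
        return [[x, (y1, s1), (y2, s2)] for s1 in (True, False) for s2 in (True, False)]
    # More than 3 literals: the chain (x1 v x2 v y1) ^ (¬y1 v x3 v y2) ^ ... ^ (¬yk-3 v xk-1 v xk)
    ys = [next(fresh) for _ in range(len(lits) - 3)]
    chain = [[lits[0], lits[1], (ys[0], True)]]
    for i in range(1, len(ys)):
        chain.append([(ys[i - 1], False), lits[i + 1], (ys[i], True)])
    chain.append([(ys[-1], False), lits[-2], lits[-1]])
    return chain

def fresh_names(prefix="y"):
    i = 0
    while True:
        i += 1
        yield prefix + str(i)

fresh = fresh_names()
# A 5-literal clause becomes three 3-literal clauses chained through y1 and y2.
print(to_three_literal_clauses(
    [("x1", True), ("x2", True), ("x3", False), ("x4", True), ("x5", True)], fresh))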
MAX-SAT
• MAX-SAT is the problem of determining whether a boolean formula can have k clauses satisfied by some assignment.
• Called MAX-SAT because repeated calls would let us determine what the maximum number of satisfied clauses is.
• Its reduction from SAT is rather easier than the reduction from 3-SAT. What’s the argument?
Decision Problems and Optimization Problems
• Technically, NP-complete problems are decision problems — a solver just says “yes” or “no”
• Our definition of NP — polynomial time to check a “yes” answer — doesn’t work otherwise
• Can generally show a polynomial number of calls to a “yes” or “no” black box would give you a solution.
• But in general, if an NP-complete problem has some implicit problem to solve — what’s the biggest clique? what’s the satisfying assignment? — finding the solution is no harder than answering whether there is a solution.
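A sketch of that idea for SAT (my own illustration): fix one variable at a time, keeping the remaining formula satisfiable, using only a yes/no decider. The sat_decider parameter stands in for the hypothetical polynomial-time black box; the brute-force stand-in is only there so the example runs.

from itertools import product

def find_satisfying_assignment(clauses, variables, sat_decider):
    # O(|variables|) calls to the yes/no black box recover a full solution.
    if not sat_decider(clauses):
        return None
    assignment, remaining = {}, [list(c) for c in clauses]
    for var in variables:
        for value in (True, False):
            # Drop clauses satisfied by var=value; delete the variable's other literals.
            trial = [[lit for lit in clause if lit[0] != var]
                     for clause in remaining if (var, value) not in clause]
            if sat_decider(trial):
                assignment[var], remaining = value, trial
                break
    return assignment

def brute_force_sat(clauses):
    # Demonstration stand-in for the black box (exponential time); an empty clause is unsatisfiable.
    vars_ = sorted({v for clause in clauses for v, _ in clause})
    for values in product([True, False], repeat=len(vars_)):
        assignment = dict(zip(vars_, values))
        if all(any(assignment[v] == pos for v, pos in clause) for clause in clauses):
            return True
    return False

formula = [[("x1", True), ("x2", True)], [("x2", False)], [("x1", False), ("x3", True)]]
print(find_satisfying_assignment(formula, ["x1", "x2", "x3"], brute_force_sat))
# {'x1': True, 'x2': False, 'x3': True}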
Reducing 3-SAT to Clique
• There’s often more than one way to show a problem is NP-complete
• To solve a 3-SAT problem with a Clique solver (a code sketch of this construction appears after the proof slides below):
  - Create groups of nodes corresponding to the clauses of the 3-SAT formula, one node per literal in the clause
  - Connect nodes iff they aren’t in the same group and aren’t negating each other (x1 and ¬x1)
Reducing 3-SAT to Clique
• This graph has a k-clique iff the original k-clause formula is satisfiable.
• If satisfiable, clique:
  - Pick one satisfied node from each group and all edges connecting them
  - No reason edges wouldn’t exist — not contradictory, all from different groups — so, clique
Reducing 3-SAT to Clique
• This graph has a k-clique iff the original k-clause formula is satisfiable.
• If k-clique, satisfiable:
  - Can give a truth assignment matching the clique without contradiction because it has no contradictory pairs
  - Each node must be in a different group, or edges would be missing
  - This is a truth assignment satisfying each clause – QED.
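A sketch of the graph construction from these slides, again using (variable, is_positive) literals; the node encoding (clause index, literal) is my own choice.

from itertools import combinations

def three_sat_to_clique_instance(clauses):
    # One node per literal occurrence, grouped by clause; connect nodes in different
    # groups whose literals are not negations of each other. Ask for a clique of size k = #clauses.
    nodes = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]
    edges = [(a, b) for a, b in combinations(nodes, 2)
             if a[0] != b[0] and not (a[1][0] == b[1][0] and a[1][1] != b[1][1])]
    return nodes, edges, len(clauses)

# (x1 v x2 v x3) ^ (¬x1 v x2 v x3)
clauses = [[("x1", True), ("x2", True), ("x3", True)],
           [("x1", False), ("x2", True), ("x3", True)]]
nodes, edges, k = three_sat_to_clique_instance(clauses)
print(len(nodes), len(edges), k)  # 6 nodes, 8 edges (9 cross pairs minus the x1/¬x1 pair), k = 2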
3-SAT is a Pretty Useful Problem to Reduce From
• Recall that to prove NP-completeness, you must show solving your problem in polynomial time would lead to a polynomial time solution to an NP-complete problem.
• If your problem is NP-complete, there’s a good chance a solver for it can directly solve a logic problem.
• There’s never any reason to reduce from SAT instead of 3-SAT, because reduction from 3-SAT requires strictly less work (fewer problem types to solve).
Hamiltonian Cycle is NP-Complete
• A Hamiltonian cycle is a cycle that visits every graph vertex once.
• Strangely enough, we can solve 3-SAT problems with a Hamiltonian cycle solver.
• We start with a graph that contains repetitions of the following gadget, one for each variable
[Figure: variable gadget for x1, with a “true” direction and a “false” direction]
Hamiltonian Cycle Solving 3-SAT: Construction Cont’d
• Make a looped chain of the variable gadgets
• Add a new vertex off to the side for each formula clause
• Add a right-going “detour” to a clause if a variable is unnegated in the clause
• Add a left-going “detour” to a clause if a variable is negated in the clause
• 3k + 1 nodes across the middle of each gadget for k clauses — skip nodes between them
[Figure: variable gadgets for x1 and x2 chained together, with a right-going detour to a clause vertex (x1 v …)]
A valid assignment implies a Hamiltonian Cycle
• Without the clause nodes, a cycle exists by choosing the “true” (right through middle) or “false” (left through middle) paths through each variable
• A path through a variable “set to true” can detour to a vertex for a clause iff the clause contains the unnegated variable
• A path through a variable “set to false” can detour to a vertex for a clause iff the clause contains the negated variable
• So if there is a satisfying assignment for the 3-SAT problem, there is a Hamiltonian cycle — for each clause, follow a detour from a satisfying variable
A Hamiltonian Cycle Implies a Valid Assignment
• Need to also show a Hamiltonian cycle implies there is a valid assignment (so a Hamiltonian cycle exists iff the 3-SAT formula is satisfiable)
• Can create variable assignments by looking at whether we went left to right or right to left for each variable … as long as the cycle has normal form.
• The only obvious way to “break” the normal form would be by jumping back from a clause vertex to a different variable … but then the next node in the middle chain would be unreachable (it’s not clause-connected)
• So all Hamiltonian cycles correspond to valid assignments — QED.
[Figure: why leaving a clause vertex toward a different variable fails; annotations read “Hit already,” “Impossible path,” “No way to hit,” “Only ‘escape’”]
Reducing 3-SAT to “Mario Solvability”
• Claim: If we had an AI that could decide whether a Super Mario Bros. level has a way to get to the exit in polynomial time, we could solve 3-SAT in polynomial time. Thus this problem is NP-hard.
• (This problem isn’t well-defined enough to be very formal, but it’s a fun proof)
• Reduction: Given a 3-SAT problem to solve, make a Super Mario Bros. level with the following characteristics.
• For each variable, have an irreversible this-way or that-way choice of which way to go, corresponding to setting a variable to T or F.
[Figure: a level section for x1 with an irreversible fork labeled T and F]
Reducing 3-SAT to “Mario Solvability”
[Figure: a clause room containing turtles labeled ¬x1, x2, and x3]
• Have a room like the above for each clause. Each turtle’s room corresponds to a literal.
• Make the room reachable only if the player picked the right setting of that literal to make the clause true.
• If any turtle is reached, the passage at the bottom can be opened up with the turtle shell.
Reducing 3-SAT to “Mario Solvability”
• The final corridor to the end is only traversable if some turtle was reached in each clause room.
• This is only possible if the player chose passages corresponding to a satisfying assignment of the 3-SAT formula.
• So a polynomial-time Mario solver could solve 3-SAT problems in polynomial time as well; so solving Mario levels like these is NP-hard.
Notice the Direction
• We’re not claiming or showing 3-SAT could solve arbitrary Mario levels. The levels were very specifically constructed to solve 3-SAT problems.
• By showing we can solve arbitrary 3-SAT problems with a Mario solver, we’ve shown deciding whether Mario levels are solvable is NP-Hard.
• What’s missing from this proof to show “Mario solvability” is NP-complete? Does it matter what other stuff might be in these Mario levels?
You Can Argue Lots of Games are NP-Hard in This Way
[Figure: need to visit at least one of the 3 passages below to shoot the guys above so you can roll through the passage up top]
Aloupis et al 2012: http://arxiv.org/pdf/1203.1895v1.pdf
Proving Independent Set is NP-Complete
• Suppose we take Clique to be a problem we know for sure is NP-complete, since we reduced from 3-SAT. How could we use Clique to prove Independent Set is NP-Complete?
• We never proved Independent Set was NP-complete — we just asserted it.
• What would a reduction from 3-SAT for Independent Set look like? (Think about the reduction of 3-SAT to Clique)
The Chain of Logic Connecting NP-Complete Problems
• Every chain of polynomial-time reductions still takes polynomial time.
• So we can assert that every NP-complete problem would, if solved, allow us to solve every NP problem in polynomial time.
• If you have a polynomial time Clique solver, just turn any NP problem into a SAT problem, and the SAT problem into a 3-SAT problem, and the 3-SAT problem into a Clique problem — that’s all still poly time
[Diagram of “reduces to” arrows linking all problems in NP, SAT, 3-SAT, Clique, Independent Set, Hamiltonian Cycle, and “Mario Solvability”]
More Kinds of NP-Complete Problems
• So far we’ve seen some NP-complete problems about graphs (Clique, Independent Set) and boolean formulas (SAT, 3-SAT)
• But the easy-to-check, hard-to-solve structure can also be found in problems about sets and numerical problems as well
• We’re going to go down a path of reductions that shows how these can be NP-complete, too
Exact Cover (Also Called “Set Packing”)
• Exact Cover: Given a set U and sets S1…Sn that are all subsets of U, is there a collection of disjoint sets whose union is U?
• “Set Cover” removes the disjoint restriction but wants exactly k sets
• Example: U = {a, b, c, d, e, f}, S1 = {a, b, c}, S2 = {c, d, e}, S3 = {d, e, f}; solution {S1, S3}
• Fitting puzzle pieces into a grid so that everything is covered is a special case of EXACT-COVER
[Figure: puzzle pieces on a grid, each represented as a set such as {long-piece1, 11, 12, 13, 14, 15} or {elbow1, 17, 18, 28, 38, 48}, where the numbers name grid cells]
Exact Cover Reduction
• We can use an Exact Cover solver to solve 3-SAT.
• Let the items in the big set U be one element xi for each variable in the 3-SAT formula, one cj for each clause, and one pjk for each position in each clause
• The big idea: Construct sets so that they “fit together” to form U only if some truth assignment makes the 3-SAT formula satisfiable.
• Create sets {cj, pjk} for every position in every clause — adding this to the cover will represent the literal at position pjk satisfying cj
• For each variable, include XiT = {xi} ∪ {pjk: xi is negated at position jk} and XiF = {xi} ∪ {pjk: xi is not negated at position jk}
• And include {pjk} for all positions — these are easy to cover (a code sketch of this construction follows)
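A sketch of this construction in code, using string names like "x_x1", "c_0", and "p_0_2" for the elements of U (an illustrative encoding, not from the slides); clauses use the (variable, is_positive) literal encoding from earlier sketches.

def three_sat_to_exact_cover(clauses):
    variables = sorted({v for clause in clauses for v, _ in clause})
    universe, sets = set(), []
    for v in variables:
        universe.add(f"x_{v}")
    for j, clause in enumerate(clauses):
        universe.add(f"c_{j}")
        for k in range(len(clause)):
            universe.add(f"p_{j}_{k}")
            sets.append({f"c_{j}", f"p_{j}_{k}"})  # literal at position (j, k) satisfies clause j
            sets.append({f"p_{j}_{k}"})            # spare single-position set
    for v in variables:
        x_true = {f"x_{v}"} | {f"p_{j}_{k}" for j, clause in enumerate(clauses)
                               for k, (var, pos) in enumerate(clause) if var == v and not pos}
        x_false = {f"x_{v}"} | {f"p_{j}_{k}" for j, clause in enumerate(clauses)
                                for k, (var, pos) in enumerate(clause) if var == v and pos}
        sets.extend([x_true, x_false])  # choosing x_true represents "v is true", and vice versa
    return universe, sets

# (x1 v ¬x2 v x3): 7 elements (3 variables, 1 clause, 3 positions) and 12 candidate sets
U, S = three_sat_to_exact_cover([[("x1", True), ("x2", False), ("x3", True)]])
print(len(U), len(S))  # 7 12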
Exact Cover Reduction: Exact cover implies formula satisfiable
• For each i, exactly one of XiT and XiF was chosen (because they each have xi) — this is a truth assignment.
• Each set containing a cj must have one pjk associated with it – this pjk is consistent with the truth assignment (the Xi sets would have “taken it away” if not, since they include the positions they contradict)
• So the cover shows there is a satisfying assignment.
Exact Cover Reduction: Formula satisfiable implies exact cover exists
• Cover the set with the truth assignment: XiT if xi true, XiF if xi false — this covers the xi
• It’s possible to choose {cj, pjk} for each clause at one non-contradicted place per clause if the clause is satisfiable
• Cover the remaining pjk with our spare {pjk} sets.
• Polynomial time to construct these sets, so NP-Hard.
• And exact cover is checkable in polynomial time, so NP-complete.
Subset Sum & Knapsack
• Recall the Knapsack problem: trying to stuff the most valuable things under a weight limit
• Recall that a special case of Knapsack is that the values are equal to the weights — then we just want to pick weights that hit the weight limit exactly
• This is the “Subset-Sum” problem: given a set of items with values, determine whether a subset sums to a target value
• E.g., {4, 3, 4, 2, 1}, target 6 => {4, 2} YES
• Note that if Subset-Sum is NP-complete, then Knapsack must be NP-Hard, because a Knapsack solver could solve Subset Sum as a special case
Subset-Sum Reduction
• Reducing from Exact Cover. We’ll try a slightly faulty approach, then correct it.
• Think of the sets of elements to cover as binary representations of numbers: {a0 a1 a3} = 1101
• The target set to cover can be thought of as a binary string of all 1’s, with length equal to the number of elements: {a0 a1 a2 a3} = 1111
• 1101 + 0010 = 1111 (for example), so sometimes, finding numbers that add is the same thing as finding sets that union uniquely. But when does this not work?
Subset-Sum Reduction
• The approach described works only if carrying isn’t possible — otherwise, the result may not be an exact cover.
• So guarantee we never have to carry: think of the Exact Cover sets as numbers in base s+1, where s is the number of sets.
• Example: {a0 a1 a3} with 9 sets = “1101” (base 10)
• Solve this problem using Subset Sum, trying to equal 1111…1 in base s+1.
• No carrying possible, so the only way the sum works out is if each digit gets a 1 once in the summation.
• Polynomial-time reduction and in NP, so NP-complete.
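A sketch of the base-(s+1) encoding just described (element ordering and names are my own choices):

def exact_cover_to_subset_sum(universe, sets):
    # Each element is a digit position in base s+1, so digit sums can never carry.
    elements = sorted(universe)
    base = len(sets) + 1
    digit = {e: base ** i for i, e in enumerate(elements)}
    numbers = [sum(digit[e] for e in s) for s in sets]
    target = sum(digit[e] for e in elements)  # the all-ones number 111...1 in base s+1
    return numbers, target

U = {"a", "b", "c", "d", "e", "f"}
S = [{"a", "b", "c"}, {"c", "d", "e"}, {"d", "e", "f"}]
numbers, target = exact_cover_to_subset_sum(U, S)
print(numbers, target)                    # [21, 336, 1344] and target 1365 (111111 in base 4)
print(numbers[0] + numbers[2] == target)  # True: {S1, S3} is the exact cover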
A Pseudo-Polynomial Algorithm for Subset-Sum
• A pseudo-polynomial time dynamic programming algorithm for Subset Sum (a runnable version follows below):
  Initialize achievable[0, 0] = true and achievable[0, t] = false for t = 1 … T   // T is the final target
  For increasing maximum item i (up to N)
    For t = 0 … T
      achievable[i, t] = achievable[i-1, t] OR (t ≥ wi AND achievable[i-1, t-wi])
  Return achievable[N, T]
• This runs in time O(NT).
• But notice that if we transformed Exact Cover with N items and s sets into a Subset Sum problem, the whole thing would run in time Ω(N(s+1)^N). This doesn’t solve NP-complete problems in polynomial time.
• Real polynomial algorithms run in time polynomial in the input size (in bits) – in this case we should be polynomial in log T, not T.
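A runnable sketch of the pseudocode above (using a 1-D array over targets rather than the full 2-D table, same O(NT) idea):

def subset_sum(weights, target):
    # achievable[t] is True when some subset of the weights processed so far sums to t.
    achievable = [False] * (target + 1)
    achievable[0] = True
    for w in weights:
        for t in range(target, w - 1, -1):  # go downward so each item is used at most once
            if achievable[t - w]:
                achievable[t] = True
    return achievable[target]

print(subset_sum([4, 3, 4, 2, 1], 6))  # True ({4, 2})
print(subset_sum([4, 3, 4], 6))        # False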
A Slightly Bigger Chain of NP-Complete Problems
[Diagram of “reduces to” arrows, now also including the new problems: all problems in NP, SAT, 3-SAT, Clique, Independent Set, Hamiltonian Cycle, “Mario Solvability”, Exact Cover, Subset Sum]
Summary
• NP means poly-time to check a “yes” proof
• NP-complete problems are famously hard decision problems, although they are all at least polynomial-time to check if the answer is “yes”
• NP-hard means a poly-time solution could be used to solve another NP-complete problem in poly-time
• Reducing problem A to problem B means (1) A problems can be transformed into B problems, and (2) solving B is at least as hard as solving A
• So we reduce from known NP-complete problems to prove new things are NP-complete, and therefore likely intractable.
• Also need to show it’s in NP — otherwise, it might be even harder