代写代考 Analysis of Algorithms, I

Analysis of Algorithms, I
CSOR W4231

Computer Science Department

Columbia University
Representative NP-complete problems: TSP, Set Cover

1 Review of last lecture
2 Representative N P -complete problems
3 Integer Programming
4 Minimum-weight Set Cover
An integer programming formulation of Set Cover The linear program relaxation
5 An approximation algorithm for Set Cover Rounding the LP solution
An f-approximation algorithm for Set Cover

Complexity classes P, NP and NP-complete
Definition 1.
We define P to be the set of problems that can be solved by polynomial-time algorithms.
Definition 2.
We define NP to be the set of decision problems that have an efficient certifier.
Definition 4.
A problem X(D) is NP-complete if 1. X(D)∈NP and
2. forallY ∈NP,Y ≤P X.

How do we show that a problem is NP-complete?
Suppose we had an NP-complete problem X.
To show that another problem Y is NP-complete, we use
transitivity of reductions. So we “only” need show that 1. Y ∈NP
The first NP-complete problem Theorem 5 (Cook-Levin). Circuit SAT is NP-complete.

Satisfiability of boolean functions
SAT: Given a formula φ in CNF with n variables and m clauses, is φ satisfiable?
3SAT: Given a formula φ in CNF with n variables and m clauses such that each clause has exactly 3 literals, is φ satisfiable?
Circuit-SAT: Given a boolean combinatorial circuit C, is there an assignment of truth values to its inputs that causes the output to evaluate to 1?
Circuit-SAT ≤P SAT, SAT ≤P 3SAT and 3SAT ≤P IS(D)

Common pitfalls when showing NP-completeness
1. Carry out the reduction in the wrong direction
2. Reduce from a problem not known to be NP-complete
3. Exponential-time transformations
􏰉 Subsets, permutations
4. Neglect to carefully prove both directions of equivalence of the original and the derived instances; that is, x is a yes instance of X if and only if y = R(x) is a yes instance of Y
5. Neglect to show that the problem is in N P
Suggestions
􏰉 You should think carefully which problem is most suitable to reduce from
􏰉 In absence of other ideas, reduce from 3SAT

The Traveling Salesman Problem (TSP)
Tour: a simple cycle that visits every vertex exactly once. Definition 7 (TSP(D)).
Given n cities {1, . . . , n}, a set of non-negative distances dij between every pair of cities and a budget B, is there a tour of length ≤ B?
Equivalently, is there a permutation π such that
1. π(1)=π(n+1)=1; thatis,westartandendatcity1 2. the total distance travelled satisfies
􏰆 dπ(i)π(i+1) ≤ B
Application: Google street view car

Example instance of TSP
Depending on the distances, TSP instances may be
􏰉 Asymmetric: dij ̸= dji
􏰉 Symmetric: dij = dji
􏰉 Metric: satisfy the triangle inequality dij ≤ dik + dkj
􏰉 Euclidean: e.g., cities are in R2 hence city i corresponds to point (xi,yi);thendij =􏰤(xi−xj)2+(yi−yj)2

A related problem and hardness of TSP(D)
Hamiltonian Cycle: Given a graph G = (V, E), is there a simple cycle that visits every vertex exactly once?
Hamiltonian Cycle is NP-complete.
Proof: Reduction from 3SAT (e.g., see your textbook).
Symmetric TSP(D) is NP-complete.
Proof: reduction from undirected Hamiltonian Cycle.

Proof of Claim 2 (Hamiltonian Cycle ≤P TSP(D))
1. Start from an arbitrary instance of undirected Hamiltonian Cycle, that is, an undirected graph G = (V, E).
2. Construct the following instance (G′ = (V ′, E′, w), B) of TSP(D): G′ is a complete weighted graph with V ′ = V such that for every edge e ∈ E′,
􏰂1, ife∈E we = 2, otherwise
3. Set the budget B = n.
This completes the reduction transformation.
Equivalence of the instances is straightforward:
􏰉 If G has a hamiltonian cycle, that cycle is a tour of length n in
􏰉 If G′ has a tour of length n, it must consist of edges of weight 1 (why?); thus all these edges appear in G.

Concluding remarks on TSP
􏰉 Claim 1 also holds for directed Hamiltonian cycle. An exact analog of the proof of Claim 2 then shows that asymmetric TSP is NP-complete.
􏰉 It is possible to reduce Hamiltonian cycle to Euclidean TSP, thus showing that even Euclidean TSP is
N P -complete.
􏰉 However, these problems are not similar in terms of how well they can be approximated: it is possible to provide very good approximate solutions to Euclidean TSP, which is not the case for Symmetric TSP.

Packing and partitioning problems
􏰉 Set Packing: given a set U of a elements, a collection S1,S2,…,Sb of subsets of U, and a number k, is there a collection of at least k subsets such that no two of them intersect?
􏰉 3D-Matching:GivendisjointsetsB,G,H,eachofsizen, andasetoftriplesT ⊆B×G×H,isthereasetofn triples in T , no two of which have an element in common?
Reduction from 3SAT.

Numerical problems
􏰉 Subset sum: Given natural numbers w1, . . . , wn and a (large) target weight W , is there a subset of w1, . . . , wn that adds up exactly to W?
Applications: cryptography, scheduling
􏰉 Minimum-weight solution to linear equations: Given a system of linear equations in n variables with integer constants, and an integer B ≤ n, does it have a rational solution with at most B non-zero entries?
Applications: coding theory, signal processing

Similar problems with very different complexities
N P -complete
longest path
shortest path
3D matching
Hamiltonian cycle
Euler cycle
3-colorability
2-colorability
LCS of n sequences
LCS of 2 sequences
More on NP-completeness:
􏰉 Computers and Intractability: A guide to the theory of
NP-completeness, by Garey and Johnson
􏰉 Computational Complexity, by C. Papadimitriou

Integer Programming
Integer programming (IP(D)): Given a system of linear inequalities in n variables and m constraints with integer coefficients and a integer target value k, does it have an integer solution of value k?
􏰉 Applications: production planning, scheduling trains, etc.
max cTx subject to Ax ≤ b
Here A is an m × n matrix, b ∈ Rm, c ∈ Rn, x is an integer
vector with n components.
What does the set of feasible solutions look like?

Rounding the LP is often insufficient
max 1.00×1 + 0.64×2 x1 ≥0,×2 ≥0
subject to 50×1 + 31×2 ≤ 250 3×1 − 2×2 ≥ −4
x1, x2 integer
5o o oo o o o 4oooooo 3oooooo 2oooooo 1oooooo
From Integer Programming by L. Wolsey
The optimal linear programming solution is far from the optimal integer solution $(5,0)$.
Optimum LP solution (376/193, 950/193)
Optimum IP solution (5,0)

Is IP(D) hard?
􏰉 IP(D) is in NP.
􏰉 We can quickly solve LPs with several thousands of variables and constraints but there exist integer programs with 10 variables and 10 constraints that are very hard to solve.
􏰉 This is not too surprising: integer programs restricted to solutions x ∈ {0, 1}n model yes/no decisions, which are generally hard.
􏰉 To formalize this intuition, we will reduce an NP-complete problem to IP(D).

Integer Programs for Vertex Cover and IS
First we formulate integer programs for two NP-hard problems. IP for Independent Set:
subjectto xi+xj ≤1, forevery(i,j)∈E
xi ∈ {0,1}, for every i ∈ V
IP for Vertex Cover:
subjectto xi+xj ≥1, forevery(i,j)∈E
xi ∈ {0,1}, for every i ∈ V

IP(D) is NP-complete
VC(D) ≤P IP(D) Proof.
Reduction from arbitrary instance (G = (V, E), k) of VC(D) to the following integer program with target value k:
subjectto xi+xj ≥1, forevery(i,j)∈E n
􏰆 xi ≤ k i=1
xi ∈ {0,1}, for every i ∈ V Equivalence of the instances is straightforward.

Similar problems with very different complexities (new)
N P -complete
longest path
shortest path
3D matching
Hamiltonian cycle
Euler cycle
3-colorability
2-colorability
LCS of n sequences
LCS of 2 sequences
integer programming
linear programming
The theory of integer and linear programming and duality can guide the design of approximation algorithms,
and exact solutions, for hard problems.

Minimum-weight Set Cover
􏰉 a set E = {e1,e2,…,en} of n elements
􏰉 acollectionofsubsetsoftheseelementsS1,S2,…,Sm,
where each Sj ⊆ E
􏰉 a non-negative weight wj for every subset Sj
A minimum-weight collection of subsets that cover all of E. In symbols: find an I ⊆ {1,…,m} such that ∪i∈ISi = E and
􏰅 wi is minimum. i∈I
(Unweighted Set Cover: wj = 1 for all j)

Example instance of Set Cover
n = 8 ground elements, m = 6 subsets with weights w1 =w2 =w3 =w4 =1,w5 =w6 =1+ε.

Motivation: detect computer viruses
Motivation (IBM AntiVirus): detect features of boot sector viruses that do not occur in typical applications; then use them to discover more viruses
􏰉 Ground elements: known boot sector viruses (n ≈ 150)
􏰉 Sets: labelled by some three-byte sequence occurring in these viruses but not occurring in typical computer applications (m ≈ 21000); each set consisted of all the viruses that contained the three-byte sequence
􏰉 Output: a small number of such sequences—much smaller than 150—that cover all known viruses
=⇒ use the small set cover as features in a neural classifier to determine presence of a boot sector virus
=⇒ detect new viruses (many boot sector viruses are written by modifying existing ones)

Reduction via generalization
Set-Cover(D) is NP-complete.
Reduction from VC(D). Input instance: (G = (V, E), k).
􏰉 Set E = {e1,…,em} to be the set of ground elements we
want to cover.
􏰉 For every vertex j, set Sj to be the set of edges (ground
elements) that are incident to–hence covered by–vertex j.
􏰉 Setwj =1forall1≤j≤n.
Equivalence of instances: input graph has a vertex cover of size k if and only if E has a set cover of weight k.

Forming the integer program for Set Cover
Variables: we introduce one variable per set Sj; intuitively, 􏰂 1, if Sj is included in the solution
xj = 0, otherwise Constraints: ensure that every element is covered:
for every element ei, at least one of the sets Sj containing ei appears in the final solution
Objective function: minimize the sum of the weights of the sets included in the solution

An integer programming formulation of Set Cover
Integer program for Set Cover:
min 􏰆 wj xj
subjectto 􏰆 xj ≥1, forevery1≤i≤n
xj ∈{0,1}, forevery1≤j≤m

An integer programming formulation of Set Cover
Integer program for Set Cover:
min 􏰆 wj xj
subjectto 􏰆 xj ≥1, forevery1≤i≤n
xj ∈{0,1}, forevery1≤j≤m
be the optimum value of this integer program; OPT be the value of the optimum solution to Set Cover.
Z∗ =OPT. IP
△ We cannot solve this integer program efficiently (why?).

LP relaxation: a bound for the value of the IP
LP relaxation for Set Cover: m
min 􏰆 wj xj
subjectto 􏰆 xj ≥1, forevery1≤i≤n j :ei ∈Sj

LP relaxation: a bound for the value of the IP
LP relaxation for Set Cover: m
min 􏰆 wj xj
subjectto 􏰆 xj ≥1, forevery1≤i≤n j :ei ∈Sj
􏰉 Every feasible solution to the original IP is a feasible solution to the LP relaxation.
􏰉 The value of any feasible solution to the original IP is the same in the LP (the objectives are the same).
􏰉 Let Z∗ be the optimum value of the LP relaxation. LP
Z∗ ≤Z∗ =OPT LP IP

Rounding the solution to the LP
LP relaxation for Set Cover: n
min 􏰆 wj xj
subjectto 􏰆 xj ≥1, forevery1≤i≤n j :ei ∈Sj
􏰉 Let x∗ be an optimal solution to the LP relaxation.
􏰉 Let fi = # subsets Sj where element ei appears.
􏰉 Let f = max fi. 1≤i≤n
􏰂1, ifx∗j≥1/f xˆj= 0,ifx∗j<1/f Rounding yields a feasible solution to the original IP The collection of sets Sj with xˆj = 1 cover all the elements. 􏰉 Consider the optimal solution x∗ for the LP relaxation. 􏰉 Fix any element ei; recall that ei appears in fi subsets. 􏰉 Forsimplicity,relabelthesesubsetsasS1,S2,...,Sfi.Then the optimal solution satisfies the constraint x∗1 +x∗2 +...+x∗fi ≥1 Let x∗m be the maximum of x∗1,x∗2,...,x∗fi. Then x∗m ≥ f1 ≥ f1 i ⇒ Our rounding procedure guarantees that, for every element ei, at least one set Sj that covers ei is chosen. An f-approximation algorithm for Set Cover How far is the solution obtained by the rounding procedure above from to the optimal solution to Set Cover? 􏰉 We do not know OPT! 􏰉 But we have a bound for it: the value Z∗ of the LP relaxation! Recall that we set xˆj = 1 if and only if x∗j ≥ 1/f. Then 􏰆wjxˆj ≤ 􏰆wj(fx∗j) = f 􏰆wjx∗j = f·Z∗ ≤f·OPT LP Approximation algorithms Definition 8. An α-approximation algorithm for an optimization problem is a polynomial-time algorithm that, for all instances of the problem, produces a solution whose value is within a factor of α of the value of the optimal solution. 􏰉 α is the approximation ratio or approximation factor 􏰉 For minimization problems, α > 1.
􏰉 For maximization problems, α < 1. Example 1: the rounding procedure described on slide 53 gives an f-approximation algorithm for Set Cover: 􏰉 it can be completed in polynomial-time 􏰉 it always returns a solution whose value is at most f times the value of the optimal solution. Remark: if an element appears in too many sets (e.g., f = Ω(n)), this algorithm does not provide a good approximation guarantee. Example 2: a 2-approximation algorithm for VC is a polynomial-time algorithm that always returns a solution whose value is at most twice the value of the optimal solution. A 2-approximation algorithm for V C 􏰉 Let E = {e1,...,em} be the set of edges in the graph. 􏰉 Let Sj be the set of edges (ground elements) that are covered by vertex j. 􏰉 For every edge ei, fi = 2: ei appears in exactly two subsets 􏰉 Hence f = max fi = 2. 1≤i≤m 程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts