# Problem Set 4
## Problem 1
### Prove `Partition Problem` in $NP$
A non-deterministic polynomial time machine can guess a partition of the number set and accept if the two subcollections after partition have equal sum.
### Prove `Subset Sum` is polynomial time reducible to `Partition Problem`
Given $S_s$ of positive intefers and another positive integer $t$ , suppose the total sum of numbers in collection $S_s$ is $T$ , Let $S_p$ = $S_s \cup {T – 2t}$.
$$
S_p=
\begin{cases}
S_s & \text{if } \quad T – 2t = 0\\
S_s \cup {T – 2t} & \text{if} \quad T – 2t > 0 \\
S_s \cup {2t – T} & \text{if} \quad T – 2t < 0
\end{cases}
$$
Obviously, $S_p$ is also collection of positive integers.
Now we prove that there is a subcollection of $S_s$ whose sum is equal to $t$ if and only if there is a partition of the collection $S_p$ into two subcollections whose sum are equal.
#### T = 2t
In this case $S_p = S_s$.
If there is a subcollection of $S_s$ whose sum is equal to $t$ , then the complement of this subcollection has sum $T - t$ = $2t - t$ = $t$. Thus collection $S_p$ can be partitioned into two subcollections whose sum are equal (both equalt to $t$).
If $S_p$ can be partitioned into two subcollections whose sum are equal, because total sum is $2t$ , then the sum of both subcollections must be equalt to $t$, thus there is a subcollection of $S_s$ whose sum is equal to $t$ .
#### T - 2t < 0
$S_p= S_s \cup {2t - T}$
The sum of $S_p$ is $T + (2t - T ) = 2t$.
If there is a subcollection of $S_s$ whose sum is equal to $t$, it is also subcollection of $S_p$ , the complement of this subcollection in $S_p$ also have sum $2t - t = t$. Thus collection $S_p$ can be partitioned into two subcollections whose sum are equal (both equalt to $t$).
If $S_p$ can be partitioned into two subcollections whose sum are equal, because total sum is $2t$ , then the sum of both subcollections must be equalt to $t$, The newly added element $2t - T$ is in one of the subcollections, all elements in another subcollection comes from $S_s$, thus there is a subcollection of $S_s$ whose sum is equal to $t$ .
#### T - 2t > 0
$$ S_p = S_s \cup {T – 2t}$$
The sum of $S_p$ is $T + (T- 2t ) = 2T- 2t$.
If there is a subcollection $S_1$ of $S_s$ whose sum is equal to $t$, the complement of it $S_2$ in $S_s$ is $T – t$ . Take $S_2$ as one subcollection in $S_p$ , the complement of $S_2$ in $S_p$ has sum $2T – 2t – (T – t)$ = $T – t$. Thus collection $S_p$ can be partitioned into two subcollections whose sum are equal (both equalt to $T – t$).
If $S_p$ can be partitioned into two subcollections whose sum are equal, because total sum is $2T – 2t$ , then the sum of both subcollections must be equalt to $T – t$, The newly added element $T-2t$ is in one of the subcollections, all elements in another subcollection denoted as $S_1$ comes from $S_s$, $S_1$ has sum $T- t$ , the complement of $S_1$ in $S_s$ has sum $T – (T – t) = t$, thus there is a subcollection of $S_s$ whose sum is equal to $t$ .
#### Summary
Note that, we only add 0 or 1 element to $S_s$ to construct $S_p$. Thus, the reduction takes polynomial time. Combined with the proof in 3 cases, we have proven that NP-Complete problem `Subset Sum` is polynomial time reducible to `Partition Problem`.
Because “Partition Problem` is in $NP$ and there is one NP-Complete problem that is polynomial time reducible to `Partition Problem`, `Partition Problem` is also NP-complete.
## Problem 2
### a
Every node is either itself in K or has an incoming edge from some node in K.
For node $u$, apart from $v$, there is no other node that has edge entering it. Only $v$ has edge $(v,u)$ entering it. Thus, node $u$ is either itself in K or has an incoming edge from $v$ in K. In another word, either $u$ in $K$ or $v$ in $K$. Because there is edge $(u, v)$ , thus we can’t have both $u$ and $v$ in $K$.
For node $v$, with the same reasoning, it also requires that either $u$ or $v$ in $K$ but not both.
Thus, any kernel of G must include exactly one of the two nodes $u,v$.
### b
For any kenel $K$, suppose there is no node $x$ in $K$ (distinct from u,v,w) that has an edge to any of the three nodes $u, v, w$.
For node $u$, apart from $w$, there is no other possible node in $K$ that has edge entering it. Because every node is either itself in K or has an incoming edge from some node in K. Thus either $u$ or $w$ in $K$. Because there is edge $(w, u)$ , thus we can’t have both $u$ and $w$ in $K$.
For node $v$, with the same reasoning, either $v$ or $u$ in $K$, but not both.
For node $w$, with the same reasoning, either $w$ or $v$ in $K$, but not both.
Thus, we need to satisfy the following 3 conditions.
1. either $u$ or $w$ in $K$, but not both.
2. either $v$ or $u$ in $K$, but not both.
3. either $w$ or $v$ in $K$, but not both.
If $u$ in $K$, then from 1 $w$ not in $K$ , from 2 $v$ not in $K$, which is a contradiction to 3.
If $u$ not in $K$, then from 1, $w$ in $K$ , from 2 $v$ in $K$, which is a contradiction to 3.
Thus, in either case, we have a contradiction. And the assumption doesn’t hold. So we have proven that any kernel of G must contain some node x (distinct from u,v,w) that has an edge to at least one of the three nodes u,v,w.
### c
#### Graph construction from 3SAT formula
For a fomula $\phi$ in CNF, we can construct the graph $G$ as follows.
1. For each variable $x$ in $\phi$ , add edge $x \rightarrow \neg x $ and $\neg x \rightarrow x $ to $G$.
2. For each clause $\eta$ , add the following cycle $ \eta_u \rightarrow \eta_v \rightarrow \eta_w \rightarrow \eta_u$ to $G$ and for each literal $l$ in $\eta$, add the 3 edges $l \rightarrow \eta_u$ , $l \rightarrow \eta_v$ , $l \rightarrow \eta_w$ to $G$.
#### Prove: if $\phi$ is satisfiable, then $G$ has a kernel
if $\phi$ is satisfiable, for each variable $x$, if the solution assigns $x$ true, then select the vertex $x$ in $G$, if the solution assigns $x$ false, then select the vertex $\neg x$ in $G$.
Due to our selection rule, either vertex $x$ or $\neg x$ is selected. The kernel requirment for the graph part constructed in step 1 satisfied.
For each clause, at least 1 literal makes true, correspondingly, at least 1 selected literal vertex has an edge to one of the 3 vertexes in the clause cycle in $G$. If there is only 1 clause vertex having literal vertex entering it, we can select the next clause vertex of this clause vertex to $G$. Thus, The kernel requirment for the graph part 2 constructed in step 2 satisfied.
In summary, we can take the corresponding literal vertex to the 3SAT assignment plus additional clause vertex if necessary to form a kernel for $G$.
#### Prove: if $G$ has a kernel, then $\phi$ is satisfiable
If $G$ has a kernel $K$, for each variable $x$, from proof of part $a$, we know that either $x$ or $\neg x$ in $K$. If $x$ in $K$, we assign $x$ true in $\phi$ , if $\neg x$ in $K$, we assign $x$ false in $\phi$. For each clause $\eta$, from proof of part b, we know that $K$ contains some literal node $n$ that has an edge to at least one of the 3 nodes $\eta_u, \eta_v, \eta_w$ . According to our graph construction rule, the corresponding literal belongs to the clause $\eta$, in another word, at least 1 clause is true under the assignment corresponding to the kernel. Thus, each clause is satisfied under the assignment. Thus $\phi$ is satisfiable.
#### Summary
Obviously, the graph construction takes polynomial time. And we have proven that the formula is satisfiable if and only if the corresponding graph has a kernel. Thus, 3SAT is polynomial time reducible to directed graph kernel problem. In addition, a non-deterministic polynomial time machine can guess a subset of the vertices and then verify whether it is a kernel. The kernel problem is in $NP$. Thus, it is NP-complete to determine whether a given directed graph has a kernel.
## Problem 3
### 1
The optimal graph is connected. If there is a cycle in the optimal graph, we can remove one edge from the cycle and the resulting graph is still connected and have all the cities in the optimal graph. Thus, the resulting graph still meets the requirement but with less total distance (because all distances are positive). This is a contradiction to the fact that the optimal graph has minimum total distance.
Thus, the optimal graph is connected and have no cycle, it is a tree.
### 2
Given an integer $k$, a set N={1,…,n} of n cities, a subset M $\subseteq$ N of mandatory cities (the rest are optional), and the pairwise distances d(i,j)>0, 1≤ i,j ≤n between the cities, , which are assumed to be positive integers and symmetric.
Is there a tree $H = (V, E)$ that includes all the mandatory cities and has total distance at most $k$ ?
### 3
Suppose the total distance of all city pairs are $D$. The optimal value can be obtained using binary search with initial interval [0, D]. In each iteration, invoke the decision version subroutin to determine to continue the search in which half of the interval. The search ends until the interval contains only one integer which is the optimal value.
Suppose the input is encoded in $m$ bits, obviously $D \leq 2^m$ . The binary search will have $\log(D) \leq \log(2^m) \leq m$ iterations. In each iteration, it takes polynomial time to call the decision version subroutin. The total time taken is still polynomial.
“`
low = 0
high = D
while low < high: k = floor((high + low) / 2) if there is a tree that includes all the mandatory cities and has total distance at most k: high = k else: low = k + 1 return (low) ``` ### 4 #### `decision version of the Steiner tree problem` in NP Because given a graph H=(V,E) , we can verify in polynomial time whether the graph is a tree, including all the mandatory cities and the total distances is less than or equal to $k$. Thus decision version of the Steiner tree problem is NP. #### Prove Node Cover is polynomial time reducible to decision version of the Steiner tree problem where all the distances are 1 or $\infty$ Given a graph $G(V, E)$ in the vertex cover problem, we can construct the input for `Steiner tree` as follows. There are $N = $ $|V| + |E| $ cities. Each vertex in $V$ is taken as a city and each edge in $E$ is also taken as a city. The edge cities are mandatory. Construct the distances matrix $d[N][N]$ as follows. ``` initially set all d[i][j] = infinity for u in V: for v in V: d[u][v] = 1 for e in E: u, v are the two vertices of e d[u][e] = d[e][u] = 1 d[v][e] = d[e][v] = 1 ``` #### Prove if there is a tree that includes all the mandatory cities have size $k + |E| - 1$ then there is a vertex cover of size $k$ Let a tree that includes all the mandatory cities is $T(V_t, E_t)$. Suppose the total distance is $k + |E| - 1$.Because distance between the two cities is 1, thus $|V_t| = k + |E| $ Because any edge city $e$ in $V_t$ is connected to only its two vertices $u$ and $v$. Thus either $u$ or $v$ will be in $V_t$. The edges are all covered by the vertex cities in $V_t$. Thus, the vertex cities in $V_t$ is a vertex cover of $G$ which has size $|V_t| - |E|$ $= k + |E| - |E| = k$ . #### Prove if there is a vertex cover of size $k$ then there is a tree that includes all the mandatory cities have size $k + |E| - 1$ Let $V_c$ be a node cover in $G$. Because `d[u][v]=1` for all pairs of node u and v. Thus, we can first construct a tree spanning $V_c$ , then add the edges to the edge city to include all the manatory edge cities. Because `d[u][e]=1` for each node u and its edge. Thus, we can add additional $|E|$ edges to form the tree that includes all mandatory cities. The tree will have $k + |E| - 1$ edges, thus the total distance is $k + |E| - 1$. #### Summary Thus there is a vertex cover in $G$ of size at most k if and only if there is a tree that includes all the mandatory cities at most $k + |E| - 1$ with the cities, mandatory cities and distances given above. Obvious, it takes polynomial time to conduct the reduction. Thus Node Cover is polynomial time reducible to decision version of the Steiner tree problem where all the distances are 1 or $\infty$. From above, we have proven that the decision version of the Steiner tree problem is NP-complete even if all the distances are 1 or $\infty$. ## Problem 4 ### (a) A formula F in DNF is satisfiable if and only if there is at least one of its clauses are satisfiable. The clause is satisfiable if and only if no such variable that itself and its negation appear in the clause at the same time. From this, we have the following algorithm to decide if a given formula in DNF is satisfiable. The algorithm scans all the clauses in the worst case. Obviously, it takes polynomial time. Thus the Satisfiability problem for Boolean formulas in DNF is in P. ``` Given formula F in DNF for each clause in F: if there is a variable and its negation both appear in the clause: continue else: return true return False ``` ### (b) We can prove $DNF-TAUT$ is coNP-complete by proving that its complement $\overline{DNF-TAUT} = $ {DNF formula F |there is an assignment that makes F false \} is NP-complete. Because a non-deterministic polynomial time machine can guess an assignment and verify whether the assignment makes the formula false, $\overline{DNF-TAUT}$ is in NP. For a formuale $F$ in CNF, we can convert $\neg F$ to DNF in polynomial time. $F$ is satisfiable if and only if there is an assignment that makes $\neg F$ false. Thus, $SAT$ is polynomial time reducible to $\overline{DNF-TAUT}$. Thus, $\overline{DNF-TAUT}$ is NP-complete, $DNF-TAUT$ is coNP-complete. ## Problem 5 ### 1 #### NP for $\Pi_1$ $|y|≤p(|x|)$ , A non-deterministic polynomial time machine can guess the optimal value $opt$ for $\Pi_1$ and check whether $f_1(x, y) = opt$ . #### CoNP for $\Pi_1$ The complement problem is `Is y not an optimal solution for x`. $|y|≤p(|x|)$, A non-deterministic polynomial time machine can guess a candidate $y'$ , verify that $S_1(x, y')$ holds, compute $f_1(x, y')$ , verify that $f_1(x, y') < f_1(x, y)$. #### NP for $\Pi_2$ $|y|≤p(|x|)$ , A non-deterministic polynomial time machine can guess the optimal value $opt$ for $\Pi_2$ and check whether $f_2(x, y) = opt$ . #### CoNP for $\Pi_2$ The complement problem is `Is y not an optimal solution for x`. $|y|≤p(|x|)$, A non-deterministic polynomial time machine can guess a candidate $y'$ , verify that $S_2(x, y')$ holds, compute $f_2(x, y')$ , verify that $f_2(x, y') > f_1(x, y)$.
### 2
### $\Pi_1$
Decision problem:
Given instance $x$ in $\Pi_1$ and value $k$ is there a solution $y$ for $ x$ and $f_1(x, y) \leq k$
#### NP
$|y|≤p(|x|)$, A non-deterministic polynomial time machine can guess a candidate $y’$ , verify that $S_1(x, y’)$ holds, compute $f_1(x, y’)$ , verify that $f_1(x, y’) \leq k$.
#### CoNP
$|y|≤p(|x|)$, A non-deterministic polynomial time machine can guess a candidate $y’$ , verify that $S_1(x, y’)$ holds, verify that it is optimal (NP proved in part 1), and verify that $f_1(x, y’) \gt k$.
### $\Pi_2$
Decision problem:
Given instance $x$ in $\Pi_2$ and value $k$ is there a solution $y$ for $ x$ and $f_2(x, y) \geq k$
#### NP
$|y|≤p(|x|)$, A non-deterministic polynomial time machine can guess a candidate $y’$ , verify that $S_2(x, y’)$ holds, compute $f_2(x, y’)$ , verify that $f_2(x, y’) \geq k$
#### CoNP
$|y|≤p(|x|)$, A non-deterministic polynomial time machine can guess a candidate $y’$ , verify that $S_2(x, y’)$ holds, verify that it is optimal (NP proved in part 1), and verify that $f_2(x, y’) \lt k$.