Appendix B
Web addendum: NEXP ACC0
Very recently, Williams resolved “Frontier 2.1” from Chapter 14 of our book, by showing that NEXP ACC0. This exciting result uses many previous results described in the book, and suggests an approach for further lower bounds, and so we believe many instructors would want to teach it. We will add a writeup of the result to a future printing (giving a proof for the Yao-Beigel-Tarui Theorem currently stated in Chapter 14, and giving the proof of Williams’s lower bound in Chapter 22). In the meantime we provide this web addendum in the hope it will be helpful for instructors and students. In Section B.1 we present Williams’s Theorem, and in Section B.2 we prove the Yao-Beigel-Tarui Theorem that Williams uses in his result.
B.1 NEXP ACC0
We recall the class ACC0(m1,m2,…,mk) (Definition 14.3) consists of all functions com-
puted by polynomial size circuits of constant depth with AND, OR, NOT, and M ODm1 , . . . , M ODmk gates, where a MODm gate outputs 0 on inputs x1,…,xk if ki=1 xi = 0 (mod m) and outputs 1 otherwise. The class ACC0 consists of the union of ACC0 (m1 , . . . , mk ) for
every finite k-tuple (m1, . . . , mk). While strong lower bounds were shown in Chapter 14
for ACC0(p) for any prime p, very little was known about ACC0 or even ACC0(6) = ACC0(2,3). Allender and Gore [AG94] gave lower bounds for a uniform variant of ACC0
(see Chapter 14 notes). But nothing non-trivial was known for the non-uniform class.
The proof uses many of the results presented in this book, including the structural properties of ACC0 (Theorem 14.11, which itself uses the proof of Toda’s Theorem 17.14), the non-deterministic time hierarchy theorem (Theorem 3.2), and via the “easy witness” method of Lemma 20.20 also uses IP = PSPACE (Theorem 8.19) and derandomization results (Theorem 20.7). The philosophy behind the proof is somewhat dual to the natural proofs paradigm. While natural proofs say that certain classes of lower bounds against a class C imply a meta-algorithm that can distinguish the truth table of a circuit in C from a random function, Williams’s proof gives a fairly general way to translate a different meta- algorithm for C (a slightly non-trivial circuit satisfiability algorithm) into a lower bound showing that NEXP C. That said, Williams’s proof techniques, which include clever usages of diagonalization and proof by contradiction, do not fall under the natural proof paradigm (though we do not know if they can be “naturalizable”). Indeed, it needs to be investigated if Williams’s paradigm could extend to showing lower bounds higher complexity
Theorem B.1 (Williams, 2010) NEXP ⊆ ACC0
460 B Web addendum: NEXP ACC0
classes such as TC0, NC1 (and indeed maybe even P/poly) which are widely conjectured to contain pseudorandom functions, and hence cannot be separated from higher classes via natural proofs.
B.1.1 Non trivial satisfiability algorithm for ACC0 circuits
First, we explain the property of ACC0 (not shared by any higher class, as far as we know) that allows this lower bound: the circuit satisfiability problem for ACC0 circuits is solvable in time that is slightly better than trivial.
Theorem B.2 For every depth d there is a δ = δ(d) > 0 and an algorithm A with the following property. Given an instance of CKT-SAT which is a depth-d ACC0 circuit with n-bit input and up to 2nδ gates, the algorithm solves it in 2n−nδ time. ♦
The proof relies on the following characterization of ACC0 circuits, which was stated without proof in slightly different terms in Theorem 14.11 and proven below in Section B.2 below. We define a SYM+ circuit of degree d and size s to be a pair (P, Θ) such that P is an n-variable degree-d multilinear polynomial over the integers with coefficients of magnitude at most s, and Θ is a function from Z to {0,1}. (In Theorem 14.11 we described this as depth 3 circuit with a symmetric gate on top and AND gates of fanin at most s below it.) A SYM+ circuit (P, Θ) computes a function f : {0, 1}n → {0, 1} if f (x) = Θ(P (x)) for every x ∈ {0, 1}n. The polynomial P is represented as the list of non-zero coefficients of its monomials, and the function Θ is represented as the list of values on which it outputs 1. (Note that for every x ∈ {0, 1}n, |P (x)| ≤ s · nd, and so this list need not be longer than O(s · nd).)
Theorem B.3 ([Yao90, BT91]) There is an algorithm that given any ACC0 n-variable circuit C of depth d and size s runs in spolylog(s) and outputs a SYM+ circuit (P, Θ) of degree at most polylog(s) and size at most spolylog(s) such that Θ(P(x)) = C(x) for every x ∈ {0,1}n. (The implicit exponents in the polylog notation depends on d.) ♦
Proof of Theorem B.2 from Theorem B.3: . Let d ∈ N, and we choose δ as some function of d to be specified later. We’re given an ACC0 circuit C of n inputs and size s ≤ 2nδ and need to decide in 2n−nδ whether or not C(x) = 0 for every x ∈ {0,1}n. Let C be the circuit on n = n−2nδ inputs such that C(x1,…,xn) = ∨xn+1,…,xnC(x1,…,xn).
Clearly C is a depth d+1 ACC0 circuit of at most s ≤ 22nδ size such that C is satisfiable if and only if C is. We will evaluate C on all of its 2n = 2n−2nδ inputs and check if one of
the outputs happens to be 1. While naively doing these evaluations would cost s ·2n 2n, it turns out that for ACC0 it’s possible to amortize the work and perform all 2n evaluations at a cost of (2n + s polylog(s)) poly(n) which will be smaller than 2n−nδ for our setting of parameters.
Specifically, we choose δ to be a small enough function of d so that Theorem B.3’s transformation of C to C of the form C = Θ(P(x)) takes time less than, say, 2√n. We then use the following useful observation on evaluating polynomials:
Lemma B.4 There is an algorithm that given an integer multilinear polynomial P (x1, . . . , xn) =
n Pˆ(y) n xyi in the form of the array Pˆ : {0, 1}n → Z, outputs the evaluations y∈{0,1} i=1 i
of P on all inputs in {0, 1}n in time 2n poly(n, log s) where s = maxy∈{0,1}n |Pˆ(y)|. ♦
Lemma B.4 is proven via a simple but clever dynamic programming algorithm, somewhat reminiscent of the Fast Fourier Transform, see Exercise B.1. It immediately concludes the proof of Theorem B.2 since we can compute the polynomial P in time 2√n, and then evaluate all its values and apply Θ to them in time 2n poly(n) 2n−nδ .
B.1 NEXP ACC0 461 B.1.2 Proof of Theorem B.1
We now turn to proving Theorem B.1. We will assume that NEXP ⊆ ACC0 and show that NTIME(2n/n10) ⊆ NTIME(2n−nδ) for some δ > 0, which contradicts the non-
deterministic hierarchy theorem (Theorem 3.2). We will use the fact that every language L in NTIME(2n/n10) reduces to SUCCINCT-SATn, the problem in which one has to determine the satisfiability of a 3CNF Boolean formula φ with 2n clauses and 2n variables that is described in succinct form using a circuit of size poly(n). This is a circuit on n inputs that, given j in the set {0, 1}n (which we identify with the numbers in [2n]), outputs the description of the jth clause of φ: indices var1(j),var2(j),var3(j) of the three variables in clause j, and three bits neg1(j),neg2(j),neg3(j) that are 1 iff the corresponding variable appears negated in the clause. Thus the jth clause can be thought of as
Xvari(j) ⊕ negi(j), (1) i=1,2,3
where ⊕ denotes parity as usual. For every L ∈ NTIME(2n/n10), given any input x for L one can write in n5 time a circuit Cx of size n5 such that the instance of SUCCINCT- SATn represented by Cx is satisfiable iff x ∈ L. See Problem 6.18 in Chapter 6 and the discussion in Section 5.4 and also Problem B.4. (This reduction uses the Cook-Levin proof of NP-completeness of SAT; the fact that NTIME(2n/n10) can be reduced to satisfiability of instances of size 2n uses the efficient simulation of Turing Machines using oblivious Turing Machines in Chapter 1.) We will denote the formula encoded by C as φ(C).
Before we describe the algorithm we record two useful implications of our assumptions that NEXP ⊆ ACC0. First, one immediate corollary of NEXP ⊆ ACC0 is that P ⊆ ACC0. In particular, the CIRCUIT-EVAL problem of evaluating a Boolean circuit on a given input is in ACC0, which implies that there is some absolute constants c,d0 such that for every Boolean circuit C of size s there exists an equivalent ACC0 circuit C of size at most sc and depth at most d0. (The circuit C can be obtained by hardwiring the constants corresponding to the description of C into the ACC0 circuit for CIRCUT-EVAL.) Of course, this does not necessarily mean that there is an efficient way to find C given C, but nonetheless the observation above will be very useful for us below.
We say that SUCCINCT-SATn has succinct witnesses if the satisfying assignment is also representable as a small circuit. In other words, there is some constant c > 0 such that for every circuit C that is a YES instance for SUCCINCT-SATn there exists a circuit D of size at most nc that encodes the satisfying assignment for the formula encoded by C. That is, D has n inputs, and the assignment D(0n), · · · , D(1n) satisfies φ(C). It turns out that under our assumption SUCCINCT-SATn has succinct witnesses:
Lemma B.5 If NEXP ⊆ P/poly then SUCCINCT-SATn has succinct witnesses. ♦
Proof: This is implicit in the proof of Lemma 20.20 which showed that under this assump- tion NEXP = EXP— the EXP algorithm actually worked by searching for an assignment encoded by a small circuit. We briefly recall the proof here. If this was false we would get an infinite sequence of circuits {Cn} that encode satisfiable formulas without a succinct sat- isfying assignment. Then one can guess these assignments to be used as a hard functions to obtain a Nisan-Wigderson style derandomization of MA in NEXP, but since under these assumption MA = EXP we’d be able to use this to diagonalize against circuits of any fixed polynomial size.
We now turn to the algorithm. We’ll choose δ = δ(10d0) as in the statement of Theo- rem B.2 where d0 is the constant above such that every Boolean circuit has an equivalent depth-d0 ACC0 circuit. We are given an n-bit input Boolean circuit C, and need to decide in non-deterministic 2n−nδ -time whether or not the Formula φ(C) is satisfiable. Under our assumption that NEXP ⊆ ACC0 for every input C, φ(C) is satisfiable if and only if there are two n-input ACC0 circuits C and D of poly(n) size and depth d0 such that (i) C is equivalent to C and (ii) D encodes a satisfying assignment to φ(C) = φ(C). Thus our algorithm will simply non-deterministically guess C and D and then check the conditions (i) and (ii) in 2n−nδ poly(n)-time.
462 B Web addendum: NEXP ACC0
The condition (ii) can be checked in time 2n−nδ using the algorithm of Theorem B.2 since it amounts to checking whether the following ACC0 circuit G outputs 1 on all possible
inputs j ∈ {0, 1}n:
G(j) = D (vari(j)) ⊕ negi(j). (2)
i=1,2,3
Note that G is an ACC0 circuit of depth smaller than 2d0 + 10 since the clause descrip- tion vari(j), negi(j) for i = 1, 2, 3 is computed by the depth-d0 ACC0 circuit C, and the candidate assignment is computed by the depth-d0 ACC0 circuit D.
Thus all we need to show is how to check condition (i)— that C is equivalent to C. The algorithm of Theorem B.2 does not directly apply since C is a general Boolean (not necessarily ACC0) circuit. But we can still verify equivalence using non-determinism. The idea is that for every wire i of C we guess an ACC0 circuit Ci of depth d0 that is supposed to be equivalent to the value on the ith wire of C. That is, for every input x ∈ {0,1}n, Ci(x) outputs the same value as the value on the ith wire of C when invoked input x. (The circuit C is equal to the circuit corresponding to the output wire of C.)
If the ith wire in C is the result of the AND of the jth and kth wire then for every x ∈ {0,1}n it holds that Ci(x) = Cj(x)ANDCk(x) for all x. Similar condition holds for OR and NOT gates, as well as wires that are connected to the inputs. Now consider the ACC0 circuit E that outputs the AND of all those conditions for all wires i of C. Since all the circuits Ci are of depth d0, E has depth less than, say, d0 + 100, and size polynomial in the size of C. Moreover, if E(x) = 1 for every x, then by it’s easy to prove that for every i the circuit Ci is indeed equivalent to the ith wire of C. (This is just like the reduction from CKT-SAT to 3SAT of Lemma 6.11.) Thus by running the satisfiability algorithm of Theorem B.2 on 1 − E one can verify condition (ii) in time 2n−nδ , completing the description of our SUCCINCT-SATn algorithm. The overall (non-deterministic) running time of the algorithm is 2n−nδ poly(n).
Remark B.6
The proof above does not really require a full-fledged satisfiability algorithm for ACC0. It’s enough to come up with an approximate satisfiability algorithm that distinguishes between circuits that are unsatisfiable and those that have, say, at least 1/10 fraction of satisfying assignments. The main idea is that one can use a highly efficient variant of the scaled up PCP Theorem (Theorem 11.8) to ensure we can derive a contradiction to the non- deterministic hierarchy theorem even via an algorithm that on input a circuit C finds an assignment that approximately satisfies the formula φ(C). This allows to use circuits C, D in the proof that only agree with the circuits C,D on some large fraction of the inputs. Of course such an approximate satisfiability algorithm is easy to achieve in probabilistic polynomial time, and thus Williams’s result is another instance of the connection between derandomization and lower bounds explored in Chapter 20.
B.2 Simulating ACC0 circuits by low degree polynomials: Proof of Theorem B.3
The proof of Theorem B.3 uses the ideas of the proof of Toda’s Theorem (Theorem 17.14). The intuitive link is that Toda’s proof can be seen as a transformation for exponential-size constant depth circuits, since PH languages have such circuits that are DC-uniform. Here we will apply similar transformations on polynomial-size circuits.
Recall that Toda’s Theorem is proven in two steps. First we showed a randomized re- duction from PH to ⊕P (Theorem 17.17). Using the characterization of PH as constant depth DC uniform circuits, one can interpret this probabilistic reduction as a way of trans- forming an AC0 circuit to a depth three circuit with majority gate at the output level and MOD2 gates (the majority gate corresponds to the fact that this is a probabilistic reduction,
B.2 Simulating ACC0 circuits by low degree polynomials: Proof of Theorem B.3 463
note that the same transformation is used in the proof of Theorem 14.4— the Razborov- Smolensky lower bound for ACC0(q) ). The second step in Toda’s theorem is to transform this randomized reduction to ⊕P into a deterministic reduction to #P (Lemma 17.22)— this can be thought of as a way of removing the MOD2 gates and reducing the depth of the circuit to two at the expense of changing the majority function to a different symmetric function (i.e., a function that only depends on the number of its inputs that are one). We are going to prove Theorem B.3 by applying the first transformation to the ACC0 circuit to replace the OR and AND gates by small addition modulo-2 and low fan-in multiplication and then repeatedly applying the second transformation to “peel off” one by one the layers of the circuit until we are left with a SYM+ circuit. We now turn to presenting the actual proof.
Let C be a size-s depth-d ACC0 circuit C, and for concreteness we restrict to the case that C only uses MOD2, MOD3 and OR gates and the constant 1. (We leave the relatively straightforward extension to the case of general modula gates as Problem B.2; we omitted the NOT gate as it can be replaced with a MOD2 gate and similarly OR and MOD2 can replace the AND gate.) The proof will consist of applying several transformations to C, where in each step we modify C to an equivalent circuit C that is “simpler” in some specific sense.
Step 0: preliminary tidying up of the circuit. We can assume that C has the follow- ing properties without loss of generality, as it can be easily modified to satisfy them at a polynomial cost in the size and constant factor in the depth:
1. It is layered with gates at layer i only depending on inputs from gates at layer i − 1, where layer 1 is the bottom-most layer depending on only the inputs. Moreover, in every layer there is only one type of gates. (This can be ensured by using fan-in one OR, MOD2 or MOD3 gates as “dummy gates” that just copy the input.)
2. It is a tree in the sense that every gate has fan-out at most 1. (This can be ensured by duplicating gates at polynomial cost since the circuit has constant depth.)
We will maintain these invariants throughout the proof, applying the above transforma- tions whenever we modify the circuit.
Step 1: Reducing the fan-in of OR gates; adding symmetric gate at the output It turns out that if we are willing to add a symmetric function on top, we can use the following claim to ensure that the fan-in of all OR gates is at most polylogarithmic:
Claim: There is a set D of ACC0 circuits with the same gateset as C and of size at most s2 and depth at most 3d such that every OR gate in D ∈ D has fan-in at most O(log2 s) and for every input x ∈ {0,1}n, PrD∈R D[D(x) = C(x)] > 0.9. Moreover |D| ≤ sO(log2 s).
Proof of Claim: The proof uses the Valiant-Vazirani Lemma (Lemma 17.19), that
shows that given an OR gate over 2k inputs of the form OR(x1,…,x2k), if we pick h to be
2k a pairwise independent hash function from [2 ] to {0, 1} then for any nonzero x ∈ {0, 1} ,
k
with probability at least 1/(10k) it will hold that xi (mod 2) = 0. Thus if we pick
i:h(i)=1
t = 100k log s such hash functions h1, . . . , ht independently then with probability at least
1 − 1/(10s) the OR of x1, . . . , x2k will equal ORjt=1( i:hj (i)=1 xi (mod 2)), enabling us to take a union bound over all the at most s OR gates in C. Because we used the union bound we can reuse the same set of t hash functions for all the gates, and thus the set of circuits has cardinality at most |H|t where H is the cardinality of the set of pairwise independent hash functions on 2k inputs (we can assume 2k ≤ 2s since the fan-in of every OR gate is at most s). As noted in Section 8.2.2, there exists such a collection of cardinality at most 22k = O(s2), and thus the total number of circuits will be sO(log2 s).
An immediate corollary of the claim is that, writing D = {D1 , . . . , Ds }, for every x ∈ {0,1}n, C(x) = MAJ(D1(x),…,Ds(x)) where MAJ is the majority function.
464 B Web addendum: NEXP ACC0
Step 2: Moving to arithmetic gates and pushing multiplications down. We will now transform C so that all the gates will operate over the integers, meaning they take an integer (not necessarily 0/1) as input and output an integer as well. (Note that the inputs to the entire circuits are still 0/1 as before.) The output gate of the circuit C consists of an addition gate (which outputs an integer), followed by some function Θ : Z → {0, 1} such that the final output C(x) = Θ(C(x)) for every x. Thus the output gate can be viewed as a symmetric gate.
First, we use OR(x1,…,xk) = 1−x1 ···xk to replace OR gates with multiplication and addition gates over the integers. (Note that the fan-in of the multiplication gates will be at most O(log2 s).) Next we use the equation MODp(x1, . . . , xk) = ( ki=1 xi)p−1 (mod p) (implied by Fermat’s Little Theorem) to replace every MODp gate in C with k inputs with one addition gate and p − 2 multiplication gates (taking integer inputs) followed by a “MODp conversion” gate that given an integer input z outputs z (mod p).
We can also push all multiplication gates to the bottom of the circuit using the distribu- tive rule (i.e., (a + b)(c + d) = ac + bc + ad + cd) and the equation (x1 (mod p))···(xk (mod p)) = (x1 · · · xk ) (mod p). Thus all multiplication gates will only involve input vari- ables. Note that pushing the multiplication gates down can increase their fan-in from log2 s to at most logO(d) s. By adding “dummy gates” we will keep the invariant that every layer of the circuit only has one type of gate (addition, mod 2, mod 3, or multiplication).
The bottom line is that we get a circuit C of O(d) depth and size at most spolylog(s) over the integers involving only addition, mod 2 and mod 3 conversion gates, and mul- tiplication gates. Every layer in the circuit has only one type of gate, with multiplication at the very bottom and with fan-in at most polylog(s). The output gate of the circuit C is an addition gate (which outputs an integer), and there is some function Θ : Z → {0, 1} such that C(x) = Θ(C(x)) for every x. (The function Θ can be represented by a table of spolylog(s) size, as that is the maximum magnitude of a value that C can output on an input in {0, 1}n.)
Step 3: Peeling off layers. We now show how we can take the uppermost layer of the circuit C and remove it (at the cost of somewhat increasing its size and changing the function Θ), we repeat this again and again until there are no more mod 2 or mod 3 gates left, and hence all we are left is with a circuit with one addition gate at the top, and multiplication gates of polylog(s) fan-in at the bottom – a polylog(s)-degree multivariate polynomial over the integers. (Using the equation x = x2 for x ∈ {0, 1}, we can assume this polynomial is multilinear without loss of generality.)
To do this “layer removal” operation, suppose that the top layer of C has mod 3 gates (the case of mod 2 gates is symmetric, and addition gates can just be collapsed into the output addition gate anyway). Thus, the output gate of C has the form ki=1(yi (mod 3)) where yi is the value fed into the ith mod 3 gate of the top layer. (Note that the output gate outputs an integer, not necessarily in {0,1}, and that integer is fed into the function Θ to obtain a 0/1 answer.) We will use a family of univariate polynomials over the integers {Pt} such that Pt has degree at most t10 (with coefficients of size at most 10t) and such that Pt(y) (mod 3t) = y (mod 3). (These are known as Modulus amplifying polynomials, see also Lemma 17.22 and Problem B.3.) Taking t > log3 k we see that ki=1(yi (mod 3)) = ki=1(Pt(yi) (mod 3t)) = (ki=1 Pt(yi)) (mod 3t) (where the latter equality holds since the sum on the left hand side is smaller than 3t. Thus, by changing the function Θ(x) to Θ(x) = Θ(x (mod 3t)), and by pushing down the multiplication gates involved in computing Pt (at a cost of poly(t) = polylog(s) to the bottom fan-in), we get a circuit of the same form but with depth reduced by one.
Repeating Step 3 again and again we end up with a SYM+ circuit, one can also verify that all our transformations can be carried out in time polynomial in the size of the output circuit.
B.2 (a)
sort the arrays under a different order between after each one of the n stages of the algorithm.)
Extend the proof of Theorem B.3 to the case where the ACC0 circuit uses the gates M ODp1 , . . . , M ODpk
where k is some constant and the pi’s are distinct primes.
Exercises 465
coefficients of size at most s. That is, P(x1,…,xn) =
Exercises
B.1 (Yates 1937 via Williams 2010) Let P : Zn → Z be a multilinear polynomial over the integers with
(a) Let P0(x) = Pˆ(x) and Pi(x) = Pn(x)=P(x). Provethatforeveryx∈{0,1} ,
Pi(x) = Pi−1(x1,…,xi−1,0,xi+1,…,xn) + xiPi−1(x1,…,xi−1,1,xi+1,…,xn) .
(b) Give a 2n poly(n, s)-time algorithm that given the length 2n array Pˆ(0n), · · · , Pˆ(1n) outputs
the array P(0n),··· ,P(1n). (Hint: use dynamic programming)
(c) Show that one can implement the above algorithm on a multi-tape Turing machine. (Hint:
n Pˆ(y)n xyj . y∈{0,1} j=1 j
i Pˆ(y1,…,yi,xi+1,…,xn)i xyj . Note that y1 ,…,yi ∈{0,1} n j =1 j
(b) Extend the proof of Theorem B.3 to the general case where the ACC0 circuit uses the gates MODm1,…,MODmk where k is some constant and the mi’s can prime, prime powers, or composites.
B.3 Define the polynomials Si(x) as follows: S1(x) = 3×2 − 2×3 and Si(x) = S1(Si−1(x)).
(a) Prove that for every x,i,m if x (mod m) ∈ {0,1} then Pi(x) (mod m2t) = x (mod m).
(b) Use these polynomials to construct the collection of polynomials {Pt} needed in the proof of Theorem B.3. (Hint: use Fermat’s Little Theorem.)
B.4 Prove that there is some constant c such that for every L in NTIME(2n/nc), there is a polynomial- time computable function f such that x ∈ L iff f(x) ∈ SUCCINCT-SAT|x|.