THE SIGN-RANK OF AC0 ∗
ALEXANDER A. RAZBOROV† AND ALEXANDER A. SHERSTOV‡
Abstract. The sign-rank of a matrix A = [Aij] with ±1 entries is the least rank of a real
matrix B = [Bij] with AijBij > 0 for all i,j. We obtain the first exponential lower bound on the
sign-rank of a function in AC0. Namely, let f(x, y) = ⋀_{i=1}^{m} ⋁_{j=1}^{m^2} (x_{ij} ∧ y_{ij}). We show
that the matrix [f(x, y)]_{x,y} has sign-rank exp(Ω(m)). This in particular implies that Σ_2^cc ⊄ UPP^cc,
which solves a longstanding open problem in communication complexity posed by Babai, Frankl, and Simon (1986).
Our result additionally implies a lower bound in learning theory. Specifically, let φ_1, …, φ_r : {0,1}^n → R be functions such that every DNF formula f : {0,1}^n → {−1, +1} of polynomial size has the representation f ≡ sgn(a_1φ_1 + ··· + a_rφ_r) for some reals a_1, …, a_r. We prove that then r ≥ exp(Ω(n^{1/3})), which essentially matches an upper bound of exp(Õ(n^{1/3})) due to Klivans and Servedio (2001).
Finally, our work yields the first exponential lower bound on the size of threshold-of-majority circuits computing a function in AC0. This substantially generalizes and strengthens the results of Krause and Pudlák (1997).
Key words. Sign-rank; communication complexity; complexity classes Σ_2^cc, Π_2^cc, and UPP^cc;
constant-depth AND/OR/NOT circuits.
AMS subject classifications. 03D15, 68Q15, 68Q17
1. Introduction. The sign-rank of a real matrix A = [Aij] with nonzero entries is the least rank of a matrix B = [Bij] with AijBij > 0 for all i,j. In other words, sign-rank measures the stability of the rank of A as its entries undergo arbitrary sign-preserving perturbations. This fundamental notion has been studied in contexts as diverse as matrix analysis, communication complexity, circuit complexity, and learning theory [40, 2, 4, 13, 14, 26, 32, 48, 51]. We will give a detailed overview of these applications shortly as they pertain to our work.
Despite its importance, progress in understanding sign-rank has been slow and difficult. Indeed, we are aware of only a few nontrivial results on this subject. Alon et al. [2] obtained strong lower bounds on the sign-rank of random matrices. No nontrivial results were available for any explicit matrices until the breakthrough work of Forster [13], who proved strong lower bounds on the sign-rank of Hadamard matrices and, more generally, all sign matrices with small spectral norm. Several extensions and refinements of Forster's method were proposed in subsequent work [14, 15, 32]. Near-tight estimates of the sign-rank were obtained in [51] for all symmetric problems, i.e., matrices of the form [D(∑_i x_i y_i)]_{x,y}, where D : {0, 1, …, n} → {−1, +1} is a given predicate and x, y range over {0,1}^n.
This paper focuses on AC0, a prominent class whose sign-rank has seen no progress in previous work. It will henceforth be convenient to view Boolean functions as mappings of the form {0,1}^n → {−1, +1}, where the elements −1 and +1 of the range represent “true” and “false,” respectively. The central objective of our study is to estimate the maximum sign-rank of a matrix [f(x,y)]_{x,y}, where f : {0,1}^n × {0,1}^n →
∗An extended abstract of this article appeared in Proceedings of the 49th IEEE Symposium on Foundations of Computer Science (FOCS), pages 57–66, 2008.
†Department of Computer Science, University of Chicago and Steklov Mathematical Institute (razborov@cs.uchicago.edu). Supported by NSF grant ITR-0324906 and by the Russian Foundation for Basic Research.
‡Department of Computer Science, The University of Texas at Austin (sherstov@cs.utexas.edu).
{−1,+1} is a function in AC0. An obvious upper bound is 2^n, while the best lower bound prior to this paper was quasipolynomial. (The quasipolynomial lower bound is immediate from Forster's work [13] and the fact that AC0 can compute inner product modulo 2 on log^c n variables, for every constant c > 1.) Our main result considerably tightens the gap by improving the lower bound to 2^{Ω(n^{1/3})}.
Theorem 1.1 (Main result). Let f_m(x, y) = ⋀_{i=1}^{m} ⋁_{j=1}^{m^2} (x_{ij} ∧ y_{ij}). Then the matrix [f_m(x, y)]_{x,y} has sign-rank 2^{Ω(m)}.
It is not difficult to show that the matrix in Theorem 1.1 has sign-rank 2^{O(m log m)}, i.e., the lower bound that we prove is almost tight. (See Remark 6.1 for details.) Moreover, Theorem 1.1 is optimal with respect to circuit depth: AC0 circuits of depth 1 and 2 lead to at most polynomial sign-rank. Indeed, the function ⋁_{i=1}^{m} (x_i ∧ y_i), which is universal in this context, possesses the matrix representation [sgn(1/2 − ∑_i x_i y_i)]_{x,y} and thus has sign-rank at most m + 1.
Our main result states that AC0 contains matrices whose rank is rather stable in that it cannot be reduced below 2^{Θ(n^{1/3})} by any sign-preserving changes to the matrix entries. We proceed to discuss applications of this fact to communication complexity, learning theory, and circuits.
1.1. Communication complexity. The study of sign-rank is synonymous with the study of unbounded-error communication complexity, a rich model introduced by Paturi and Simon [40]. Fix a function f : X × Y → {−1, +1}, where X and Y are some finite sets. Alice receives an input x ∈ X, Bob receives y ∈ Y, and their objective is to compute f(x,y) with minimal communication. The two parties each have an unlimited private source of random bits which they can use in deciding what messages to send. Their protocol is said to compute f if on every input (x,y), the output is correct with probability greater than 1/2. The cost of a protocol is the worst-case number of bits exchanged on any input (x,y). The unbounded-error communication complexity of f, denoted U(f), is the least cost of a protocol that computes f.
The unbounded-error model occupies a special place in the study of communication because it is more powerful than almost any other standard model (deterministic, nondeterministic, randomized, quantum with or without entanglement). More precisely, the unbounded-error complexity U(f) can be only negligibly greater than the complexity of f in any of these models—and often, U(f) is exponentially smaller. We defer exact quantitative statements to Appendix A. The power of the unbounded-error model resides in its very liberal acceptance criterion: it suffices to produce the correct output with probability even slightly greater than 1/2 (say, by an exponentially small amount). This contrasts with the more familiar, bounded-error models, where the correct output is expected with probability at least 2/3.
Another compelling aspect of the unbounded-error model is that it has an exact matrix-analytic formulation. Let f : X × Y → {−1, +1} be a given function and M = [f (x, y)]x∈X, y∈Y its communication matrix. Paturi and Simon [40] showed that
U(f) = log2(sign-rank(M)) ± O(1).
In other words, unbounded-error complexity and sign-rank are essentially equivalent notions. In this light, our main result gives the first polynomial lower bound on the unbounded-error complexity of AC0:
Corollary 1.2 (Unbounded-error communication complexity of AC0). Let f_m(x,y) = ⋀_{i=1}^{m} ⋁_{j=1}^{m^2} (x_{ij} ∧ y_{ij}). Then U(f_m) = Ω(m).
Corollary 1.2 solves a longstanding problem in communication complexity. Specifically, the only models for which efficient simulations in the unbounded-error model were unknown had been developed in the seminal paper by Babai, Frankl, and Simon [3]. These models are the communication analogues of the classes PH and PSPACE. Babai et al. asked [3, p. 345] whether Σ_2^cc ⊆ UPP^cc. Forster [13] made substantial progress on this question, proving that PSPACE^cc ⊄ UPP^cc. We resolve the original question completely: Corollary 1.2 implies that Π_2^cc ⊄ UPP^cc and hence (since UPP^cc is closed under complementation) Σ_2^cc ⊄ UPP^cc. See Section 7 for detailed background on these various complexity classes.
1.2. Learning theory. In a seminal paper [55], Valiant formulated the probably approximately correct (PAC) model of learning, now a central model in computational learning theory. Research has shown that PAC learning is surprisingly difficult. (By “PAC learning,” we shall always mean PAC learning under arbitrary distributions.) Indeed, the learning problem remains unsolved for such natural concept classes as DNF formulas of polynomial size and intersections of two halfspaces, whereas hardness results and lower bounds are abundant [22, 24, 28, 12, 29, 27].
One concept class for which efficient PAC learning algorithms are available is the class of halfspaces, i.e., functions f : R^n → {−1, +1} representable as

f(x) ≡ sgn(a_1x_1 + ··· + a_nx_n − θ)
for some reals a1,…,an,θ. Halfspaces constitute one of the most studied classes in computational learning theory [46, 37, 34, 5] and a major success story of the field. Indeed, a significant part of computational learning theory attempts to learn rich concept classes by reducing them to halfspaces. The reduction works as follows. Let C be a given concept class, i.e., a set of Boolean functions {0, 1}n → {−1, +1}. One seeks functions φ1, . . . , φr : {0, 1}n → R such that every f ∈ C has a representation
f(x) ≡ sgn(a_1φ_1(x) + ··· + a_rφ_r(x))
for some reals a_1, …, a_r. This process is technically described as embedding C in halfspaces of dimension r. Once this is accomplished, C can be learned in time polynomial in n and r by any halfspace-learning algorithm.
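To make the reduction concrete, here is a minimal sketch in Python. It is only an illustration and makes choices the discussion above leaves open: the embedding functions φ_i are taken to be the monomials of degree at most a parameter `degree` (one natural choice), and the halfspace learner is the classical perceptron rule. All function names are ours.

```python
# A minimal sketch of the embedding-based reduction. Assumptions (ours):
# the phi_i are monomials of degree <= `degree`, and the halfspace learner
# is the perceptron rule. Labels follow the paper's convention:
# -1 = "true", +1 = "false".
from itertools import combinations

def monomial_features(x, degree):
    """Evaluate all monomials of degree <= `degree` at x in {0,1}^n."""
    feats = [1.0]  # the constant monomial
    for d in range(1, degree + 1):
        for S in combinations(range(len(x)), d):
            feats.append(float(all(x[i] for i in S)))
    return feats

def learn_halfspace(samples, degree, epochs=1000):
    """Perceptron over the feature space: returns weights a_1, ..., a_r with
    f(x) = sgn(sum_i a_i phi_i(x)) on the training sample, when one exists."""
    a = [0.0] * len(monomial_features(samples[0][0], degree))
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:  # y in {-1, +1}
            phi = monomial_features(x, degree)
            if y * sum(ai * pi for ai, pi in zip(a, phi)) <= 0:
                a = [ai + y * pi for ai, pi in zip(a, phi)]
                mistakes += 1
        if mistakes == 0:
            break
    return a
```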
For this approach to be practical, the number r of real functions needs to be reasonable (ideally, polynomial in n). It is therefore of interest to determine what natural concept classes can be embedded in halfspaces of low dimension [4, 27]. For brevity, we refer to the smallest dimension of such a representation as the dimension complexity of a given class. Formally, the dimension complexity dc(C) of a given class C of functions {0, 1}n → {−1, +1} is the least r for which there exist real functions φ1,…,φr : {0,1}n → R such that every f ∈ C is expressible as
f(x) ≡ sgn(a_1(f)φ_1(x) + ··· + a_r(f)φ_r(x))   (1.1)
for some reals a_1(f), …, a_r(f) depending on f only. To relate this discussion to sign-rank, let M_C = [f(x)]_{f∈C, x∈{0,1}^n} be the characteristic matrix of C. A moment's reflection reveals that the identity (1.1) is yet another way of saying that M_C has the same sign pattern as the matrix [∑_{i=1}^{r} a_i(f)φ_i(x)]_{f,x} of rank at most r, whence the dimension complexity of a concept class is precisely the sign-rank of its characteristic matrix. Indeed, the term “dimension complexity” has been used interchangeably with sign-rank in the recent literature [53, 48], which does not lead to confusion since concept classes are naturally identified with their characteristic matrices.
Thus, the study of sign-rank yields nontrivial PAC learning algorithms. In particular, the current fastest algorithm for learning polynomial-size DNF formulas, due to Klivans and Servedio [26], was obtained precisely by placing an upper bound of 2^{Õ(n^{1/3})} on the dimension complexity of that concept class, with the functions φ_i corresponding to the monomials of degree up to Õ(n^{1/3}).
Klivans and Servedio also observed that their 2^{Õ(n^{1/3})} upper bound is best possible when the functions φ_i are taken to be the monomials up to a given degree. Our work gives a far-reaching generalization of the latter observation: we prove the same lower bound without assuming anything whatsoever about the embedding functions φ_i. That is, we have:
Corollary 1.3 (Dimension complexity of DNF). Let C be the set of all read-once (hence, linear-size) DNF formulas f : {0,1}^n → {−1,+1}. Then C has dimension complexity 2^{Ω(n^{1/3})}.
Proof. Let f_m(x, y) be the function from Theorem 1.1, where m = ⌊n^{1/3}⌋. Then for any fixed y, the function f_y(x) = ¬f_m(x,y) is a read-once DNF formula. Since negating a matrix does not change its sign-rank, the characteristic matrix of C contains [¬f_m(x,y)]_{y,x} as a submatrix and therefore, by Theorem 1.1, has sign-rank 2^{Ω(m)} = 2^{Ω(n^{1/3})}.
Learning polynomial-size DNF formulas was the original challenge posed in Valiant's paper [55]. More than twenty years later, this challenge remains a central open problem in computational learning theory despite active research, e.g., [7, 54, 26]. To account for this lack of progress, several hardness results have been obtained based on complexity-theoretic assumptions [24, 1]. Corollary 1.3 complements that line of work by exhibiting an unconditional, structural barrier to the efficient learning of DNF formulas. In particular, it rules out a 2^{o(n^{1/3})}-time learning algorithm based on dimension complexity.
While restricted, the dimension-complexity paradigm is quite rich and captures many PAC learning algorithms designed to date, with the notable exception [18, 6] of learning low-degree polynomials over GF(p). Furthermore, it is known [23, p. 124] that an unconditional superpolynomial lower bound for learning polynomial-size DNF formulas in the standard PAC model would imply that P ̸= NP; thus, such a result is well beyond the reach of the current techniques.
The lower bound in this work also points to the importance of generalizing the dimension complexity framework while maintaining algorithmic efficiency. Unfortunately, natural attempts at such a generalization do not work, including the idea of sign-representations that err on a small fraction of the inputs [12].
1.3. Threshold circuits. Recall that a threshold gate g with Boolean inputs x_1, …, x_n is a function of the form g(x) = sgn(a_1x_1 + ··· + a_nx_n − θ), for some fixed reals a_1, …, a_n, θ. Thus, a threshold gate generalizes the familiar majority gate. A major unsolved problem in computational complexity is to exhibit a Boolean function that requires a depth-2 threshold circuit of superpolynomial size, where by the size of a circuit we mean the number of gates.
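For concreteness, a threshold gate is a one-liner in code; the sketch below (ours, purely illustrative) also exhibits majority as the special case of unit weights:

```python
def threshold_gate(a, theta):
    """The gate x -> sgn(a_1 x_1 + ... + a_n x_n - theta), with output +-1."""
    return lambda x: 1 if sum(ai * xi for ai, xi in zip(a, x)) > theta else -1

majority = threshold_gate([1] * 7, 3.5)   # majority of 7 bits
assert majority((1, 1, 1, 1, 0, 0, 0)) == 1
assert majority((1, 1, 0, 1, 0, 0, 0)) == -1
```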
Communication complexity has been crucial to the progress on this problem. Through randomized communication complexity, many explicit functions have been found [17, 16, 36, 47, 49] that require majority-of-threshold circuits of exponential size. This solves an important case of the general problem. Lower bounds for the unbounded-error model (or, equivalently, on the sign-rank) cover another important case, that of threshold-of-majority circuits. More precisely, Forster et al. [14] proved the following:
Lemma 1.4 ([14, Lemma 5]). Let f : {0,1}^n × {0,1}^n → {−1,+1} be a Boolean function computed by a depth-2 threshold circuit with arbitrary weights at the top gate and integer weights of absolute value at most w at the bottom. Then the sign-rank of F = [f(x,y)]_{x,y} is O(snw), where s is the number of gates.
Combined with our main result, this immediately gives the following.
Corollary 1.5 (Threshold circuits). In every depth-2 threshold circuit that computes the function f_m(x,y) = ⋀_{i=1}^{m} ⋁_{j=1}^{m^2} (x_{ij} ∧ y_{ij}), the sum of the absolute values of the (integer) weights at the bottom level must be of magnitude 2^{Ω(m)}.
This is the first exponential lower bound for threshold-of-majority circuits computing a function in AC0. It substantially generalizes and strengthens an earlier result of Krause and Pudlák [30, Thm. 2], who proved an exponential lower bound for threshold-of-MOD_r circuits (for any constant r ≥ 2) computing a function in AC0. Our work also complements exponential lower bounds for majority-of-threshold circuits computing functions in AC0, obtained by Buhrman et al. [8] and independently in [49, 50].
1.4. Our proof and techniques. At a high level, we adopt the approach introduced in [51] in the context of determining the sign-rank of symmetric functions. That approach is based on Forster's method [13] for proving lower bounds on the sign-rank in combination with the pattern matrix method [50].
In more detail, Forster [13] showed that a matrix with entries ±1 has high sign-rank if it has low spectral norm. Follow-up papers [14, 15] relaxed the assumptions on the entries of the matrix. We begin with a simple generalization of these results (Theorem 5.1), proving that in order to ensure high sign-rank, it suffices to require, along with low spectral norm, that most of the entries are not too small in absolute value.
In [50, 51] it was shown how to construct such matrices from a function g : {0, 1}n → R with the following properties:
(1) low-order Fourier coefficients of g are zero, i.e., g is orthogonal to all low- degree polynomials;
(2) g is not too small in absolute value on most inputs x ∈ {0, 1}n.
The entries of the matrix are the values of g repeated in certain patterns; the resulting matrix is referred to as the pattern matrix for g.
If the original function g is Boolean, then (2) is immediate, but (1) rarely holds. A way to fix the problem is to find a probability distribution μ on {0, 1}n which is orthogonalizing in the sense that the combined function g′(x) = g(x)μ(x) is orthogonal to all low-degree polynomials. But then care must be taken to ensure property (2) for the new function g′. In other words, μ must be nonnegligible on most inputs (smooth).
In [51], this program was carried out to obtain lower bounds on the sign-rank of symmetric functions. The existence of the required smooth orthogonalizing distribution was established using a linear-programming dual interpretation of Paturi's lower bounds [39] for the uniform approximation of symmetric functions.
In this paper we study the sign-rank of AC0 functions, and specifically the sign-rank of the matrix derived from the communication version of the Minsky-Papert function

MP_m(x) = ⋀_{i=1}^{m} ⋁_{j=1}^{4m^2} x_{i,j}.
Our proof relies on a linear-programming dual interpretation of the Minsky-Papert lower bound for the sign-representation of MP_m. The construction of the smooth orthogonalizing distribution in this paper, which is the crux of the program, is unrelated to the corresponding step in [51] and requires considerable new ideas.
Having described our proof at a high level, we will now examine it in more detail, from the bottom up. Figure 1.1 illustrates the main components of our proof. A starting point in our study is an elegant result due to Minsky and Papert [34], who constructed a linear-size DNF formula that cannot be sign-represented by polynomials of low degree.
Second, we revisit a fundamental technique from approximation theory, the interpolation bound, which bounds a degree-d univariate polynomial p on an interval based on the values of p at d + 1 distinct points. By combining the interpolation bound with an adapted version of Minsky and Papert's argument, we establish a key intermediate result (Lemma 3.4). This result concerns multivariate polynomials that have nonnegligible agreement with the Minsky-Papert function and constrains their behavior on a large fraction of the inputs.

We proceed by deriving a Fourier-theoretic property common to all low-degree multivariate polynomials on {0,1}^n: we show that their values on {0,1}^n can be conveniently bounded in terms of their behavior on certain small subcubes (Lemma 3.2). In light of this Fourier-theoretic observation, our intermediate result on multivariate polynomials takes on a much stronger form. Namely, we prove that multivariate polynomials with any nontrivial agreement with the Minsky-Papert function are highly constrained throughout the hypercube (Theorem 3.6). With some additional work in Section 4, we are able to deduce the existence of a smooth distribution on {0,1}^n with respect to which the Minsky-Papert function is orthogonal to all low-degree polynomials. This completes step 1 of the above program, as desired.
The techniques of our proof seem to be of independent interest. Multivariate polynomials on {0,1}^n arise frequently in the complexity literature and pose a considerable analytic challenge. A solution that we introduce is to project a multivariate polynomial in several ways to univariate polynomials, study the latter objects, and recombine the results using Fourier analysis (see Section 3). To our knowledge, this approach is novel and shows promise in more general contexts.

Fig. 1.1: Proof outline. [Diagram: Minsky & Papert's construction and the interpolation bound feed an intermediate result on multivariate approximation; combined with the subcube lemma, it yields a smooth orthogonalizing distribution (Step 1), which together with the pattern matrix method and the generalized Forster bound gives the main result.]
2. Preliminaries. All Boolean functions in this paper are represented as mappings {0,1}^n → {−1,+1}, where −1 corresponds to “true.” For x ∈ {0,1}^n, we define |x| = x_1 + x_2 + ··· + x_n. The symbol P_d stands for the set of all univariate real polynomials of degree up to d. By the degree of a multivariate polynomial, we will always mean its total degree, i.e., the largest total degree of any monomial. The notation [n] refers to the set {1, 2, …, n}. Set membership notation, when used in the subscript of an expectation operator, means that the expectation is taken with respect to the uniformly random choice of an element from the indicated set.
2.1. Matrix analysis. The symbol Rm×n refers to the family of all m × n matrices with real entries. The (i,j)th entry of a matrix A is denoted by Aij. We frequently use “generic-entry” notation to specify a matrix succinctly: we write A = [F(i,j)]i,j to mean that the (i,j)th entry of A is given by the expression F(i,j). In most matrices that arise in this work, the exact ordering of the columns (and rows) is irrelevant. In such cases we describe a matrix by the notation [F (i, j)]i∈I, j∈J , where I and J are some index sets.
Let A = [A_{ij}] ∈ R^{m×n} be given. We let ∥A∥_∞ = max_{i,j} |A_{ij}| and denote the singular values of A by σ_1(A) ≥ σ_2(A) ≥ ··· ≥ σ_{min{m,n}}(A) ≥ 0. The notation ∥·∥_2 refers to the Euclidean norm on vectors. Recall that the spectral norm, trace norm, and Frobenius norm of A are given by
∥A∥ = max_{x∈R^n, ∥x∥_2=1} ∥Ax∥_2 = σ_1(A),

∥A∥_Σ = ∑_i σ_i(A),

∥A∥_F = √(∑_{i,j} A_{ij}^2) = √(∑_i σ_i(A)^2).
An essential property of these norms is their invariance under orthogonal transformations on the left and on the right, which incidentally explains the alternative expressions for the spectral and Frobenius norms given above. The following relationship follows at once by the Cauchy-Schwarz inequality:

∥A∥_Σ ≤ ∥A∥_F √(rank(A))   (A ∈ R^{m×n}).   (2.1)

For A, B ∈ R^{m×n}, we write ⟨A, B⟩ = ∑_{i,j} A_{ij}B_{ij}. A useful consequence of the
singular value decomposition is:
⟨A, B⟩ ≤ ∥A∥·∥B∥_Σ   (A, B ∈ R^{m×n}).   (2.2)
The Hadamard product of A and B is the matrix A ◦ B = [A_{ij}B_{ij}]. Recall that

rank(A ◦ B) ≤ rank(A)·rank(B).   (2.3)
The symbol J stands for the all-ones matrix, whose dimensions will be apparent from the context. The notation A ≥ 0 means that all the entries in A are nonnegative. The shorthand A ≠ 0 means as usual that A is not the zero matrix.
2.2. Fourier transform over Z_2^n. Consider the vector space of functions {0,1}^n → R, equipped with the inner product

⟨f, g⟩ = 2^{−n} ∑_{x∈{0,1}^n} f(x)g(x).
For S ⊆ [n], define χ_S : {0,1}^n → {−1,+1} by χ_S(x) = (−1)^{∑_{i∈S} x_i}. Then {χ_S}_{S⊆[n]} is an orthonormal basis for the inner product space in question. As a result, every function f : {0,1}^n → R has a unique representation of the form

f(x) = ∑_{S⊆[n]} f̂(S) χ_S(x),

where f̂(S) = ⟨f, χ_S⟩. The reals f̂(S) are called the Fourier coefficients of f. The following fact is immediate from the definition of f̂(S):

Proposition 2.1. Let f : {0,1}^n → R be given. Then

max_{S⊆[n]} |f̂(S)| ≤ 2^{−n} ∑_{x∈{0,1}^n} |f(x)|.
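For intuition, the Fourier coefficients and the bound of Proposition 2.1 are easy to compute by brute force for small n; the sketch below (ours, with illustrative names) does so for the AND function:

```python
import itertools

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {0,1}^n -> R, by direct summation."""
    cube = list(itertools.product([0, 1], repeat=n))
    coeffs = {}
    for d in range(n + 1):
        for S in itertools.combinations(range(n), d):
            coeffs[S] = sum(f(x) * (-1) ** sum(x[i] for i in S)
                            for x in cube) / 2 ** n
    return coeffs

n = 4
f = lambda x: -1 if all(x) else 1           # AND, with -1 = "true"
coeffs = fourier_coefficients(f, n)
cube = list(itertools.product([0, 1], repeat=n))
# Proposition 2.1: every coefficient is at most 2^{-n} sum_x |f(x)|.
assert max(abs(c) for c in coeffs.values()) <= sum(abs(f(x)) for x in cube) / 2 ** n
```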
2.3. Symmetric functions. Let S_n denote the symmetric group on n elements. For σ ∈ S_n and x ∈ {0,1}^n, we denote by σx the string (x_{σ(1)}, …, x_{σ(n)}) ∈ {0,1}^n. A function φ : {0,1}^n → R is called symmetric if φ(x) = φ(σx) for every x ∈ {0,1}^n and every σ ∈ S_n. Equivalently, φ is symmetric if φ(x) is uniquely determined by |x|.
Observe that for every φ : {0, 1}n → R (symmetric or not), the derived function
φ^{sym}(x) = E_{σ∈S_n} φ(σx)
is symmetric. Symmetric functions on {0,1}n are intimately related to univariate polynomials, as demonstrated by Minsky and Papert’s symmetrization argument:
Proposition 2.2 (Minsky & Papert [34]). Let φ : {0,1}^n → R be representable by a real n-variate polynomial of degree r. Then there is a polynomial p ∈ P_r with

E_{σ∈S_n} φ(σx) = p(|x|)   for all x ∈ {0,1}^n.
We will need the following straightforward generalization.
Proposition 2.3. Let n_1, …, n_k be positive integers, n = n_1 + ··· + n_k. Let φ : {0,1}^n → R be representable by a real n-variate polynomial of degree r. Write x ∈ {0,1}^n as x = (x^{(1)}, …, x^{(k)}), where x^{(i)} = (x_{n_1+···+n_{i−1}+1}, …, x_{n_1+···+n_i}). Then there is a polynomial p on R^k of degree at most r such that

E_{σ_1∈S_{n_1}, …, σ_k∈S_{n_k}} φ(σ_1 x^{(1)}, …, σ_k x^{(k)}) = p(|x^{(1)}|, …, |x^{(k)}|)   for all x ∈ {0,1}^n.
2.4. Sign-rank. The sign-rank of a real matrix A = [Aij] is the least rank of a matrix B = [Bij] such that AijBij > 0 for all i,j with Aij ̸= 0. (Note that this definition generalizes the one given above in the abstract and introduction, which only applied to matrices A with nonzero entries.)
In general, the sign-rank of a matrix can be vastly smaller than its rank. For example, consider the following nonsingular matrices of order n ≥ 3, representing the well-known problems greater-than and equality in communication complexity: the first has entries +1 on and above the main diagonal and −1 below it; the second has entries −1 on the main diagonal and +1 everywhere else. These matrices have sign-rank at most 2 and 3, respectively. Indeed, the first matrix has the same sign pattern as the matrix [2(j − i) + 1]_{i,j}. The second has the same sign pattern as the matrix [(1 − ε) − ⟨v_i, v_j⟩]_{i,j}, where v_1, v_2, …, v_n ∈ R^2 are arbitrary pairwise distinct unit vectors and ε is a suitably small positive real, cf. [40, §5].
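The first of these claims can be checked mechanically; the sketch below (ours) verifies that [2(j − i) + 1]_{i,j} has rank 2 and the required sign pattern:

```python
import numpy as np

n = 8
GT = np.array([[1 if j >= i else -1 for j in range(n)] for i in range(n)])
B = np.array([[2 * (j - i) + 1 for j in range(n)] for i in range(n)])
assert np.all(GT * B > 0)               # B sign-represents greater-than
assert np.linalg.matrix_rank(B) == 2    # hence sign-rank(GT) <= 2
```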
Bounding the sign-rank from below is a considerable challenge. In a breakthrough result, Forster [13] proved the first nontrivial lower bound on the sign-rank of an explicit ±1 matrix. The centerpiece of Forster’s argument is the following theorem, which is a crucial starting point for our work.
Theorem 2.4 (Forster [13], implicit). Let X, Y be finite sets and M = [M_{xy}]_{x∈X, y∈Y} a real matrix (M ≠ 0). Put r = sign-rank(M). Then there is a matrix R = [R_{xy}]_{x∈X, y∈Y} such that:

rank(R) = r,   M ◦ R ≥ 0,   ∥R∥_∞ ≤ 1,   ∥R∥_F = √(|X||Y|/r).
Appendix B provides a detailed explanation of how Theorem 2.4 is implicit in Forster’s work.
2.5. Pattern matrices. Pattern matrices were introduced in [49, 50] and proved useful in obtaining strong lower bounds on communication complexity. Relevant definitions and results from [50] follow.
Let n and N be positive integers with n | N. Split [N] into n contiguous blocks, with N/n elements each:
[N] = {1, 2, …, N/n} ∪ {N/n + 1, …, 2N/n} ∪ ··· ∪ {(n−1)N/n + 1, …, N}.

Let V(N, n) denote the family of subsets V ⊆ [N] that have exactly one element from each of these blocks (in particular, |V| = n). Clearly, |V(N, n)| = (N/n)^n. For a bit string x ∈ {0,1}^N and a set V ∈ V(N, n), define the projection of x onto V by
x|_V = (x_{i_1}, x_{i_2}, …, x_{i_n}) ∈ {0,1}^n,

where i_1 < i_2 < ··· < i_n are the elements of V.

Definition 2.5 (Pattern matrix). For φ : {0,1}^n → R, the (N, n, φ)-pattern matrix is the real matrix

A = [φ(x|_V ⊕ w)]_{x∈{0,1}^N, (V,w)∈V(N,n)×{0,1}^n}.

Theorem 2.6 ([50]). Let φ : {0,1}^n → R be given. Then the (N, n, φ)-pattern matrix A satisfies

∥A∥ = √(2^{N+n} (N/n)^n) · max_{S⊆[n]} { |φ̂(S)| (n/N)^{|S|/2} }.
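For small parameters, the matrix of Definition 2.5 can be generated directly from the definition. The following sketch (ours; the function names and the parity example are illustrative) enumerates the rows x and the columns (V, w):

```python
import itertools
import numpy as np

def pattern_matrix(N, n, phi):
    """The (N, n, phi)-pattern matrix of Definition 2.5, by brute force."""
    b = N // n                                        # block size N/n
    xs = list(itertools.product([0, 1], repeat=N))
    Vs = list(itertools.product(range(b), repeat=n))  # one position per block
    ws = list(itertools.product([0, 1], repeat=n))
    A = np.empty((len(xs), len(Vs) * len(ws)))
    for r, x in enumerate(xs):
        for c, (V, w) in enumerate(itertools.product(Vs, ws)):
            proj = tuple(x[i * b + V[i]] for i in range(n))        # x|_V
            A[r, c] = phi(tuple(p ^ q for p, q in zip(proj, w)))   # phi(x|_V + w)
    return A

A = pattern_matrix(4, 2, lambda z: (-1) ** sum(z))  # the (4, 2, parity) case
```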
3. Multivariate approximation. In this section we study the behavior of multivariate polynomials that have nontrivial agreement with the Minsky-Papert function. We start with a classical tool, the interpolation bound (see, e.g., [9, 45]), which bounds a degree-d univariate polynomial on an interval based on its values at d + 1 sufficiently separated points.

Lemma 3.1 (Interpolation bound). Let p ∈ P_d, and let x_0 < x_1 < ··· < x_d be reals in an interval [a, b] with min_{i≠j} |x_i − x_j| ≥ 1 and |p(x_i)| ≤ 1 for i = 0, 1, …, d. Then

max_{a≤x≤b} |p(x)| ≤ 2^d \binom{b − a}{d}.

Proof. By the Lagrange interpolation formula,

p(x) = ∑_{k=0}^{d} p(x_k) ∏_{i=0, i≠k}^{d} (x − x_i)/(x_k − x_i).

Since the points x_0, …, x_d are pairwise at distance at least 1, we have ∏_{i≠k} |x_k − x_i| ≥ k!(d − k)!, whereas ∏_{i≠k} |x − x_i| ≤ (b − a)(b − a − 1)···(b − a − d + 1) = d! \binom{b−a}{d} for every x ∈ [a, b]. Summing over k = 0, 1, …, d and using ∑_k \binom{d}{k} = 2^d gives the claimed bound.
We now establish another auxiliary fact. It provides a convenient means to bound a function whose Fourier transform is supported on low-order characters, in terms of its behavior on low-weight inputs.
Lemma 3.2. Let k be an integer, 0 ≤ k ≤ n − 1. Let f : {0,1}^n → R be a function such that f̂(S) = 0 for all S with |S| > k. Then

|f(1^n)| ≤ 2^k \binom{n}{k} max_{|x|≤k} |f(x)|.

Proof. Define the symmetric function g : {0,1}^n → R by g(x) = χ_{[n]}(x) p(|x|), where

p(t) = ∏_{i=k+1}^{n−1} (t − i)/(n − i).

The following properties of g are immediate:

g(1^n) = (−1)^n,   (3.1)
g(x) = 0   (k < |x| < n).   (3.2)

The degree of every monomial in g is between k + 1 and n, so that

ĝ(S) = 0   (|S| ≤ k).   (3.3)

Furthermore, for 0 ≤ t ≤ k we have |p(t)| = \binom{n−t−1}{n−k−1}, so that

∑_{x: |x|≤k} |g(x)| = ∑_{t=0}^{k} \binom{n}{t} |p(t)| = ∑_{t=0}^{k} \binom{n}{t} \binom{n−t−1}{n−k−1} = \binom{n}{k} ∑_{t=0}^{k} \binom{k}{t} (n−k)/(n−t) ≤ 2^k \binom{n}{k}.   (3.4)

We are now prepared to analyze f. By (3.3),

∑_{x∈{0,1}^n} f(x)g(x) = 2^n ∑_{S⊆[n]} f̂(S)ĝ(S) = 0.

On the other hand, (3.1) and (3.2) show that

∑_{x∈{0,1}^n} f(x)g(x) = (−1)^n f(1^n) + ∑_{x: |x|≤k} f(x)g(x).

Combining the last two equations with (3.4) yields |f(1^n)| ≤ 2^k \binom{n}{k} max_{|x|≤k} |f(x)|, as desired.

Remark 3.3. The proof in fact shows the slightly sharper bound |f(1^n)| ≤ (∑_{t=0}^{k} \binom{n}{t}\binom{n−t−1}{n−k−1}) max_{|x|≤k} |f(x)|. We will not need this observation, however.
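Lemma 3.2 admits a quick numerical sanity check. The sketch below (ours) draws a random f whose Fourier transform is supported on characters of order at most k and compares the two sides of the inequality:

```python
import itertools, math, random

n, k = 8, 3
cube = list(itertools.product([0, 1], repeat=n))
support = [S for d in range(k + 1) for S in itertools.combinations(range(n), d)]
coef = {S: random.gauss(0, 1) for S in support}

def f(x):  # a function with f_hat(S) = 0 whenever |S| > k
    return sum(c * (-1) ** sum(x[i] for i in S) for S, c in coef.items())

lhs = abs(f((1,) * n))
rhs = 2 ** k * math.comb(n, k) * max(abs(f(x)) for x in cube if sum(x) <= k)
assert lhs <= rhs + 1e-9
```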
We are now in a position to study the approximation problem of interest to us.
Define the sets
Z = {0, 1, 2, …, 4m^2}^m,   Z_+ = {1, 2, …, 4m^2}^m.

Looking ahead, Z_+ and Z \ Z_+ correspond to the sets on which the Minsky-Papert function ⋀_{i=1}^{m} ⋁_{j=1}^{4m^2} x_{ij} is true and false, respectively. Accordingly, we define F : Z → {−1, +1} by

F(z) = −1 if z ∈ Z_+, and F(z) = 1 otherwise.

For u, z ∈ Z, let ∆(u, z) = |{i : u_i ≠ z_i}| be the ordinary Hamming distance. We shall prove the following intermediate result, inspired by Minsky and Papert's analysis [34] of the threshold degree of CNF formulas.

Lemma 3.4. Let Q be a degree-d real polynomial in m variables, where d ≤ m/3. Assume that

F(z)Q(z) ≥ −1   (z ∈ Z).   (3.7)

Then |Q(z)| ≤ 4^{m+d} at every point z ∈ Z_+ with ∆(u, z) < m/3, where u = (1^2, 3^2, 5^2, …, (2m − 1)^2) ∈ Z_+.
Proof. Fix z ∈ Z_+ with ∆(u, z) < m/3. Define p ∈ P_{2d} by

p(t) = Q(p_1(t), p_2(t), …, p_m(t)),

where

p_i(t) = (t − 2i + 1)^2 if z_i = u_i (equivalently, z_i = (2i − 1)^2), and p_i(t) = z_i otherwise.

Letting S = {i : u_i = z_i}, inequality (3.7) implies that

p(2i − 1) ≥ −1   (i ∈ S),   (3.8)
p(2i) ≤ 1   (i = 0, 1, …, m).   (3.9)

Claim 3.5. Let i ∈ S. Then |p(ξ)| ≤ 1 for some ξ ∈ [2i − 2, 2i − 1].

Proof. The claim is trivial if p vanishes at some point in [2i − 2, 2i − 1]. In the contrary case, p maintains the same sign throughout this interval. As a result, (3.8) and (3.9) show that min{|p(2i − 2)|, |p(2i − 1)|} ≤ 1.

Claim 3.5 provides |S| > 2m/3 ≥ 2d ≥ deg(p) points in [0, 2m], with pairwise distances at least 1, at which p is bounded in absolute value by 1. By Lemma 3.1,

max_{0≤t≤2m} |p(t)| ≤ 2^{deg(p)} \binom{2m}{deg(p)} ≤ 4^{m+d}.

This completes the proof since Q(z) = p(0).
Finally, we remove the restriction on ∆(u, z), thereby establishing the main result
of this section.
Theorem 3.6. Let Q be a degree-d real polynomial in m variables, where d < m/3. Assume that

F(z)Q(z) ≥ −1   (z ∈ Z).

Then

|Q(z)| ≤ 16^m   (z ∈ Z_+).
Proof. As before, put u = (1^2, 3^2, 5^2, …, (2m − 1)^2). Fix z ∈ Z_+ and define the “interpolating” function f : {0,1}^m → R by

f(x) = Q(x_1 z_1 + (1 − x_1)u_1, …, x_m z_m + (1 − x_m)u_m).

In this notation, we know from Lemma 3.4 that |f(x)| ≤ 4^{m+d} for every x ∈ {0,1}^m with |x| < m/3, and our goal is to show that |f(1^m)| ≤ 16^m. Since Q has degree d, the Fourier transform of f is supported on characters of order up to d. As a result,

|f(1^m)| ≤ 2^d \binom{m}{d} max_{|x|≤d} |f(x)|   (by Lemma 3.2)
≤ 2^d \binom{m}{d} · 4^{m+d}   (by Lemma 3.4)
= 2^{2m+3d} \binom{m}{d} ≤ 16^m.
4. A smooth orthogonalizing distribution. An important concept in our work is that of an orthogonalizing distribution [51]. Let f : {0,1}n → {−1,+1} be given. A distribution μ on {0, 1}n is d-orthogonalizing for f if
E_{x∼μ} [f(x)χ_S(x)] = 0   (|S| < d).
In words, a distribution μ is d-orthogonalizing for f if with respect to μ, the function f is orthogonal to every character of order less than d.
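The orthogonalizing condition is straightforward to test by enumeration for explicit μ and f; the sketch below (ours) does exactly that:

```python
import itertools

def is_orthogonalizing(mu, f, n, d):
    """True iff E_{x~mu}[f(x) chi_S(x)] = 0 for every S with |S| < d."""
    cube = list(itertools.product([0, 1], repeat=n))
    for size in range(d):
        for S in itertools.combinations(range(n), size):
            e = sum(mu[x] * f(x) * (-1) ** sum(x[i] for i in S) for x in cube)
            if abs(e) > 1e-9:
                return False
    return True

# Example: the uniform distribution is n-orthogonalizing for parity.
n = 4
mu = {x: 2 ** -n for x in itertools.product([0, 1], repeat=n)}
assert is_orthogonalizing(mu, lambda x: (-1) ** sum(x), n, n)
```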
This section focuses on the following function from {0,1}^{4m^3} to {−1,+1}:

MP_m(x) = ⋀_{i=1}^{m} ⋁_{j=1}^{4m^2} x_{i,j}.
(Recall that we interpret −1 as “true”.) This function was originally studied by Minsky and Papert [34] and has played an important role in later works [30, 38, 49, 50]. An explicit m-orthogonalizing distribution for MP_m is known [49]. However, our main result requires a Θ(m)-orthogonalizing distribution for MP_m that is additionally smooth, i.e., places substantial weight on all but a tiny fraction of the points, and the distribution given in [49] severely violates the latter property. Proving the existence of a distribution that is simultaneously Θ(m)-orthogonalizing and smooth is the goal of this section (Theorem 4.1).
We will view an input x ∈ {0,1}^n = {0,1}^{4m^3} to MP_m as composed of blocks: x = (x^{(1)}, …, x^{(m)}), where the ith block is x^{(i)} = (x_{i,1}, x_{i,2}, …, x_{i,4m^2}). The proof that is about to start refers to the sets Z, Z_+ and the function F as defined in Section 3.
Theorem 4.1. There is a (m/3)-orthogonalizing distribution μ for MP_m such that μ(x) ≥ (1/2)·16^{−m}·2^{−n} for all inputs x ∈ {0,1}^n with MP_m(x) = −1.
Proof. Let X be the set of all inputs x with MP_m(x) = −1, i.e.,

X = {x ∈ {0,1}^n : x^{(1)} ≠ 0, …, x^{(m)} ≠ 0}.

It suffices to show that the following linear program has optimum at least (1/2)·16^{−m}:

(LP1)   variables:  μ(x) ≥ 0 for x ∈ {0,1}^n;  ε ≥ 0
        maximize:   ε
        subject to: ∑_{x∈{0,1}^n} μ(x) MP_m(x) χ_S(x) = 0   for |S| < m/3,
                    ∑_{x∈{0,1}^n} μ(x) ≤ 1,
                    μ(x) ≥ ε·2^{−n}   for x ∈ X.

The optimum being nonzero, it will follow by a scaling argument that any optimal solution has ∑_x μ(x) = 1. As a result, μ will be the sought probability distribution.

For x ∈ {0,1}^n, we let z(x) = (|x^{(1)}|, …, |x^{(m)}|); note that MP_m(x) = F(z(x)). Since the function MP_m is invariant under the action of the group S_{4m^2} × ··· × S_{4m^2}, in view of Proposition 2.3, the dual of (LP1) can be simplified as follows:

(LP2)   variables:  a polynomial Q on R^m of degree < m/3;  η ≥ 0;  δ_z ≥ 0 for z ∈ Z_+
        minimize:   η
        subject to: ∑_{x∈X} δ_{z(x)} ≥ 2^n,
                    F(z)Q(z) ≥ −η   for z ∈ Z,
                    F(z)Q(z) ≥ −η + δ_z   for z ∈ Z_+.

The programs are both feasible and therefore have the same finite optimum. Fix an optimal solution η, Q, δ_z to (LP2). For the sake of contradiction, assume that η ≤ (1/2)·16^{−m}. Then |Q(z)| ≤ 1/2 for each z ∈ Z_+, by Theorem 3.6. From the constraints of the third type in (LP2) we conclude that δ_z ≤ 1/2 + η < 1 (z ∈ Z_+). This contradicts the first constraint. Thus, the optimum of (LP1) and (LP2) is at least (1/2)·16^{−m}.
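For tiny parameters, programs like (LP1) can be handed to an off-the-shelf LP solver. The sketch below (ours; it assumes scipy is available and generalizes (LP1) to an arbitrary f and degree bound D) sets up the variables μ(x) and ε directly:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def smooth_orthogonalizing(f, n, D):
    """Solve the analogue of (LP1) for f: {0,1}^n -> {-1,+1}: maximize eps
    subject to mu being D-orthogonalizing, sum mu <= 1, and
    mu(x) >= eps * 2^-n on the set X = {x : f(x) = -1}."""
    cube = list(itertools.product([0, 1], repeat=n))
    m = len(cube)
    X = [i for i, x in enumerate(cube) if f(x) == -1]
    A_eq = [[f(x) * (-1) ** sum(x[i] for i in S) for x in cube] + [0.0]
            for d in range(D) for S in itertools.combinations(range(n), d)]
    A_ub = [[1.0] * m + [0.0]]                 # sum_x mu(x) <= 1
    for i in X:                                # eps * 2^-n - mu(x) <= 0
        row = [0.0] * (m + 1)
        row[i], row[-1] = -1.0, 2.0 ** (-n)
        A_ub.append(row)
    b_ub = [1.0] + [0.0] * len(X)
    c = [0.0] * m + [-1.0]                     # maximize eps
    return linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq,
                   b_eq=np.zeros(len(A_eq)), bounds=[(0, None)] * (m + 1))

# For parity, the uniform distribution is feasible, so the optimum is >= 1.
res = smooth_orthogonalizing(lambda x: (-1) ** sum(x), 4, 4)
assert -res.fun >= 1 - 1e-6
```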
5. A generalization of Forster's bound. Using Theorem 2.4, Forster gave a simple proof of the following fundamental result [13, Thm. 2.2]: for any matrix A = [A_{xy}]_{x∈X, y∈Y} with ±1 entries,

sign-rank(A) ≥ √(|X||Y|) / ∥A∥.

Forster et al. [14, Thm. 3] generalized this bound to arbitrary real matrices A ≠ 0:

sign-rank(A) ≥ (√(|X||Y|) / ∥A∥) · min_{x,y} |A_{xy}|.   (5.1)
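As a quick numerical illustration (ours, not the paper's), Forster's bound can be evaluated on the inner product matrix, i.e., the Sylvester-type Hadamard matrix, recovering the well-known 2^{n/2} lower bound:

```python
import numpy as np

n = 6
# Inner product mod 2 matrix: entry (x, y) is (-1)^{<x,y>}.
H = np.array([[(-1) ** bin(x & y).count("1") for y in range(2 ** n)]
              for x in range(2 ** n)])
bound = H.shape[0] / np.linalg.norm(H, 2)   # sqrt(|X||Y|) / ||H||
assert round(bound) == 2 ** (n // 2)        # Forster: sign-rank >= 2^{n/2}
```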
Forster and Simon [15, §5] considered a different generalization, inspired by the notion of matrix rigidity (see, e.g., [43]). Let A be a given ±1 matrix, and let Ã be obtained from A by changing some h entries in an arbitrary fashion (h < |X||Y|). Forster and Simon showed that

sign-rank(Ã) ≥ √(|X||Y|) / (∥A∥ + 2√h).   (5.2)
The above generalizations are not sufficient for our purposes. Before we can proceed, we need to prove the following “hybrid” bound, which combines the ideas of (5.1) and (5.2).
Theorem 5.1. Let A = [A_{xy}]_{x∈X, y∈Y} be a real matrix with s = |X||Y| entries (A ≠ 0). Assume that all but h of the entries of A satisfy |A_{xy}| ≥ γ, where h and γ > 0 are arbitrary parameters. Then

sign-rank(A) ≥ γs / (∥A∥√s + γh).
Proof. Let r denote the sign-rank of A. Theorem 2.4 supplies a matrix R = [R_{xy}] with

rank(R) = r,   (5.3)
A ◦ R ≥ 0,   (5.4)
∥R∥_∞ ≤ 1,   (5.5)
∥R∥_F = √(s/r).   (5.6)

The crux of the proof is to estimate ⟨A, R⟩ from below and above. On the one hand,

⟨A, R⟩ ≥ ∑_{x,y: |A_{xy}|≥γ} A_{xy}R_{xy}   (by (5.4))
 ≥ γ ( ∑_{x,y} |R_{xy}| − h )   (by (5.4), (5.5))
 ≥ γ∥R∥_F^2 − γh = γs/r − γh   (by (5.5), (5.6)).

On the other hand,

⟨A, R⟩ ≤ ∥A∥·∥R∥_Σ   (by (2.2))
 ≤ ∥A∥·∥R∥_F √r   (by (2.1), (5.3))
 = ∥A∥ √s   (by (5.6)).

Comparing these lower and upper bounds on ⟨A, R⟩ yields the claimed estimate of r = sign-rank(A).
6. Main result. At last, we are in a position to prove our main result.

Theorem 1.1 (Restated). Define f_m(x, y) = ⋀_{i=1}^{m} ⋁_{j=1}^{m^2} (x_{ij} ∧ y_{ij}). Then the matrix [f_m(x, y)]_{x,y} has sign-rank 2^{Ω(m)}.
Proof. Let M be the (N, n, MP_m)-pattern matrix, where n = 4m^3 and N = 17^6 n. Let P be the (N, n, μ)-pattern matrix, where μ is the distribution from Theorem 4.1. We are going to estimate the sign-rank of M ◦ P.
By Theorem 4.1, all but a 2^{−Ω(m^2)} fraction of the inputs x ∈ {0,1}^n satisfy μ(x) ≥ (1/2)·16^{−m}·2^{−n}. As a result, all but a 2^{−Ω(m^2)} fraction of the entries of M ◦ P are at least (1/2)·16^{−m}·2^{−n} in absolute value. Theorem 5.1 at once implies that

sign-rank(M) ≥ sign-rank(M ◦ P) ≥ min{ 16^{−m} 2^{−n} √s / (4∥M ◦ P∥),  2^{Ω(m^2)} },   (6.1)

where s = 2^{N+n} (N/n)^n denotes the number of entries in M ◦ P.
We now bound the spectral norm of M ◦ P precisely as in [51, §6]. Note first that M ◦ P is the (N, n, φ)-pattern matrix, where φ : {0,1}^n → R is given by φ(x) = MP_m(x)μ(x). Since μ is a (m/3)-orthogonalizing distribution for MP_m, we have

φ̂(S) = 0   for |S| < m/3.   (6.2)

Since ∑_{x∈{0,1}^n} |φ(x)| = 1, Proposition 2.1 shows that

|φ̂(S)| ≤ 2^{−n}   for each S ⊆ [n].   (6.3)

Theorem 2.6 implies, in view of (6.2) and (6.3), that

∥M ◦ P∥ ≤ √s · 2^{−n} (N/n)^{−m/6} = 17^{−m} 2^{−n} √s.

Along with (6.1), this estimate shows that M has sign-rank at least 2^{Ω(m)}.
It remains to verify that M is a submatrix of [f_{cm}(x, y)]_{x,y}, where c = ⌈8N/n⌉ = Θ(1). The set V ∈ V(N, n) in Definition 2.5 is naturally represented by a tuple (…, v_{ij}, …) ∈ [N/n]^n. Then for all x ∈ {0,1}^N and (V, w) ∈ V(N, n) × {0,1}^n,

⋀_{i=1}^{m} ⋁_{j=1}^{4m^2} (x|_V ⊕ w)_{ij} = ⋀_{i=1}^{m} ⋁_{j=1}^{4m^2} ⋁_{k=1}^{N/n} ⋁_{ε∈{0,1}} ((x_{ijk} = ε) ∧ (w_{ij} ≠ ε) ∧ (v_{ij} = k)).

Merging the three groups of OR gates gives bottom fan-in 8m^2 N/n ≤ (cm)^2.

Remark 6.1. The lower bound in Theorem 1.1 is essentially optimal. To see this, note that the matrix [f_m(x, y)]_{x,y} has the same sign pattern as

R = [ 1 − 2 ∏_{i=1}^{m} ∑_{j=1}^{m^2} x_{ij} y_{ij} ]_{x,y}.

By property (2.3) of the Hadamard product, the sign-rank of [f_m(x, y)]_{x,y} does not exceed m^{2m} + 1 = 2^{O(m log m)}.
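For m = 2 this sign-representation can be verified exhaustively; the sketch below (ours) confirms both the sign pattern and the rank bound:

```python
import itertools
import numpy as np

m = 2
n = m ** 3                              # m blocks of m^2 variables each
cube = list(itertools.product([0, 1], repeat=n))

def f(x, y):                            # f_m, with -1 = "true"
    ok = all(any(x[i * m * m + j] and y[i * m * m + j] for j in range(m * m))
             for i in range(m))
    return -1 if ok else 1

def r(x, y):                            # the rank-(m^{2m}+1) representation
    return 1 - 2 * np.prod([sum(x[i * m * m + j] * y[i * m * m + j]
                                for j in range(m * m)) for i in range(m)])

F = np.array([[f(x, y) for y in cube] for x in cube])
R = np.array([[r(x, y) for y in cube] for x in cube])
assert np.all(F * R > 0)
assert np.linalg.matrix_rank(R) <= m ** (2 * m) + 1
```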
7. On communication complexity classes. We proceed to explore the consequences of our main result in the study of communication complexity classes. Throughout this section, the symbol {f_n} shall stand for a family of functions f_1, f_2, …, f_n, …, where f_n : {0,1}^n × {0,1}^n → {−1,+1}.
The first class that we consider, UPP^cc, corresponds to functions with efficient unbounded-error protocols.
Definition 7.1 (Babai et al. [3, §4]). A family {f_n} is in UPP^cc iff for some constant c and all natural n > c, there exists a probabilistic communication protocol with private coins such that:
(1) for every input (x, y), the protocol outputs the correct value f_n(x, y) with probability greater than 1/2;
(2) the number of bits exchanged is at most log^c n.
Note that in this model the number of coin flips is not included in the complexity measure. Requiring the same bound of log^c n on the number of coin flips results in another extensively studied class [3], called PP^cc.
For our purposes, however, an equivalent matrix-analytic characterization [40] is more convenient. By the sign-rank of a function f : X × Y → {−1, +1}, where X, Y are finite sets, we shall mean the sign-rank of the matrix [f (x, y)]x∈X, y∈Y .
Theorem 7.2 (Paturi & Simon [40]). A function family {f_n} is in the class UPP^cc iff for some constant c > 1 and all n > c, the function f_n has sign-rank at most 2^{log^c n}.
We now turn to the communication-complexity analogue of the polynomial hierarchy, also defined by Babai et al. [3]. A function f_n : {0,1}^n × {0,1}^n → {−1,+1} is called a rectangle if there exist subsets A, B ⊆ {0,1}^n such that
fn(x,y)=−1 ⇔ x∈A, y∈B.
We call fn the complement of a rectangle if the negated function ¬fn = −fn is a rectangle.
Definition 7.3 (Babai et al. [3, §4]).
(1) A family {f_n} is in Π_0^cc iff each f_n is a rectangle. A family {f_n} is in Σ_0^cc iff {¬f_n} is in Π_0^cc.
(2) Fix an integer k = 1, 2, 3, 4, …. A family {f_n} is in Σ_k^cc iff for some constant c > 1 and all n > c,

f_n = ⋁_{i_1=1}^{2^{log^c n}} ⋀_{i_2=1}^{2^{log^c n}} ⋁_{i_3=1}^{2^{log^c n}} ··· ◯_{i_k=1}^{2^{log^c n}} g_{i_1,i_2,…,i_k},

where ◯ = ⋁ (resp., ◯ = ⋀) for k odd (resp., even); and each g_{i_1,i_2,…,i_k} is a rectangle (resp., the complement of a rectangle) for k odd (resp., even). A family {f_n} is in Π_k^cc iff {¬f_n} is in Σ_k^cc.
(3) The polynomial hierarchy is given by PH^cc = ⋃_k Σ_k^cc = ⋃_k Π_k^cc, where k = 0, 1, 2, 3, … ranges over all constants.
(4) A family {f_n} is in PSPACE^cc iff for some constant c > 1 and all n > c,

f_n = ⋁_{i_1=1}^{2^{log^c n}} ⋀_{i_2=1}^{2^{log^c n}} ⋁_{i_3=1}^{2^{log^c n}} ··· ⋁_{i_k=1}^{2^{log^c n}} g_{i_1,i_2,…,i_k},

where k < log^c n is odd and each g_{i_1,i_2,…,i_k} is a rectangle.
Thus, the zeroth level (Σ_0^cc and Π_0^cc) of the polynomial hierarchy consists of rectangles and complements of rectangles, the simplest functions in communication complexity. The first level is easily seen to correspond to functions with efficient nondeterministic or co-nondeterministic protocols: Σ_1^cc = NP^cc and Π_1^cc = coNP^cc. This level is well understood, and there exist powerful methods to show that {f_n} ∉ Σ_1^cc for a host of explicit functions {f_n}. Finding an explicit sequence {f_n} ∉ Σ_2^cc, on the other hand, is a longstanding open problem.
The circuit class AC0 is related to the polynomial hierarchy PH^cc in communication complexity in the obvious way. Specifically, if f_n : {0,1}^n × {0,1}^n → {−1,+1}, n = 1, 2, 3, 4, …, is an AC0 (or even quasi-AC0) circuit family of depth k with an OR gate at the top (resp., an AND gate), then {f_n} ∈ Σ_{k−1}^cc (resp., {f_n} ∈ Π_{k−1}^cc). In particular, the depth-3 circuit family {f_n} in Theorem 1.1 is in Π_2^cc, whereas {¬f_n} is in Σ_2^cc. In this light, Theorem 1.1 establishes the following separations:
Theorem 7.4. Σ_2^cc ⊄ UPP^cc, Π_2^cc ⊄ UPP^cc.
Several years prior to our work, Forster [13] proved that the familiar inner product function ip_n(x, y) = (−1)^{∑ x_i y_i} has sign-rank 2^{Θ(n)}. Since {ip_n} ∈ PSPACE^cc, Forster's result yields the separation PSPACE^cc ⊄ UPP^cc. Theorem 7.4 in this paper substantially strengthens it, showing that even the second level (Σ_2^cc, Π_2^cc) of the polynomial hierarchy is not contained in UPP^cc. This settles the open problem due to Babai et al. [3, p. 345], who asked whether Σ_2^cc ⊆ UPP^cc. Observe that Theorem 7.4 is best possible in that UPP^cc trivially contains Σ_0^cc, Σ_1^cc, Π_0^cc, and Π_1^cc.
The classes PP^cc and UPP^cc both correspond to small-bias communication and, in fact, were both inspired by the class PP in computational complexity. It is well known and straightforward to show that PP^cc ⊆ UPP^cc. It turns out that UPP^cc is strictly more powerful than PP^cc, as shown by Buhrman et al. [8] and independently in [48]. In this light, Theorem 7.4 in this paper substantially strengthens earlier separations of Σ_2^cc and Π_2^cc from PP^cc, obtained independently in [8] and [49]:

Theorem 7.5 (Buhrman et al. [8], Sherstov [49]). Σ_2^cc ⊄ PP^cc, Π_2^cc ⊄ PP^cc.
Sign-rank is a much more challenging quantity to analyze than discrepancy, a combinatorial complexity measure that is known [25] to characterize membership in PP^cc. Indeed, exponentially small upper bounds on the discrepancy were known more than twenty years ago [56, 10], whereas the first exponential lower bound on the sign-rank for an explicit function was only obtained recently by Forster [13]. It is not surprising, then, that this paper has required ideas that are quite different from the methods of both [8] and [48, 49].
8. Open problems. Our work is closely related to several natural and important problems. The first is a well-known and challenging open problem in complexity theory. Are there matrices computable in AC0 that have low spectral norm? More precisely, does one have ∥[f(x,y)]_{x∈X, y∈Y}∥ ≤ 2^{−n^{Ω(1)}} √(|X||Y|) for some choice of an AC0 function f : {0,1}^n × {0,1}^n → {−1,+1} and some multisets X, Y of n-bit Boolean strings? An affirmative answer to this question would subsume our results and additionally imply that AC0 is not learnable in Kearns' statistical query model [21]. A suitable lower bound on the spectral norm of every such matrix, on the other hand, would result in a breakthrough separation of PH^cc and PSPACE^cc. See [3, 43, 33, 48] for relevant background.
The second problem concerns the sign-rank of arbitrary pattern matrices. For a Boolean function f : {0,1}^n → {−1,+1}, its threshold degree deg_±(f) is the least degree of a multivariate polynomial p(x_1, …, x_n) such that f(x) ≡ sgn p(x). Let M_f denote the (n^c, n, f)-pattern matrix, where c ≥ 1 is a sufficiently large constant. It is straightforward to verify that the sign-rank of M_f does not exceed n^{O(deg_±(f))}. Is that upper bound close to optimal? Specifically, does M_f have sign-rank exp(deg_±(f)^{Ω(1)}) for every f? Evidence in this paper and prior work suggests an answer in the affirmative. For example, our main result confirms this hypothesis for the Minsky-Papert function, f = MP. For f = parity the hypothesis follows immediately from the seminal work of Forster [13]. More generally, it was proven in [51] that the hypothesis holds for all symmetric functions.
In the field of communication complexity, we were able to resolve the main question left open by Babai et al. [3], but only in one direction: PH^cc ⊄ UPP^cc. As we already noted, the other direction remains wide open despite much research: no lower bounds are known for PH^cc or even Σ_2^cc. The latter question is in turn equivalent to lower bounds for bounded-depth circuits in the context of graph complexity (e.g., see [41, 42, 19] and the literature cited therein). It is also known [43, Thm. 1] that sufficiently rigid matrices do not belong to PH^cc, which provides further motivation to be looking for lower bounds on matrix rigidity.
Acknowledgments. The authors would like to thank Scott Aaronson, Adam Klivans, Dieter van Melkebeek, Yaoyun Shi, and Ronald de Wolf for helpful feedback on an earlier version of this manuscript. We are also grateful to our anonymous reviewers for their useful comments.
REFERENCES
[1] Michael Alekhnovich, Mark Braverman, Vitaly Feldman, Adam R. Klivans, and Toniann Pitassi, Learnability and automatizability, in Proc. of the 45th Symposium on Foundations of Computer Science (FOCS), 2004, pp. 621–630.
[2] Noga Alon, Peter Frankl, and Vojtech Rödl, Geometrical realization of set systems and probabilistic communication complexity, in Proc. of the 26th Symposium on Foundations of Computer Science (FOCS), 1985, pp. 277–280.
[3] László Babai, Peter Frankl, and Janos Simon, Complexity classes in communication complexity theory, in Proc. of the 27th Symposium on Foundations of Computer Science (FOCS), 1986, pp. 337–347.
[4] Shai Ben-David, Nadav Eiron, and Hans Ulrich Simon, Limitations of learning via embeddings in Euclidean half spaces, J. Mach. Learn. Res., 3 (2003), pp. 441–461.
[5] Avrim Blum, Alan M. Frieze, Ravi Kannan, and Santosh Vempala, A polynomial-time algorithm for learning noisy linear threshold functions, Algorithmica, 22 (1998), pp. 35–52.
[6] Avrim Blum, Adam Kalai, and Hal Wasserman, Noise-tolerant learning, the parity problem,
and the statistical query model, J. ACM, 50 (2003), pp. 506–519.
[7] Nader H. Bshouty, A subexponential exact learning algorithm for DNF using equivalence
queries, Inf. Process. Lett., 59 (1996), pp. 37–39.
[8] Harry Buhrman, Nikolai K. Vereshchagin, and Ronald de Wolf, On computation and
communication with small bias, in Proc. of the 22nd Conf. on Computational Complexity
(CCC), 2007, pp. 24–32.
[9] E. W. Cheney, Introduction to Approximation Theory, Chelsea Publishing, New York, 2nd ed., 1982.
[10] Benny Chor and Oded Goldreich, Unbiased bits from sources of weak randomness and
probabilistic communication complexity, SIAM J. Comput., 17 (1988), pp. 230–261.
[11] Ronald de Wolf, Quantum Computing and Communication Complexity, PhD thesis, Uni-
versity of Amsterdam, 2001.
[12] Vitaly Feldman, Parikshit Gopalan, Subhash Khot, and Ashok Kumar Ponnuswami,
New results for learning noisy parities and halfspaces, in Proceedings of the 47th Annual
Symposium on Foundations of Computer Science (FOCS), 2006, pp. 563–574.
[13] Jürgen Forster, A linear lower bound on the unbounded error probabilistic communication
complexity, J. Comput. Syst. Sci., 65 (2002), pp. 612–625.
[14] Jürgen Forster, Matthias Krause, Satyanarayana V. Lokam, Rustam Mubarakzjanov,
Niels Schmitt, and Hans-Ulrich Simon, Relations between communication complexity, linear arrangements, and computational complexity, in Proc. of the 21st Conf. on Foundations of Software Technology and Theoretical Computer Science (FST TCS), 2001, pp. 171–182.
[15] Jürgen Forster and Hans Ulrich Simon, On the smallest possible dimension and the largest possible margin of linear arrangements representing given concept classes, Theor. Comput. Sci., 350 (2006), pp. 40–48.
[16] Mikael Goldmann, Johan Håstad, and Alexander A. Razborov, Majority gates vs. general weighted threshold gates, Computational Complexity, 2 (1992), pp. 277–300.
[17] András Hajnal, Wolfgang Maass, Pavel Pudlák, Mario Szegedy, and György Turán,
Threshold circuits of bounded depth, J. Comput. Syst. Sci., 46 (1993), pp. 129–154.
[18] David P. Helmbold, Robert H. Sloan, and Manfred K. Warmuth, Learning integer lat-
tices, SIAM J. Comput., 21 (1992), pp. 240–266.
20 A.A. RAZBOROV, A.A. SHERSTOV
[19] Stasys Jukna, On graph complexity, Combinatorics, Probability and Computing, 15 (2006), pp. 1–22.
[20] Bala Kalyanasundaram and Georg Schnitger, The probabilistic communication complexity of set intersection, SIAM J. Discrete Math., 5 (1992), pp. 545–557.
[21] Michael Kearns, Efficient noise-tolerant learning from statistical queries, in Proc. of the 25th Symposium on Theory of Computing (STOC), 1993, pp. 392–401.
[22] Michael Kearns and Leslie Valiant, Cryptographic limitations on learning Boolean formulae and finite automata, J. ACM, 41 (1994), pp. 67–95.
[23] Michael J. Kearns and Umesh V. Vazirani, An Introduction to Computational Learning Theory, MIT Press, Cambridge, 1994.
[24] Michael Kharitonov, Cryptographic hardness of distribution-specific learning, in Proc. of the 25th Symposium on Theory of Computing, 1993, pp. 372–381.
[25] Hartmut Klauck, Lower bounds for quantum communication complexity, SIAM J. Comput., 37 (2007), pp. 20–46.
[26] Adam R. Klivans and Rocco A. Servedio, Learning DNF in time 2^{Õ(n^{1/3})}, J. Comput. Syst. Sci., 68 (2004), pp. 303–318.
[27] Adam R. Klivans and Alexander A. Sherstov, A lower bound for agnostically learning disjunctions, in Proc. of the 20th Conf. on Learning Theory (COLT), 2007, pp. 409–423.
[28] ——, Unconditional lower bounds for learning intersections of halfspaces, Machine Learning, 69 (2007), pp. 97–114.
[29] ——, Cryptographic hardness for learning intersections of halfspaces, J. Comput. Syst. Sci., 75 (2009), pp. 2–12.
[30] Matthias Krause and Pavel Pudlák, On the computational power of depth-2 circuits with threshold and modulo gates, Theor. Comput. Sci., 174 (1997), pp. 137–156.
[31] Eyal Kushilevitz and Noam Nisan, Communication Complexity, Cambridge University Press, New York, 1997.
[32] Nati Linial, Shahar Mendelson, Gideon Schechtman, and Adi Shraibman, Complexity measures of sign matrices, Combinatorica, 27 (2007), pp. 439–463.
[33] Satyanarayana V. Lokam, Spectral methods for matrix rigidity with applications to size-depth trade-offs and communication complexity, J. Comput. Syst. Sci., 63 (2001), pp. 449–473.
[34] Marvin L. Minsky and Seymour A. Papert, Perceptrons: Expanded edition, MIT Press, Cambridge, Mass., 1988.
[35] Ilan Newman, Private vs. common random bits in communication complexity, Inf. Process. Lett., 39 (1991), pp. 67–71.
[36] Noam Nisan, The communication complexity of threshold gates, in Combinatorics, Paul Erdős is Eighty, 1993, pp. 301–315.
[37] A. B. J. Novikoff, On convergence proofs on perceptrons, in Proc. of the Symposium on the Mathematical Theory of Automata, vol. XII, 1962, pp. 615–622.
[38] Ryan O’Donnell and Rocco A. Servedio, New degree bounds for polynomial threshold functions, in Proc. of the 35th Symposium on Theory of Computing (STOC), 2003, pp. 325–334.
[39] Ramamohan Paturi, On the degree of polynomials that approximate symmetric Boolean func-
tions, in Proc. of the 24th Symposium on Theory of Computing, 1992, pp. 468–474.
[40] Ramamohan Paturi and Janos Simon, Probabilistic communication complexity, J. Comput.
Syst. Sci., 33 (1986), pp. 106–123.
[41] Pavel Pudlák, Vojtech Rödl, and Petr Savický, Graph complexity, Acta Inf., 25 (1988),
pp. 515–535.
[42] Alexander A. Razborov, Bounded-depth formulae over the basis {&,⊕} and some combi-
natorial problems, Complexity Theory and Applied Mathematical Logic, vol. “Problems of Cybernetics” (1988), pp. 146–166. In Russian, available at http://www.mi.ras.ru/ ~razborov/graph.pdf.
[43] ——, On rigid matrices. Manuscript in Russian, available at http://www.mi.ras.ru/ ~razborov/rigid.pdf, June 1989.
[44] ——, On the distributional complexity of disjointness, Theor. Comput. Sci., 106 (1992), pp. 385–390.
[45] Theodore J. Rivlin, An Introduction to the Approximation of Functions, Dover Publications, New York, 1981.
[46] Frank Rosenblatt, The perceptron: A probabilistic model for information storage and orga- nization in the brain, Psychological Review, 65 (1958), pp. 386–408.
[47] Alexander A. Sherstov, Powering requires threshold depth 3, Inf. Process. Lett., 102 (2007), pp. 104–107.
[48] ——, Halfspace matrices, Comput. Complex., 17 (2008), pp. 149–178. Preliminary version in 22nd CCC, 2007.
[49] ——, Separating AC0 from depth-2 majority circuits, SIAM J. Comput., 38 (2009), pp. 2113–2129. Preliminary version in 39th STOC, 2007.
[50] ——, The pattern matrix method for lower bounds on quantum communication, in Proc. of the 40th Symposium on Theory of Computing (STOC), 2008, pp. 85–94.
[51] ——, The unbounded-error communication complexity of symmetric functions, in Proc. of the 49th Symposium on Foundations of Computer Science (FOCS), 2008, pp. 384–393.
[52] ——, Communication lower bounds using dual polynomials, Bulletin of the EATCS, 95 (2008), pp. 59–93.
[53] Nathan Srebro and Adi Shraibman, Rank, trace-norm and max-norm, in Proc. of the 18th Conf. on Learning Theory (COLT), 2005, pp. 545–560.
[54] Jun Tarui and Tatsuie Tsukiji, Learning DNF by approximating inclusion-exclusion formulae, in Proc. of the 14th Conf. on Computational Complexity (CCC), 1999, pp. 215–221.
[55] Leslie G. Valiant, A theory of the learnable, Commun. ACM, 27 (1984), pp. 1134–1142.
[56] Umesh V. Vazirani, Strong communication complexity or generating quasirandom sequences from two communicating semi-random sources, Combinatorica, 7 (1987), pp. 375–392.
Appendix A. More on the unbounded-error model. Readers with background in communication complexity will note that the unbounded-error model is exactly the same as the private-coin randomized model [31, Chap. 3], with one exception: in the latter case the correct answer is expected with probability at least 2/3, whereas in the former case the success probability need only exceed 1/2 (say, by an exponentially small amount). This difference has far-reaching implications. For example, the fact that the parties in the unbounded-error model do not have a shared source of random bits is crucial: allowing shared randomness would make the complexity of every function a constant, as one can easily verify. By contrast, introducing shared randomness into the randomized model has minimal impact on the complexity of any given function [35].
As one might expect, the weaker success criterion in the unbounded-error model also has a drastic impact on the complexity of certain functions. For example, the well-known disjointness function on n-bit strings has complexity O(log n) in the unbounded-error model and Ω(n) in the randomized model [20, 44]. Furthermore, explicit functions are known [8, 48] with unbounded-error complexity O(log n) that require Ω(√n) communication in the randomized model to even achieve advantage 2^{−√n/5} over random guessing.

More generally, the unbounded-error complexity of a function f : X × Y → {−1, +1} is never much more than its complexity in the other standard models. For example, it is not hard to see that

U(f) ≤ min{N^0(f), N^1(f)} + O(1) ≤ D(f) + O(1),

where D, N^0, and N^1 refer to communication complexity in the deterministic, 0-nondeterministic, and 1-nondeterministic models, respectively. Continuing,

U(f) ≤ R_{1/3}(f) + O(1) ≤ O(R^{pub}_{1/3}(f) + log log[|X| + |Y|]),

where R_{1/3} and R^{pub}_{1/3} refer to the private- and public-coin randomized models, respectively. As a matter of fact, one can show that

U(f) ≤ O(Q*_{1/3}(f) + log log[|X| + |Y|]),

where Q*_{1/3} refers to the quantum model with prior entanglement. An identical inequality is clearly valid for the quantum model without prior entanglement. See [31, 11] for rigorous definitions of these various models; our sole intention was to point out that the unbounded-error model is at least as powerful.
Appendix B. Details on Forster’s method. The purpose of this section is to explain in detail how Theorem 2.4 is implicit in Forster’s work. Recall that vectors v1, . . . , vn in Rr are said to be in general position if no r of them are linearly dependent. In a powerful result, Forster proved that any set of vectors in general position can be balanced in a useful way:
Theorem B.1 (Forster [13, Thm. 4.1]). Let U ⊂ R^r be a finite set of vectors in general position, |U| ≥ r. Then there is a nonsingular transformation A ∈ R^{r×r} such that

∑_{u∈U} (Au)(Au)^T / ∥Au∥^2 = (|U|/r)·I_r.
The proof of this result is rather elaborate and uses compactness arguments in an essential way.
The vector norm ∥ · ∥ above and throughout this section is the Euclidean norm ∥ · ∥2. The development that follows is closely analogous to Forster’s derivation [13, p. 617].
Theorem 2.4 (Restated). Let X, Y be finite sets, M = [M_{xy}]_{x∈X, y∈Y} a real matrix (M ≠ 0). Put r = sign-rank(M). Then there is a matrix R = [R_{xy}]_{x∈X, y∈Y} such that:

rank(R) = r,   (B.1)
M ◦ R ≥ 0,   (B.2)
∥R∥_∞ ≤ 1,   (B.3)
∥R∥_F = √(|X||Y|/r).   (B.4)
Proof. Since M ≠ 0, it follows that r ≥ 1. Fix a matrix Q = [Q_{xy}] of rank r such that

Q_{xy}M_{xy} > 0 whenever M_{xy} ≠ 0.   (B.5)

Write

Q = [⟨u_x, v_y⟩]_{x∈X, y∈Y}

for suitable collections of vectors {u_x} ⊂ R^r and {v_y} ⊂ R^r. If the vectors u_x, v_y are not already in general position, we can replace them with their slightly perturbed versions ũ_x, ṽ_y that are in general position. Provided that the perturbations are small enough, property (B.5) will still hold, i.e., we will have ⟨ũ_x, ṽ_y⟩M_{xy} > 0 whenever M_{xy} ≠ 0. As a result, we can assume w.l.o.g. that {u_x}, {v_y} are in general position and in particular that all {v_y} are nonzero.
Since sign-rank(M) ≤ rank(M), we infer that |X| ≥ r. Theorem B.1 is therefore applicable to the set {u_x} and yields a nonsingular matrix A with

∑_{x∈X} (Au_x)(Au_x)^T / ∥Au_x∥^2 = (|X|/r)·I_r.   (B.6)
Define

R_{xy} = ⟨u_x, v_y⟩ / (∥Au_x∥ ∥(A^{−1})^T v_y∥).

It remains to verify properties (B.1)–(B.4). Property (B.1) follows from the representation R = D_1 Q D_2, where D_1 and D_2 are diagonal matrices with strictly positive diagonal entries. By (B.5), we know that R_{xy}M_{xy} > 0 whenever M_{xy} ≠ 0, which immediately gives us (B.2). Property (B.3) holds because

|⟨u_x, v_y⟩| / (∥Au_x∥ ∥(A^{−1})^T v_y∥) = |⟨Au_x, (A^{−1})^T v_y⟩| / (∥Au_x∥ ∥(A^{−1})^T v_y∥) ≤ 1.
Finally, property (B.4) will follow once we show that ∑_x R_{xy}^2 = |X|/r for every y ∈ Y. So, fix y ∈ Y and consider the unit vector v = (A^{−1})^T v_y / ∥(A^{−1})^T v_y∥. We have:

∑_{x∈X} R_{xy}^2 = ∑_{x∈X} ⟨u_x, v_y⟩^2 / (∥Au_x∥^2 ∥(A^{−1})^T v_y∥^2)
 = ∑_{x∈X} (v_y^T A^{−1}) (Au_x)(Au_x)^T ((A^{−1})^T v_y) / (∥Au_x∥^2 ∥(A^{−1})^T v_y∥^2)
 = v^T ( ∑_{x∈X} (Au_x)(Au_x)^T / ∥Au_x∥^2 ) v
 = |X|/r,

where the last step follows from (B.6).