
The Unbounded-Error Communication Complexity of Symmetric Functions∗
ALEXANDER A. SHERSTOV†
Abstract
We prove an essentially tight lower bound on the unbounded-error communication complexity of every symmetric function, i.e., $f(x,y) = D(|x \wedge y|)$, where $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ is a given predicate and $x, y$ range over $\{0,1\}^n$. Specifically, we show that the communication complexity of $f$ is between $\Theta(k/\log^5 n)$ and $\Theta(k\log n)$, where $k$ is the number of value changes of $D$ in $\{0,1,\ldots,n\}$. Prior to this work, the problem was solved only for the parity predicate $D$ (Forster 2001).
Our proof is built around two new ideas. First, we show that a predicate $D$ gives rise to a rapidly mixing random walk on $\mathbb{Z}_2^n$, which allows us to reduce the problem to communication lower bounds for "typical" predicates. Second, we use Paturi's approximation lower bounds (1992), suitably generalized here to clusters of real nodes in $[0,n]$ and interpreted in their dual form, to prove that a typical predicate behaves analogously to the parity predicate with respect to a smooth distribution on the inputs.
∗An extended abstract of this article appeared in Proceedings of the 49th IEEE Symposium on Foundations of Computer Science (FOCS), pages 384–393, 2008.
†The University of Texas at Austin, Department of Computer Sciences, Austin, TX 78712 USA. Email: sherstov@cs.utexas.edu.

1 Introduction
The unbounded-error model, due to Paturi and Simon [28], is a rich and elegant model of communication. Fix a function f : X × Y → {0, 1}, where X and Y are some finite sets. Alice receives an input x ∈ X, Bob receives y ∈ Y, and their objective is to compute f (x, y). To this end, they exchange bits through a shared communication channel according to a strategy, or protocol, established in advance. Alice and Bob each have an unlimited private source of random bits which they can use in deciding what messages to send. Eventually, Bob concludes this process by sending Alice a single bit, which is taken to be the output of their joint computation. Let the random variable P(x, y) ∈ {0, 1} denote the output bit when the parties receive inputs x ∈ X and y ∈ Y. Alice and Bob’s protocol is said to compute f if
$$\Pr[P(x,y) = f(x,y)] > \frac{1}{2}$$
for all x ∈ X, y ∈ Y. The cost of a given protocol is the worst-case number of bits exchanged on any input (x, y). The unbounded-error communication complexity of f, denoted U( f ), is the least cost of a protocol that computes f.
The unbounded-error model occupies a special place in the study of communication because it is more powerful than any of the usual models (deterministic, nondeterministic, randomized, quantum with or without entanglement). More precisely, the unbounded-error complexity $U(f)$ can be only negligibly greater than the complexity of $f$ in these other models, and often, $U(f)$ is exponentially smaller. For completeness, we provide precise quantitative statements in Section 2.1. The power of the unbounded-error model resides in its very liberal success criterion: it suffices to produce the correct output with probability greater than 1/2 (say, by an exponentially small amount). This contrasts with the more familiar bounded-error models, in which the correct output is expected with probability at least 2/3.
1.1 Motivation
The additional power of the unbounded-error model has the consequence that proving communication lower bounds in it requires richer mathematical machinery. Furthermore, the resulting lower bounds have applications that lower bounds in other communication models could not have. Before we state our results, we take a moment to review a few of these applications.
Circuit complexity. Recall that a threshold gate $g$ with Boolean inputs $x_1,\ldots,x_n$ is a function of the form $g(x) = \mathrm{sgn}(a_1x_1 + \cdots + a_nx_n - \theta)$, for some fixed reals $a_1,\ldots,a_n,\theta$. Thus, a threshold gate generalizes the familiar majority gate. A major unsolved problem in computational complexity is to exhibit a Boolean function that requires a depth-2 threshold circuit of superpolynomial size.
Communication complexity has been crucial to the progress on this problem. Via randomized communication complexity, many explicit functions have been found [11, 10, 26, 35, 38] that require depth-2 majority circuits of exponential size. By the reductions due to Goldman et al. [10], these lower bounds remain valid for the broader class of majority-of-threshold circuits. This solves a special case of the problem. The unbounded-error model solves another special case [8]: it supplies exponential lower bounds against threshold-of-majority circuits, i.e., circuits with a threshold gate at the top that receives inputs from majority gates.
Sign-rank and rigidity. Fix a real matrix $M = [M_{ij}]$ without zero entries. The sign-rank of $M$, denoted $\mathrm{rk}_\pm(M)$, is the least rank of a matrix $A = [A_{ij}]$ with $M_{ij}A_{ij} > 0$ for all $i, j$. In other words, sign-rank measures the sensitivity of the rank of $M$ when its entries undergo sign-preserving perturbations. Sensitivity of the rank is a well-studied subject in complexity theory. For example, much work has focused on the closely related concept of matrix rigidity [23, 14].
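For a concrete toy example of this definition (ours, not from the original text), consider the $2\times 2$ matrix below. Any rank-1 matrix $A = ab^{\mathsf{T}}$ that agrees with it in sign on the first three entries satisfies $\mathrm{sgn}(a_2b_2) = \mathrm{sgn}(a_2b_1)\,\mathrm{sgn}(a_1b_2)\,\mathrm{sgn}(a_1b_1) = +1$, contradicting the last entry, so
$$M = \begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}, \qquad \mathrm{rk}_\pm(M) = 2.$$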
Surprisingly, sign-rank and unbounded-error complexity turn out to be equivalent notions. Specifically, Paturi and Simon [28] showed that every function $f\colon X\times Y\to\{0,1\}$ satisfies $U(f) = \log_2 \mathrm{rk}_\pm(M) \pm O(1)$, where $M = [(-1)^{f(x,y)}]_{x\in X,\,y\in Y}$.
PAC learning. In a seminal paper, Valiant [40] formulated the probably approximately correct (PAC) model of learning, now a central model in computational learning theory. Research has shown that PAC learning is quite difficult. (By "PAC learning," we mean PAC learning under arbitrary distributions.) Indeed, the learning problem remains unsolved for such natural concept classes as DNF formulas of polynomial size and intersections of two halfspaces, whereas hardness results and lower bounds are abundant [15, 16, 20, 6, 21, 19].
There is, however, an important case when efficient PAC learning is possible. Let $C$ be a given concept class. For notational convenience, view the functions in $C$ as mappings $\{0,1\}^n\to\{-1,+1\}$ rather than $\{0,1\}^n\to\{0,1\}$. The dimension complexity of $C$, denoted $\mathrm{dc}(C)$, is the least $r$ for which there exist functions $\varphi_1,\ldots,\varphi_r\colon\{0,1\}^n\to\mathbb{R}$ such that every $f\in C$ is expressible as $f(x)\equiv\mathrm{sgn}(a_1\varphi_1(x)+\cdots+a_r\varphi_r(x))$ for some reals $a_1,\ldots,a_r$. (The functions $\varphi_1,\ldots,\varphi_r$ themselves need not belong to $C$, but in practice they often do.) There is a simple and well-known algorithm [18], based on linear programming, that PAC learns $C$ in time polynomial in $\mathrm{dc}(C)$. To relate this discussion to sign-rank (or equivalently, to unbounded-error complexity), let $M_C = [f(x)]_{f\in C,\,x\in\{0,1\}^n}$ be the characteristic matrix of $C$. A moment's reflection reveals that $\mathrm{dc}(C) = \mathrm{rk}_\pm(M_C)$, i.e., the dimension complexity of a concept class is precisely the sign-rank of its characteristic matrix.
Thus, the study of unbounded-error complexity yields nontrivial PAC learning algorithms. Indeed, the current fastest algorithm for learning polynomial-size DNF formulas in $n$ variables [18] was obtained precisely by placing an upper bound of $2^{\tilde O(n^{1/3})}$ on the dimension complexity of that concept class. The dimension-complexity paradigm captures many other efficient PAC learning algorithms designed to date, with the notable exception of learning low-degree polynomials over GF($p$).
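To make the linear-programming connection concrete, here is a minimal sketch (ours, not the actual algorithm of [18]) of the dimension-complexity paradigm: once the feature maps $\varphi_1,\ldots,\varphi_r$ are fixed, every concept in $C$ is a halfspace over the features, so a hypothesis consistent with any labeled sample can be found by an LP feasibility computation. The function names and the toy feature map are our own; SciPy's linprog stands in for a generic LP solver.

```python
# Sketch (ours) of LP-based learning over a feature map phi: find w with
# y_i * <w, phi(x_i)> >= 1 for all labeled examples, then predict sgn<w, phi(x)>.
import numpy as np
from scipy.optimize import linprog

def learn_halfspace(examples, labels, phi):
    Phi = np.array([phi(x) for x in examples])       # m x r feature matrix
    y = np.array(labels, float)                      # labels in {-1, +1}
    m, r = Phi.shape
    # Feasibility LP: minimize 0 subject to -y_i * <w, phi(x_i)> <= -1.
    res = linprog(np.zeros(r), A_ub=-(y[:, None] * Phi), b_ub=-np.ones(m),
                  bounds=[(None, None)] * r)
    return res.x if res.success else None

# Toy run: parity features on 2 bits make the AND concept a feature halfspace.
phi = lambda x: np.array([1.0, (-1.0) ** x[0], (-1.0) ** x[1], (-1.0) ** (x[0] + x[1])])
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [1, 1, 1, -1]                                    # (-1)^{x1 AND x2}
w = learn_halfspace(X, y, phi)
assert all(np.sign(w @ phi(x)) == t for x, t in zip(X, y))
```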
1.2 Our Result
To summarize, the unbounded-error model has applications to circuit complexity, matrix analysis, and learning theory, in addition to its intrinsic appeal as a model of communication. Despite this motivation, progress in understanding unbounded-error complexity has been slow and difficult. Indeed, we are aware of only a few nontrivial results on this subject. Alon et al. [1] obtained strong lower bounds for random functions. In a breakthrough result, Forster [7] proved the first strong lower bound for an explicit function. Forster's proof has seen several extensions and refinements [8, 9]. Subsequent to our work, Razborov and Sherstov [33] solved an open problem regarding the comparative power of alternation (the classes $\Sigma_2^{cc}$ and $\Pi_2^{cc}$) and unbounded-error communication, posed by Babai et al. [3].

This paper focuses on symmetric functions, i.e., functions $f\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$ of the form
$$f(x,y) = D(|x\wedge y|)$$
for a given predicate $D\colon\{0,1,\ldots,n\}\to\{0,1\}$. Here $|x\wedge y|$ stands for the number of positions where $x$ and $y$ both have a 1. Familiar examples of such functions include DISJOINTNESS (determining if $x$ and $y$ intersect) and INNER PRODUCT MODULO 2 (determining if $x$ and $y$ intersect in an odd number of positions). Symmetric functions have seen much work in communication complexity. An illustrative example is the DISJOINTNESS function, whose study has led to considerable advances [13, 31, 29, 4] in randomized communication complexity. Symmetric functions have also contributed to the progress in quantum communication complexity, starting with the breakthrough result of Razborov [32] and continuing with more recent work, e.g., [17, 37, 39].

Our main result settles the unbounded-error complexity of every symmetric function, to within logarithmic factors. The only symmetric function whose unbounded-error complexity was known prior to this work was INNER PRODUCT MODULO 2, for which Forster [7] proved a tight lower bound of $\Omega(n)$. The general result that we prove is in terms of the degree $\deg(D)$ of a given predicate $D$, defined as the number of times $D$ changes value in $\{0,1,\ldots,n\}$. In other words, $\deg(D) = |\{i : D(i)\ne D(i-1)\}|$.
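Since $\deg(D)$ drives all the bounds below, it is worth noting that it is trivially computable from the truth table of $D$; the following helper (ours, purely illustrative) implements the definition verbatim.

```python
# Illustrative helper (ours): deg(D) = number of value changes of D,
# where D is given as the list (D(0), D(1), ..., D(n)).
def predicate_degree(D):
    return sum(1 for i in range(1, len(D)) if D[i] != D[i - 1])

n = 8
PARITY = [t % 2 for t in range(n + 1)]               # changes value at every t
DISJ = [1 if t == 0 else 0 for t in range(n + 1)]    # changes value once, at t = 1
assert predicate_degree(PARITY) == n
assert predicate_degree(DISJ) == 1
```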
Theorem 1.1 (Main Result). Let $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ be given, $k = \deg(D)$. Define $f(x,y) = D(|x\wedge y|)$. Then
$$\Theta(k/\log^5 n) \le U(f) \le \Theta(k\log n).$$
For a somewhat stronger quantitative statement, see Theorem 6.3.
The upper bound in this result has a short and elementary demonstration (see the proof of Theorem 6.3), and this paper is devoted entirely to the proof of the lower bound. The lower bound uses a combination of techniques (random walks on $\mathbb{Z}_2^n$, univariate approximation theory, linear programming duality), with Forster's
general method as a starting point.
1.3 Proof Outline
Our proof consists of two independent parts. First, we reduce the original problem to analyzing what we call dense predicates. These are predicates D: {0,1,…,n} → {0,1} that change value frequently and at roughly regular intervals. Dense predicates are highly structured and amenable to direct analysis, unlike general predicates. With this reduction in hand, we complete the proof by solving the problem for every dense predicate. We now describe the two technical components in greater detail.
Reduction to dense predicates. Let $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ be a given predicate that is not dense. Any communication protocol that computes $D$ can clearly compute the restriction of $D$ to a given subinterval $\{i, i+1, \ldots, j\}\subset\{0,1,\ldots,n\}$. Now, let $\mathscr{D}$ denote the set of all restrictions of $D$ to subintervals of a given length. Using a probabilistic argument, we show that a dense predicate arises as the XOR of a small number $T$ of predicates from $\mathscr{D}$ (where $T$ depends on the degree of $D$). As a result, if the original predicate $D$ has unbounded-error complexity $\ll \deg(D)$, then some dense predicate will have disproportionately small unbounded-error complexity. This is the desired reduction.

The technical challenge here is to show that a dense predicate can be obtained as the XOR of a small number of predicates from $\mathscr{D}$. To this end, we model the
probabilistic argument as a random walk on $\mathbb{Z}_2^n$ and bound its mixing time. Our analysis uses a known bound, due to Razborov [30], on the rate of convergence in terms of the probability of a basis for $\mathbb{Z}_2^n$ (see Lemma 3.3).
Solution for dense predicates. Using Chebyshev polynomials and the Markov–Bernstein inequalities, Paturi [27] determined the least degree of a polynomial that approximates any given Boolean predicate on $\{0,1,\ldots,n\}$ pointwise to within 1/3. A starting point in our analysis is a related approximation problem, in which the nodes are no longer $\{0,1,\ldots,n\}$ but are some arbitrary reals $\{a_1,a_2,\ldots,a_n\}\subset[0,n]$. Provided that the nodes are not too clumped together, we are able to prove strong lower bounds on the degree for a relevant class of approximation problems $f\colon\{a_1,a_2,\ldots,a_n\}\to\{0,1\}$. Paturi's proof technique does not apply in this more general setting, and we give a direct analysis using fundamentals of approximation theory.
The next step is to show that computation of dense predicates corresponds to the approximation problem just described, where the real nodes a1, a2, . . . , an are allowed to form clusters but must still cover much of the interval [0,n]. Linear programming duality now tells us that, in a well-defined technical sense, a dense predicate behaves like the PARITY function with respect to a smooth distribution on the inputs. This enables us to bound the spectral norm of relevant matrices using the pattern matrix method [37]. In a final step, we invoke Forster’s generalized theorem [8] to obtain our main result.
Comparison with related work. Alon et al. [1] introduced ideas from real algebraic geometry to the study of the unbounded-error complexity of random functions. Forster [7] used matrix analysis and a compactness argument to give the first strong lower bound for an explicit function. Follow-up work gave several matrix-analytic improvements [8, 9] on Forster's method. The recent result due to Razborov and Sherstov [33] is built around a new method of analyzing multivariate forms $p$ on $\mathbb{R}^n$, whereby one projects $p$ in several ways to a univariate polynomial, analyzes these simpler objects, and recombines the results using Fourier-theoretic tools.
Our approach is quite different from these works. To our knowledge, we give the first application of random walks to unbounded-error complexity. This technique generalizes beyond symmetric functions and, in fact, applies whenever one seeks a lower bound on the unbounded-error complexity of a set F of functions. The quality of the resulting communication lower bound will depend on how fast a random XOR-walk on F mixes to a hard function.
The second part of our proof is also based on a new idea, which effectively allows us to treat a dense predicate as if it were the PARITY function. The insight here is that Paturi's approximation lower bounds [27], suitably generalized to clusters of real nodes in $[0,n]$ and examined in their dual form, show that a dense predicate behaves analogously to PARITY with respect to a smooth distribution on the inputs. We introduce the term smooth orthogonalizing distribution to describe this technique. The smoothness property is crucial to applying Forster's method [7].
1.4 Organization
Section 2 provides necessary technical background. Section 3 opens the proof with the reduction to dense predicates. Section 4 solves a certain problem in discrete approximation. Section 5 translates this approximation result, via linear programming duality and the Fourier transform, into an existence proof of smooth orthogonalizing distributions for every dense predicate. Section 6 combines the above ingredients to give the final lower bounds on unbounded-error complexity.
2 Preliminaries
A Boolean function is a mapping $X\to\{0,1\}$, where $X$ is a finite set. Typical cases are $X = \{0,1\}^n$ and $X = \{0,1\}^n\times\{0,1\}^n$. The notation $[n]$ stands for the set $\{1,2,\ldots,n\}$. Throughout this manuscript, "log" refers to the logarithm to base 2. The symbol $P_k$ refers to the set of univariate polynomials of degree up to $k$.
For $x\in\{0,1\}^n$, we define $|x| = x_1 + x_2 + \cdots + x_n$. For $x, y\in\{0,1\}^n$, the notation $x\wedge y$ refers as usual to the component-wise AND of $x$ and $y$. In particular, $|x\wedge y|$ stands for the number of positions where $x$ and $y$ both have a 1. At several places in this manuscript, it will be important to distinguish between addition over the reals and addition over GF(2). To avoid any confusion, we reserve the operator $+$ for the former and $\oplus$ for the latter.
Random walks on $\mathbb{Z}_2^n$ play an important role in this work. In particular, it will be helpful to recall the following fact.
Proposition 2.1 (Folklore). For an integer $T\ge 1$, let $b_1, b_2, \ldots, b_T\in\{0,1\}$ be independent random variables, each taking on 1 with probability $p$. Then
$$\mathbf{E}\bigl[b_1\oplus b_2\oplus\cdots\oplus b_T\bigr] = \frac{1}{2} - \frac{1}{2}(1-2p)^T.$$
Proof. Straightforward by induction on T.
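As a quick numerical sanity check (ours, not part of the paper), the closed form above agrees with the obvious one-step recurrence for the probability that an XOR of independent $p$-biased bits equals 1.

```python
# Sanity check (ours) of Proposition 2.1: evolve Pr[b_1 xor ... xor b_t = 1]
# one bit at a time and compare against the closed form 1/2 - (1-2p)^T / 2.
def xor_probability(p, T):
    e = 0.0                                  # empty XOR is 0 with probability 1
    for _ in range(T):
        e = e * (1 - p) + (1 - e) * p        # a new bit flips the current parity
    return e

p, T = 0.3, 7
assert abs(xor_probability(p, T) - (0.5 - 0.5 * (1 - 2 * p) ** T)) < 1e-12
```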

Predicates. A predicate is a mapping $D\colon\{0,1,\ldots,n\}\to\{0,1\}$. We say that a value change occurs at index $t\in\{1,2,\ldots,n\}$ if $D(t)\ne D(t-1)$. The degree of $D$, denoted $\deg(D)$, is the total number of value changes of $D$. For example, the familiar predicate $\mathrm{PARITY}(t) = t\bmod 2$ has degree $n$, whereas a constant predicate has degree 0. It is not hard to show [2] that $\deg(D)$ is the least degree of a real univariate polynomial $p$ such that $\mathrm{sgn}(p(t)) = (-1)^{D(t)}$, $t = 0,1,\ldots,n$, hence the term degree. Finally, given two predicates $D_1, D_2\colon\{0,1,\ldots,n\}\to\{0,1\}$, recall that their XOR is the predicate $D_1\oplus D_2\colon\{0,1,\ldots,n\}\to\{0,1\}$ defined by $(D_1\oplus D_2)(t) = D_1(t)\oplus D_2(t)$.
Matrices. The symbol $\mathbb{R}^{m\times n}$ refers to the family of all $m\times n$ matrices with real entries. The $(i,j)$th entry of a matrix $A$ is denoted by $A_{ij}$. We frequently use "generic-entry" notation to specify a matrix succinctly: we write $A = [F(i,j)]_{i,j}$ to mean that the $(i,j)$th entry of $A$ is given by the expression $F(i,j)$. In most matrices that arise in this work, the exact ordering of the columns (and rows) is irrelevant. In such cases we describe a matrix by the notation $[F(i,j)]_{i\in I,\,j\in J}$, where $I$ and $J$ are some index sets. In specifying matrices, we will use the symbol $*$ for entries whose values are irrelevant, as in the proofs of Lemmas 3.2 and 3.5. Recall that the spectral norm of a matrix $A\in\mathbb{R}^{m\times n}$ is given by
$$\|A\| = \max_{x\in\mathbb{R}^n,\ \|x\|_2 = 1}\|Ax\|_2,$$
where $\|\cdot\|_2$ is the Euclidean norm on vectors.
Fourier transform over $\mathbb{Z}_2^n$. Consider the vector space of functions $\{0,1\}^n\to\mathbb{R}$, equipped with the inner product
$$\langle f, g\rangle = 2^{-n}\sum_{x\in\{0,1\}^n} f(x)g(x).$$
For $S\subseteq[n]$, define $\chi_S\colon\{0,1\}^n\to\{-1,+1\}$ by $\chi_S(x) = (-1)^{\sum_{i\in S}x_i}$. Then $\{\chi_S\}_{S\subseteq[n]}$ is an orthonormal basis for the inner product space in question. As a result, every function $f\colon\{0,1\}^n\to\mathbb{R}$ has a unique representation of the form
$$f(x) = \sum_{S\subseteq[n]}\hat f(S)\,\chi_S(x),$$
where $\hat f(S) = \langle f, \chi_S\rangle$. The reals $\hat f(S)$ are called the Fourier coefficients of $f$. The following fact is immediate from the definition of $\hat f(S)$.
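To make these definitions concrete, here is a brute-force computation of Fourier coefficients (ours, feasible only for small $n$); it confirms that the $\pm 1$ version of PARITY is exactly the character $\chi_{[n]}$, the analytic fact this paper leverages throughout.

```python
# Brute-force Fourier coefficient (ours): fhat(S) = 2^{-n} sum_x f(x) chi_S(x).
from itertools import product

def fourier_coefficient(f, S, n):
    total = 0
    for x in product((0, 1), repeat=n):
        chi = (-1) ** sum(x[i] for i in S)   # chi_S(x) = (-1)^{sum_{i in S} x_i}
        total += f(x) * chi
    return total / 2 ** n

n = 3
f = lambda x: (-1) ** (sum(x) % 2)           # the +-1 version of PARITY
assert fourier_coefficient(f, [0, 1, 2], n) == 1.0   # f = chi_{[n]}
assert fourier_coefficient(f, [0, 1], n) == 0.0      # other coefficients vanish
```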

Proposition 2.2. Let $f\colon\{0,1\}^n\to\mathbb{R}$ be given. Then
$$\max_{S\subseteq[n]}|\hat f(S)| \le 2^{-n}\sum_{x\in\{0,1\}^n}|f(x)|.$$
Symmetric functions. Let $S_n$ denote the group of permutations $[n]\to[n]$. A function $\varphi\colon\{0,1\}^n\to\mathbb{R}$ is called symmetric if $\varphi(x)$ is uniquely determined by $x_1+\cdots+x_n$. Equivalently, $\varphi$ is symmetric if $\varphi(x) = \varphi(x_{\sigma(1)},\ldots,x_{\sigma(n)})$ for every $x\in\{0,1\}^n$ and every $\sigma\in S_n$. Observe that for every $\varphi\colon\{0,1\}^n\to\mathbb{R}$ (symmetric or not), the derived function
$$\varphi^{\mathrm{sym}}(x) = \frac{1}{n!}\sum_{\sigma\in S_n}\varphi(x_{\sigma(1)},\ldots,x_{\sigma(n)})$$
is symmetric. Symmetric functions on $\{0,1\}^n$ are intimately related to univariate polynomials, as demonstrated by Minsky and Papert's symmetrization argument [24]:

Proposition 2.3 (Minsky and Papert). Let $\varphi\colon\{0,1\}^n\to\mathbb{R}$ be symmetric with $\hat\varphi(S) = 0$ for $|S| > r$. Then there is a polynomial $p\in P_r$ with $\varphi(x) = p(|x|)$ for all $x\in\{0,1\}^n$.
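As a quick illustration of Proposition 2.3 (our example, with $n = 2$ and $r = 1$): the function $\varphi(x) = x_1$ has $\hat\varphi(S) = 0$ for $|S| > 1$, and symmetrizing it yields
$$\varphi^{\mathrm{sym}}(x) = \frac{\varphi(x_1,x_2)+\varphi(x_2,x_1)}{2} = \frac{x_1+x_2}{2} = p(|x|) \qquad\text{with } p(t) = \frac{t}{2}\in P_1.$$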
2.1 The Unbounded-Error Model of Communication
We continue the review started in the introduction. Readers with background in communication complexity will note that the unbounded-error model is exactly the same as the private-coin randomized model [22, Chap. 3], with one exception: in the latter case the correct answer is expected with probability at least 2/3, whereas in the former case the correctness probability need only exceed 1/2 (say, by an exponentially small amount). This difference has far-reaching implications. For example, the fact that the parties in the unbounded-error model do not have a shared source of random bits is crucial: allowing shared randomness would make the complexity of every function a constant, as one can easily verify. By contrast, introducing shared randomness into the randomized model has minimal impact on the complexity of any given function [25].
As one might expect, the weaker success criterion in the unbounded-error model has a drastic impact on the complexity of certain functions. For example, the well-known DISJOINTNESS function on $n$-bit strings has complexity $\Theta(\log n)$ in the unbounded-error model (see Proposition 6.5) and $\Theta(n)$ in the randomized model [13, 31]. Furthermore, explicit functions are known [5, 36] with unbounded-error complexity $O(\log n)$ that require $\Omega(\sqrt n)$ communication in the randomized model to even achieve advantage $2^{-\sqrt n/5}$ over random guessing.
More generally, the unbounded-error complexity of a function f : X × Y → {0, 1} is never much more than its complexity in the other standard models. For example, it is not hard to see that
$$U(f) \le \min\{N^0(f), N^1(f)\} + O(1) \le D(f) + O(1),$$
where $D$, $N^0$, and $N^1$ refer to communication complexity in the deterministic, co-nondeterministic, and nondeterministic models, respectively. Continuing,
$$U(f) \le R_{1/3}(f) + O(1) \le O\left(R^{\mathrm{pub}}_{1/3}(f) + \log\log[|X|+|Y|]\right),$$
where $R_{1/3}$ and $R^{\mathrm{pub}}_{1/3}$ refer to the private- and public-coin randomized models, respectively. As a matter of fact, one can show that
$$U(f) \le O\left(Q^*_{1/3}(f) + \log\log[|X|+|Y|]\right),$$
where $Q^*_{1/3}$ refers to the quantum model with prior entanglement. An identical inequality is clearly valid for the quantum model without prior entanglement. See [22, 41] for rigorous definitions of these various models; our sole intention was to point out that the unbounded-error model is at least as powerful.
A compelling aspect of the unbounded-error model is that it has an exact interpretation in matrix-analytic terms. Specifically, let $M = [M_{ij}]$ be a real matrix without zero entries. Define the sign-rank of $M$ by
$$\mathrm{rk}_\pm(M) = \min_A\left\{\mathrm{rk}\,A : M_{ij}A_{ij} > 0 \text{ for all } i,j\right\}.$$
In words, rk±(M) is the least rank of a real matrix A whose entries each have the same sign as the corresponding entry of M. Paturi and Simon made the following important observation [28, Thm. 2].
Theorem 2.4 (Paturi and Simon). Let $X$, $Y$ be finite sets and $f\colon X\times Y\to\{0,1\}$ a given function. Put $M = [(-1)^{f(x,y)}]_{x\in X,\,y\in Y}$. Then
$$U(f) = \log\mathrm{rk}_\pm(M) \pm O(1).$$

Paturi and Simon's original observation concerned $X = Y = \{0,1\}^n$, but their proof readily extends to arbitrary sets. In words, the unbounded-error complexity of a function essentially equals the logarithm of the sign-rank of its communication matrix. This equivalence is often helpful: sometimes it is more convenient to reason in terms of communication protocols, and sometimes the matrix formulation offers more insight.
The power of the unbounded-error model makes it a challenging model in which to prove communication lower bounds. In a breakthrough result, Forster [7] proved the first strong lower bound in the unbounded-error model for an explicit function. Forster’s proof generalizes to yield the following result [8, Thm. 3], which serves as a crucial starting point for our work.
Theorem 2.5 (Forster et al.). Let $X$, $Y$ be finite sets and $M = [M_{xy}]_{x\in X,\,y\in Y}$ a real matrix without zero entries. Then
$$\mathrm{rk}_\pm(M) \ge \frac{\sqrt{|X|\,|Y|}}{\|M\|}\min_{x,y}|M_{xy}|.$$
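To see the theorem in action (our numerical illustration, not from the text), one can evaluate the bound on the Hadamard sign matrix of INNER PRODUCT MODULO 2: its spectral norm is exactly $2^{n/2}$, so Theorem 2.5 gives $\mathrm{rk}_\pm\ge 2^{n/2}$, which is the heart of Forster's $\Omega(n)$ lower bound [7].

```python
# Numerical illustration (ours) of Theorem 2.5 on M = [(-1)^{<x,y>}] for
# x, y in {0,1}^n: here ||M|| = 2^{n/2} exactly, so the bound
# sqrt(|X||Y|) * min|M_xy| / ||M|| equals 2^n / 2^{n/2} = 2^{n/2}.
import numpy as np
from itertools import product

n = 4
xs = list(product((0, 1), repeat=n))
M = np.array([[(-1) ** sum(a * b for a, b in zip(x, y)) for y in xs] for x in xs])
spectral_norm = np.linalg.norm(M, 2)                 # largest singular value
forster_bound = np.sqrt(M.size) * np.abs(M).min() / spectral_norm
assert abs(spectral_norm - 2 ** (n / 2)) < 1e-9
assert abs(forster_bound - 2 ** (n / 2)) < 1e-9
```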
We close this overview by discussing some closure properties of the unbounded-error model. Given functions $f, g\colon X\times Y\to\{0,1\}$, recall that their XOR is the function $f\oplus g\colon X\times Y\to\{0,1\}$ defined by $(f\oplus g)(x,y) = f(x,y)\oplus g(x,y)$. We have:
Proposition 2.6 (Folklore). Let $f, g\colon X\times Y\to\{0,1\}$ be arbitrary. Then
$$U(f\oplus g) \le U(f) + U(g).$$
Proof. Alice and Bob can evaluate f and g individually and output the XOR of the two answers. It is straightforward to verify that this strategy is correct with probability greater than 1/2.
In what follows, we will be interested primarily in the complexity of predicates $D\colon\{0,1,\ldots,n\}\to\{0,1\}$. Specifically, we define $U(D)$ to be the unbounded-error communication complexity of the function $f\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$ given by $f(x,y) = D(|x\wedge y|)$.
2.2 Pattern Matrices
Pattern matrices were introduced in [38, 37] and proved useful in obtaining strong lower bounds on communication. Relevant definitions and results from [37] follow.

Let $t$ and $n$ be positive integers with $t\mid n$. Split $[n]$ into $t$ contiguous blocks, each with $n/t$ elements:
$$[n] = \left\{1,2,\ldots,\frac{n}{t}\right\}\cup\left\{\frac{n}{t}+1,\ldots,\frac{2n}{t}\right\}\cup\cdots\cup\left\{\frac{(t-1)n}{t}+1,\ldots,n\right\}.$$
Let $\mathscr{V}(n,t)$ denote the family of subsets $V\subseteq[n]$ that have exactly one element in each of these blocks (in particular, $|V| = t$). Clearly, $|\mathscr{V}(n,t)| = (n/t)^t$. For a bit string $x\in\{0,1\}^n$ and a set $V\in\mathscr{V}(n,t)$, define the projection of $x$ onto $V$ by
$$x|_V = (x_{i_1}, x_{i_2}, \ldots, x_{i_t})\in\{0,1\}^t,$$
where $i_1 < i_2 < \cdots < i_t$ are the elements of $V$.

[...]

$$v_i + \cdots + v_j \;>\; \frac{1}{2}\,\alpha(1-\alpha)^{T-1}k \;\ge\; \frac{1}{2}\,\alpha\bigl(1-\alpha(T-1)\bigr)k \;\ge\; \frac{1}{2}\,\alpha\left(1-\alpha\cdot\frac{1+\log(n/k)}{\log(2-2\alpha)}\right)k. \tag{3.3}$$

Set $\alpha = 0.23/(1+\log(n/k))$, $i = \lfloor m/2^T\rfloor + 1$, and $j = \lfloor m/2^{T-1}\rfloor$. Then one easily verifies (3.2), while (3.1) is immediate from (3.3).

We are now ready to prove the desired reduction to high-degree predicates. Throughout this proof, we will freely use the opening remarks of Section 3, often without mention.

Lemma 3.2 (Reduction from arbitrary to high-degree predicates). For all integers $n, k$ with $1\le k\le n$,
$$U(n,k) \ge \frac{5}{6}\,K\min_{\substack{m = K,\ldots,n,\\ 1/4\le\alpha\le 1}}\left\{\frac{1}{m}\,U(m,\lceil\alpha m\rceil)\right\},$$
where
$$K = \left\lceil\frac{1}{14}\cdot\frac{k}{1+\log(n/k)}\right\rceil.$$

Proof. Let $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ be any predicate with $\deg(D) = k$. As outlined in the Introduction, the intuition is to express some complicated (i.e., high-degree) predicate as the XOR of a small number of predicates derived from $D$. The details follow.
$$K \le v_i + \cdots + v_j \le \frac{1}{2}m. \tag{3.6}$$
If we write the flip vectors of $D_{-(m-1)}, \ldots, D_{m-1}$ one after another as row vectors, we obtain the following matrix $A$:
∗ ∗ ∗ ∗ ∗ ··· ∗ ∗ ∗ vi  ∗ ∗ ∗ ∗ ∗ ··· ∗ ∗ vi vi+1
∗ ∗ ∗ ∗ ∗ ··· ∗ vi vi+1 vi+2 . . . . . . . . .
….. ….
A =  ∗ vi vi+1 vi+2 vi+3 ··· vj−3 vj−2 vj−1 vj . . . . . . . . . . ….. ….
∗vj−2vj−1vj ∗···∗∗∗∗ ∗vj−1vj ∗∗···∗∗∗∗
∗ vj ∗ ∗ ∗ ··· ∗ ∗ ∗ ∗
Let $T$ be a suitably large integer to be named later, and let $u^{(1)}, u^{(2)}, \ldots, u^{(T)}$ be independent random vectors, each selected uniformly from among the rows of $A$. Put $u = u^{(1)}\oplus u^{(2)}\oplus\cdots\oplus u^{(T)}$. We will index the columns of $A$ and the components of all these vectors by $0, 1, \ldots, m$ (left to right). Let $p_r$ stand for the fraction of 1s in the $r$th column of $A$. Every column of $A$, except the zeroth, contains $v_i,\ldots,v_j$ and some $m-1$ additional values. One infers from (3.6) that
$$\frac{K}{2m} \le p_r \le \frac{3}{4}, \qquad r = 1, 2, \ldots, m. \tag{3.7}$$

Therefore,
$$\mathbf{E}\bigl[(u)_1+\cdots+(u)_m\bigr] = \sum_{r=1}^{m}\mathbf{E}\bigl[(u^{(1)})_r\oplus\cdots\oplus(u^{(T)})_r\bigr]
= \sum_{r=1}^{m}\left(\frac{1}{2}-\frac{1}{2}(1-2p_r)^T\right) \qquad\text{by Proposition 2.1}$$
$$\ge \frac{m}{2}\left(1-e^{-TK/m}\right) \qquad\text{by (3.6), (3.7)}.$$
Fix $T = \lceil(\ln 2)\,m/K\rceil$. Then by the last calculation, there is a vector $u = (u_0, u_1, \ldots, u_m)$ that satisfies $u_1+\cdots+u_m\ge m/4$ and is the XOR of some $T$ rows of $A$. In other words, there is a predicate $D_\oplus\colon\{0,1,\ldots,m\}\to\{0,1\}$ that satisfies $\deg(D_\oplus)\ge m/4$ and is the XOR of some $T\le\frac{6m}{5K}$ predicates from among $D_{-(m-1)}, \ldots, D_{m-1}$. This completes the proof in view of (3.5) and Proposition 2.6.
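The mechanics of this proof are easy to reproduce experimentally. The toy script below (ours; the parameters and the threshold predicate are arbitrary choices) XORs a handful of randomly shifted restrictions of a degree-1 predicate and reports the degree of the result, which is typically close to the number of shifts, mirroring the averaging argument above.

```python
# Toy experiment (ours): XORing randomly shifted copies of a degree-1
# predicate produces a predicate of much higher degree with high probability.
import random

def degree(D):
    return sum(1 for i in range(1, len(D)) if D[i] != D[i - 1])

random.seed(0)
n, m, T = 40, 20, 12
D = [1 if t >= n // 2 else 0 for t in range(n + 1)]      # a single value change
D_xor = [0] * (m + 1)
for _ in range(T):
    r = random.randrange(n - m)                          # random shift of D
    D_xor = [a ^ D[t + r] for t, a in enumerate(D_xor)]
print(degree(D), degree(D_xor))   # degree 1 vs. a degree close to T
```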
3.2 Reduction from High-Degree to Dense Predicates
The proof in this section uses the same setup as Lemma 3.2, except the argument is now more involved. The reason is that the previous averaging argument is not strong enough to yield a dense predicate, which is a highly structured object. To overcome this, we recast the previous argument as a random walk on $\mathbb{Z}_2^n$ and show that it mixes rapidly. In particular, we will need the following lemma that bounds the mixing time of a random walk [30, Lem. 1]; for an English translation, see Jukna [12, Lem. 24.3].
Lemma 3.3 (Razborov). Fix a probability distribution $\mu$ on $\{0,1\}^n$. Let $\{v^{(1)}, v^{(2)}, \ldots, v^{(n)}\}$ be a basis for $\{0,1\}^n$ as a vector space over GF(2). Put
$$p = \min\left\{\mu(0^n),\ \mu(v^{(1)}),\ \mu(v^{(2)}),\ \ldots,\ \mu(v^{(n)})\right\}.$$
Let $u^{(1)}, \ldots, u^{(T)}$ be independent random vectors, each distributed according to $\mu$. Then for every $v\in\{0,1\}^n$,
$$\left|\Pr\left[u^{(1)}\oplus\cdots\oplus u^{(T)} = v\right] - 2^{-n}\right| \le e^{-2Tp}.$$
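A small exact computation (our sanity check, using an assumed support for $\mu$) illustrates the lemma: take $\mu$ uniform on $0^n$ together with the standard basis $e_1,\ldots,e_n$, so that $p = 1/(n+1)$, and compare the walk's distribution after $T$ steps with the uniform distribution.

```python
# Exact check (ours) of Lemma 3.3 with mu uniform on {0^n, e_1, ..., e_n},
# so p = 1/(n+1): every point probability is within exp(-2Tp) of 2^{-n}.
import math

n, T = 3, 20
support = [0] + [1 << i for i in range(n)]        # 0^n and the standard basis
mu = {v: 1.0 / len(support) for v in support}
dist = {0: 1.0}                                   # the walk starts at 0^n
for _ in range(T):
    step = {}
    for v, pv in dist.items():
        for w, pw in mu.items():
            step[v ^ w] = step.get(v ^ w, 0.0) + pv * pw
    dist = step
p = 1.0 / (n + 1)
deviation = max(abs(dist.get(v, 0.0) - 2 ** -n) for v in range(2 ** n))
assert deviation <= math.exp(-2 * T * p)
```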
We are ready to formally define dense predicates and give the promised reduction.

Definition 3.4 (Dense predicate). Let $n, b$ be positive integers and $d\ge 0$ a real number. A predicate $D$ is called $(n,b,d)$-dense if $D$ is a predicate $\{0,1,\ldots,n\}\to\{0,1\}$ with flip vector $(v_0, v_1, \ldots, v_n)$ satisfying
$$v_{rb+1} + v_{rb+2} + \cdots + v_{(r+1)b} \ge d, \qquad r = 0, 1, 2, \ldots, \left\lfloor\frac{n}{b}\right\rfloor - 1.$$
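For concreteness, here is a direct checker for this definition (ours; it adopts the flip-vector convention implicit in the surrounding text, namely $v_0 = D(0)$ and $v_t = 1$ iff $D(t)\ne D(t-1)$).

```python
# Checker (ours) for Definition 3.4: D is (n, b, d)-dense iff every length-b
# block of (v_1, ..., v_n) contains at least d value changes.
def flip_vector(D):
    # Convention assumed here: v_0 = D(0) and v_t = D(t) xor D(t-1) for t >= 1.
    return [D[0]] + [D[t] ^ D[t - 1] for t in range(1, len(D))]

def is_dense(D, b, d):
    v, n = flip_vector(D), len(D) - 1
    return all(sum(v[r * b + 1:(r + 1) * b + 1]) >= d for r in range(n // b))

PARITY = [t % 2 for t in range(17)]     # n = 16: a value change at every t
assert is_dense(PARITY, b=4, d=4)
assert not is_dense(PARITY, b=4, d=5)
```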
Lemma 3.5 (Reduction from high-degree to dense predicates). Let $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ be a predicate with $\deg(D)\ge\frac{1}{4}n$. Let $b$ be any integer with $1\le b\le\frac{1}{350}n$. Then
$$U(D) \ge \frac{b}{n\log n}\,U(D'),$$
where $D'$ is some $(m, \lceil\log n\rceil b, \frac{1}{700}b)$-dense predicate and $\frac{1}{350}n\le m\le n$.
Proof. Let $(v_0, v_1, \ldots, v_n)$ be the flip vector of $D$. Apply Lemma 3.1 to $(v_1,\ldots,v_n)$ and let $i, l$ be the resulting indices ($i\le l$). It will be convenient to work with a somewhat smaller subvector $v = (v_i,\ldots,v_j)$, where we define $j\in\{i,\ldots,l\}$ to be the largest integer so that $b\mid(j-i+1)$. Since $b\le\frac{1}{350}n$ and $v_i+\cdots+v_l\ge\frac{1}{168}n$, this gives:
$$v_i + \cdots + v_j \ge \frac{1}{350}n. \tag{3.8}$$
Defining $m = j-i+1$, we infer that $\frac{1}{350}n\le m\le n$, as desired. We view $v = (v_i,\ldots,v_j)$ as composed of consecutive blocks, each $b$ bits long:
$$v = \bigl(\underbrace{v_i,\ldots,v_{i+b-1}}_{\text{block } 1},\ \underbrace{v_{i+b},\ldots,v_{i+2b-1}}_{\text{block } 2},\ \ldots\ldots,\ \underbrace{v_{j-b+1},\ldots,v_j}_{\text{block } m/b}\bigr). \tag{3.9}$$
For $r = 1, 2, \ldots, b$, define the $r$th layer of $v$, denoted $z^{(r)}$, to be the vector obtained by taking the $r$th component from each of the above blocks:
$$z^{(r)} = (v_{i-1+r},\ v_{i-1+b+r},\ \ldots,\ v_{j-b+r})\in\{0,1\}^{m/b}.$$
We say of a layer $z$ that it is perfect if it does not have $\lceil\log n\rceil$ consecutive components equal to 0. If more than $\frac{1}{700}b$ of the layers are perfect, take $D'$ to be the predicate with flip vector $(v_0\oplus\cdots\oplus v_{i-1},\ v_i,\ \ldots,\ v_j)$. Clearly, $D'$ is $(m, \lceil\log n\rceil b, \frac{1}{700}b)$-dense. Furthermore, $U(D')\le U(D)$, by the same argument as in Lemma 3.2. As a result, the theorem holds in this case.

Thus, we may assume that at least $(1-\frac{1}{700})b$ of the layers are not perfect. In view of (3.8), at most $(1-\frac{1}{350})b$ layers can be zero vectors. Therefore, $\frac{1}{700}b$ or more layers are nonzero and not perfect. These are the only layers we will consider in the remainder of the proof.
Define predicates $D_{-(m-b)}, D_{-(m-2b)}, \ldots, D_{-b}, D_0, D_b, \ldots, D_{m-2b}, D_{m-b}$, each a mapping $\{0,1,\ldots,m\}\to\{0,1\}$, by $D_r(t)\equiv D(t+i-1+r)$. These are a subset of the predicates from the proof of Lemma 3.2, and again
$$U(D) \ge U(D_r) \qquad\text{for each } r. \tag{3.10}$$
Writing the flip vectors of these predicates one after another as row vectors yields
the following matrix $B$:
$$B = \begin{pmatrix}
* & * & * & \cdots & * & * & \text{block } 1\\
* & * & * & \cdots & * & \text{block } 1 & \text{block } 2\\
\vdots & & & & & & \vdots\\
* & \text{block } 1 & \text{block } 2 & \text{block } 3 & \cdots & \text{block } \frac{m}{b}-1 & \text{block } \frac{m}{b}\\
\vdots & & & & & & \vdots\\
* & \text{block } \frac{m}{b}-1 & \text{block } \frac{m}{b} & * & \cdots & * & *\\
* & \text{block } \frac{m}{b} & * & * & \cdots & * & *
\end{pmatrix},$$
where the blocks refer to the partition in (3.9). Let $T$ be a suitably large integer to be named later, and let $u^{(1)}, u^{(2)}, \ldots, u^{(T)}$ be independent random vectors, each selected uniformly from among the rows of $B$. Put $u = u^{(1)}\oplus u^{(2)}\oplus\cdots\oplus u^{(T)}$. We will index the columns of $B$ and the components of $u$ by $0, 1, \ldots, m$ (left to right). Key to analyzing the distribution of $u$ is the following claim.
Claim 3.5.1. Let $T\ge(m/b)\ln n$. Let $\ell\in\{1,2,\ldots,b\}$ be such that the layer $z^{(\ell)}$ is nonzero and not perfect. Let $s\in\{0, b, 2b, 3b, \ldots\}$ be such that $s+\lceil\log n\rceil b\le m$. Then
$$\Pr\left[(u)_{s+\ell} = (u)_{s+b+\ell} = \cdots = (u)_{s+(\lceil\log n\rceil-1)b+\ell} = 0\right] \le \frac{2}{n}.$$
Proof. Let $B'$ be the matrix whose columns are the following columns of $B$: $s+\ell,\ s+b+\ell,\ \ldots,\ s+(\lceil\log n\rceil-1)b+\ell$, in that order. Since $z^{(\ell)}$ is nonzero and not perfect, $z^{(\ell)}$ has $\lceil\log n\rceil+1$ consecutive components with values either $0,0,\ldots,0,1$ or $1,0,0,\ldots,0$. Consequently, $B'$ must contain one of the following submatrices, each of size $(\lceil\log n\rceil+1)\times\lceil\log n\rceil$:
$$\begin{pmatrix}
0 & 0 & \cdots & 0 & 0\\
1 & & & & \\
 & 1 & & * & \\
 & & \ddots & & \\
* & & & 1 & \\
 & & & & 1
\end{pmatrix}
\qquad\text{or}\qquad
\begin{pmatrix}
1 & & & & \\
 & 1 & & * & \\
 & & \ddots & & \\
* & & & 1 & \\
 & & & & 1\\
0 & 0 & \cdots & 0 & 0
\end{pmatrix}.$$
In either case, the rows of the submatrix contain the zero vector and a basis for $\{0,1\}^{\lceil\log n\rceil}$ over GF(2), and each corresponding row of $B$ is selected with probability at least $\frac{b}{2m}$. The claim now follows from Lemma 3.3, since
$$2^{-\lceil\log n\rceil} + e^{-2T\cdot\frac{b}{2m}} \le \frac{2}{n}.$$
We return to the proof of the lemma. Fix $T = \lceil(m/b)\ln n\rceil$. Let $s = 0$ and apply Claim 3.5.1 with every $\ell\in\{1,2,\ldots,b\}$ for which the layer $z^{(\ell)}$ is nonzero and not perfect. Since there are at least $\frac{1}{700}b$ such choices for $\ell$, we conclude by the union bound that
$$\Pr\left[(u)_1 + (u)_2 + \cdots + (u)_{\lceil\log n\rceil b} < \frac{1}{700}b\right] \le b\cdot\frac{2}{n}.$$
The same calculation applies to the next set of $\lceil\log n\rceil b$ components of $u$ (i.e., $s = \lceil\log n\rceil b$), and so on. Applying a union bound across all these $m/(\lceil\log n\rceil b)$ calculations, we find that with probability
$$1 - \frac{m}{\lceil\log n\rceil b}\cdot b\cdot\frac{2}{n} > 0,$$
the predicate whose flip vector is $u$ is $(m, \lceil\log n\rceil b, \frac{1}{700}b)$-dense. Fix any such predicate $D'$. Since $D'$ is the XOR of $T\le(n\log n)/b$ predicates from among $D_{-(m-b)}, \ldots, D_{m-b}$, the lemma follows by (3.10) and Proposition 2.6.
4 Univariate Approximation with Clusters of Nodes
Crucial to our study of dense predicates are certain approximation problems to which they give rise. Roughly speaking, the hardness of such an approximation problem for low-degree polynomials translates into the communication hardness of the associated predicate. This section carries out the first part of the program, namely, showing that the approximation task at hand is hard for low-degree polynomials. We examine this question in its basic mathematical form, with no extraneous considerations to obscure our view. How communication fits in this picture will become clear in the next two sections.

For a finite set $X\subset\mathbb{R}$, a function $f\colon X\to\mathbb{R}$, and an integer $r\ge 0$, define
$$\varepsilon^*(f, X, r) = \min_{p\in P_r}\max_{x\in X}|p(x) - f(x)|.$$
In words, $\varepsilon^*(f, X, r)$ is the least error (in the uniform sense) to which a degree-$r$ polynomial can approximate $f$ on $X$. The following well-known fact from approximation theory is useful in estimating this error.
Fact 4.1 (see, e.g., [34, Thm. 1.15]). Let $X = \{x_1, x_2, \ldots, x_{r+2}\}$ be given reals, where $x_1 < x_2 < \cdots < x_{r+2}$. Let $f\colon X\to\mathbb{R}$ be given. Put $\omega(x) = (x-x_1)(x-x_2)\cdots(x-x_{r+2})$. Then
$$\varepsilon^*(f, X, r) = \frac{\left|\sum_{i=1}^{r+2} f(x_i)/\omega'(x_i)\right|}{\sum_{i=1}^{r+2} 1/|\omega'(x_i)|}.$$

To develop some intuition for the work in this section, consider the following approximation problem. Let $f\colon\{0,1,\ldots,n\}\to\{0,1\}$ be defined by
$$f(x) = \begin{cases} 1 & \text{if } x = \lfloor n/2\rfloor,\\ 0 & \text{otherwise.}\end{cases}$$
It is well-known that any polynomial that approximates $f$ within 1/3 has degree $\Omega(n)$. For example, this follows from work by Paturi [27]. The approximation problem of interest to us is similar, except that our points need not be as evenly spaced as $0, 1, \ldots, n$ but rather may form clusters. As a result, Paturi's results and methods do not apply, and we approach this question differently, using the first-principles formula of Fact 4.1. Specifically, our main result in this section is as follows.

Lemma 4.2 (Inapproximability by low-degree polynomials). Let positive integers $L, d$ and a real number $B\ge d$ be given. Let $\{x_{ij} : i = 1,\ldots,L;\ j = 1,\ldots,d\}$ be a set of $Ld$ distinct reals, where $x_{ij}\in[(i-1)B, iB]$ and
$$|x_{ij} - x_{i'j'}| \ge 1 \qquad\text{for } (i,j)\ne(i',j'). \tag{4.1}$$
Let $x_0\in[\frac{1}{4}LB, \frac{3}{4}LB]$. Then any polynomial $p$ with
$$p(x_0) = 1, \qquad |p(x_{ij})| < \frac{1}{2}\left(\frac{1}{LB}\right)^{4d+1} \quad\text{for all } i, j$$
has degree at least $\left(\frac{1}{2}L - 1\right)d$.
δ =
min |x0 − xi j | i =1,…,2l−1,
j =1,…,d
d 2l−1
|ω′(x0)| = 􏰖􏰖|x0−xij| 􏰞 δ 􏰖􏰖B
|X| |ω′(x0)|
d 2l−1 􏱛|x0−xij|􏱜
|ω′(xi′j′)| = 􏰖 x∈X\{xi′ j′}
d
􏰟 δ 􏰖 j =1
|x−xi′j′|
􏰖
|xij −xi′j′|
􏱙|xij −xi′j′|􏱚
(4.3)
j =1 i =1,…,2l−1 i∈/{i′−1,i′,i′+1}
􏰈 l! l! B 2l−4 􏰉d 􏰟δ· l4 .
20
􏱄 􏱅􏱆 􏱇
􏰟|i−i′|−1
(4.4)

Now (4.2) yields, in view of (4.3) and (4.4):
∗ 1􏰈 1 􏰉4d+1
ε(f,X,|X|−2)􏰟2 LB , which concludes the proof since | X | 􏰟 􏰃 1 L − 1􏰄 d + 1.
2
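The quantity $\varepsilon^*(f, X, r)$ is itself the optimum of a small linear program, which gives an easy way to experiment with clustered node sets like those above. The generic sketch below (ours; not used anywhere in the paper's argument) computes the best uniform approximation error by a degree-$r$ polynomial on an arbitrary finite node set.

```python
# LP sketch (ours): eps*(f, X, r) = min eps s.t. |p(x) - f(x)| <= eps on X,
# with the coefficients of p in P_r and eps as the LP variables.
import numpy as np
from scipy.optimize import linprog

def best_uniform_error(X, fvals, r):
    X, fvals = np.asarray(X, float), np.asarray(fvals, float)
    V = np.vander(X, r + 1)                    # columns x^r, ..., x, 1
    ones = np.ones((len(X), 1))
    A = np.block([[V, -ones], [-V, -ones]])    # p(x) - eps <= f, -p(x) - eps <= -f
    b = np.concatenate([fvals, -fvals])
    c = np.zeros(r + 2); c[-1] = 1.0           # minimize eps
    res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * (r + 2))
    return res.fun

# Example: the delta function at n/2 on {0, ..., n} needs high degree.
n = 10
f = [1.0 if x == n // 2 else 0.0 for x in range(n + 1)]
print([round(best_uniform_error(range(n + 1), f, r), 3) for r in (2, 4, 6, 8)])
```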
5 Key Analytic Property of Dense Predicates
We now transition to the final ingredient of our proof, smooth orthogonalizing distributions for a given predicate $D$. This informal term refers to a distribution on $\{0,1\}^n$ that does not put too little weight on any point (the smooth part) and under which $(-1)^{D(x_1+\cdots+x_n)}$ is approximately orthogonal to all low-degree characters $\chi_S$ (the orthogonalizing part). Our task is to establish the existence of such distributions for every dense predicate. Once this is accomplished, we will be able to treat a dense predicate as if it were the familiar PARITY function (whose defining analytic property is precisely its orthogonality to the lower-order characters under the uniform distribution). Crucial to the development below will be the inapproximability result proved in Section 4.
For a polynomial $p$, a predicate $D\colon\{0,1,\ldots,n\}\to\{0,1\}$, and a number $N > 0$, define the advantage of $p$ in computing $D$ by
$$\mathrm{adv}(p, N, D) = N\min_{t=0,\ldots,n}\left\{(-1)^{D(t)}p(t)\right\} + \sum_{t=0}^{n}\frac{\binom{n}{t}}{2^n}\,(-1)^{D(t)}p(t).$$
This quantity is conceptually close to the correlation of p and D with respect to the binomial distribution. There is a substantial difference, however: if p and D differ in sign at some point, this causes a penalty term to be subtracted. We will be interested in values N ≫ 1, when even a single error of p results in a large penalty. Define
$$\mathrm{adv}_r(N, D) = \max_p\ \mathrm{adv}(p, N, D),$$
where the maximization is over $p\in P_r$ with $|p(t)|\le 1$ for $t = 0,1,\ldots,n$. As we now show, this quantity is closely related to smooth orthogonalizing distributions for $D$.
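Since $\mathrm{adv}(p, N, D)$ is a concave, piecewise-linear functional of the coefficients of $p$, the quantity $\mathrm{adv}_r(N, D)$ is computable by a linear program. The sketch below (ours; all names are our own, and this plays no role in the proofs) is convenient for getting a feel for how severely the penalty term punishes sign errors.

```python
# LP sketch (ours): adv_r(N, D) = max N*z + sum_t binom(n,t) 2^{-n} s_t p(t)
# over z <= s_t p(t) and |p(t)| <= 1, where s_t = (-1)^{D(t)} and p in P_r.
import numpy as np
from math import comb
from scipy.optimize import linprog

def adv_r(D, r, N):
    n = len(D) - 1
    V = np.vander(np.arange(n + 1.0), r + 1)       # p(t) = V @ c
    s = np.array([(-1.0) ** d for d in D])
    w = np.array([comb(n, t) / 2 ** n for t in range(n + 1)])
    c_obj = -np.concatenate([(w * s) @ V, [N]])    # maximize => minimize negation
    A = np.vstack([np.hstack([-s[:, None] * V, np.ones((n + 1, 1))]),   # z <= s_t p(t)
                   np.hstack([V, np.zeros((n + 1, 1))]),                # p(t) <= 1
                   np.hstack([-V, np.zeros((n + 1, 1))])])              # p(t) >= -1
    b = np.concatenate([np.zeros(n + 1), np.ones(2 * (n + 1))])
    res = linprog(c_obj, A_ub=A, b_ub=b, bounds=[(None, None)] * (r + 2))
    return -res.fun

PARITY = [t % 2 for t in range(9)]                 # n = 8, deg = 8
print([round(adv_r(PARITY, r, N=10.0), 4) for r in (2, 4, 8)])
```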
Theorem 5.1 (Smooth distributions vs. approximation by polynomials). Fix a
predicate $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ and an integer $r\ge 0$. Then for every $N > 1$, there is a distribution $\mu$ on $\{0,1\}^n$ such that $\mu(x)\ge\frac{1}{2^nN}$ for each $x$ and
$$\left|\mathbf{E}_x\left[(-1)^{D(|x|)}\mu(x)\chi_S(x)\right]\right| \le \frac{1}{2^nN}\,\mathrm{adv}_r(N-1, D) \qquad\text{for } |S|\le r.$$

Proof. Put $f(x) = (-1)^{D(|x|)}$ and consider the following linear program:

(LP1)
variables: $\mu(x)$ for all $x$; $\varepsilon$
minimize: $\varepsilon$
subject to:
$$\left|\sum_{x\in\{0,1\}^n}\mu(x)f(x)\chi_S(x)\right| \le \varepsilon \qquad\text{for } |S|\le r,$$
$$\sum_{x\in\{0,1\}^n}\mu(x) = 1,$$
$$\mu(x) \ge \frac{1}{2^nN} \qquad\text{for each } x.$$

It suffices to show that the optimum of this program is at most $\frac{1}{N}\mathrm{adv}_r(N-1, D)$. For this, we pass to the dual:

(LP2)
variables: $\alpha_S$ (for $|S|\le r$); $\xi_x$ (for all $x$); $\lambda$
maximize: $\displaystyle\frac{1}{N}\left\{(N-1)\lambda + \frac{1}{2^n}\sum_{x\in\{0,1\}^n}(\lambda + \xi_x)\right\}$
subject to:
$$f(x)\sum_{|S|\le r}\alpha_S\chi_S(x) \ge \lambda + \xi_x \qquad\text{for all } x,$$
$$\sum_{|S|\le r}|\alpha_S| \le 1,$$
$$\alpha_S\in\mathbb{R}\ \text{for } |S|\le r, \qquad \xi_x\ge 0\ \text{for all } x, \qquad \lambda\in\mathbb{R}.$$

The dual programs (LP1) and (LP2) are both feasible and thus have the same finite optimum. Therefore, our task reduces to proving that the optimum of (LP2) is at most $\frac{1}{N}\mathrm{adv}_r(N-1, D)$. Fix an optimal solution to (LP2). Then
$$f(x)\sum_{|S|\le r}\alpha_S\chi_S(x) = \lambda + \xi_x \qquad\text{for all } x, \tag{5.1}$$
since in case of a strict inequality ($>$) we could increase the corresponding variable $\xi_x$ by a small amount to obtain a feasible solution with greater value. Furthermore, we claim that
$$\lambda = \min_{x\in\{0,1\}^n}\left\{f(x)\sum_{|S|\le r}\alpha_S\chi_S(x)\right\}. \tag{5.2}$$
Indeed, let $m$ stand for the right-hand side of (5.2). Then $\lambda\le m$ because each $\xi_x$ is nonnegative. It remains to show that $\lambda\ge m$. If we had $\lambda < m$, then (5.1) would imply that $\xi_x\ge m-\lambda$ for all $x$. As a result, we could obtain a new feasible solution $\xi_x' = \xi_x + (\lambda-m)$ and $\lambda' = m$. This new solution satisfies $\lambda' + \xi_x' = \lambda + \xi_x$ for all $x$. Moreover, $\lambda' > \lambda$, which results in a greater objective value and yields the desired contradiction. In summary, $\lambda = m$.

Put
$$\varphi^{\mathrm{sym}}(x) = \frac{1}{n!}\sum_{\sigma\in S_n}\varphi(x_{\sigma(1)},\ldots,x_{\sigma(n)}).$$
In view of (5.1) and (5.2), the optimum of (LP2) is
$$\frac{1}{N}\max_\varphi\left\{(N-1)\min_x\{f(x)\varphi(x)\} + \frac{1}{2^n}\sum_x f(x)\varphi(x)\right\}, \tag{5.3}$$
where the maximization is over functions $\varphi$ of the form
$$\varphi(x) = \sum_{|S|\le r}\alpha_S\chi_S(x), \qquad\text{where } \sum_{|S|\le r}|\alpha_S|\le 1. \tag{5.4}$$
Fix $\varphi$ that optimizes (5.3). By (5.4),
$$\max_{x\in\{0,1\}^n}|\varphi(x)| \le 1.$$
Since $f$ is symmetric, $\varphi$ and $\varphi^{\mathrm{sym}}$ have the same objective value in (5.3). By the symmetrization argument (Proposition 2.3), there is a univariate polynomial $p\in P_r$ with
$$\varphi^{\mathrm{sym}}(x) = p(x_1+\cdots+x_n) \qquad\text{for all } x\in\{0,1\}^n.$$
For $t = 0,1,\ldots,n$,
$$|p(t)| = |p(\underbrace{1+\cdots+1}_{t\text{ times}}+0+\cdots+0)| \le \max_{x\in\{0,1\}^n}|\varphi^{\mathrm{sym}}(x)| \le \max_{x\in\{0,1\}^n}|\varphi(x)| \le 1.$$
Replacing $\varphi(x)$ by $p(x_1+\cdots+x_n)$ in (5.3), we see that the optimum of (LP2) is at most
$$\frac{1}{N}\max_p\left\{(N-1)\min_{t=0,\ldots,n}\left\{(-1)^{D(t)}p(t)\right\} + \frac{1}{2^n}\sum_{t=0}^{n}\binom{n}{t}(-1)^{D(t)}p(t)\right\},$$
where the maximization is over $p\in P_r$ with $|p(t)|\le 1$ for $t = 0,1,\ldots,n$. This latter quantity is $\frac{1}{N}\mathrm{adv}_r(N-1, D)$, by definition.

Theorem 5.1 states that a smooth orthogonalizing distribution for D exists whenever low-degree polynomials have negligible advantage in computing D. Accordingly, we proceed to examine the advantage achievable by low-degree polynomials.
Lemma 5.2 (Each dense predicate induces a hard approximation problem).
Let $D$ be an $(n, B, 2d+1)$-dense predicate, where $n, B, d$ are positive integers. Assume that $\mathrm{adv}_r(N, D)\ge n2^{-n/6}$, where $r < \deg(D)$ and $N > 0$ are given. Then there are $\lfloor\frac{n}{B}\rfloor d$ distinct reals $\{x_{ij} : i = 1,\ldots,\lfloor\frac{n}{B}\rfloor;\ j = 1,\ldots,d\}$ and a polynomial $p\in P_r$ such that:
$$x_{ij}\in[(i-1)B,\ iB] \qquad\text{for all } i, j,$$
$$|x_{ij} - x_{i'j'}|\ge 1 \qquad\text{for all } (i,j)\ne(i',j'),$$
$$|p(x_{ij})|\le\sqrt n/N \qquad\text{for all } i, j,$$
$$p(x_0) = 1 \qquad\text{for some } x_0\in\left[\tfrac{1}{4}n, \tfrac{3}{4}n\right].$$
Proof. Fix $q\in P_r$ with $|q(t)|\le 1$ for $t = 0,1,\ldots,n$ and $\mathrm{adv}(q, N, D) = \mathrm{adv}_r(N, D)$. Fix $k\in\{0,1,\ldots,n\}$ with
$$\binom{n}{k}(-1)^{D(k)}q(k) = \max_{t=0,\ldots,n}\left\{\binom{n}{t}(-1)^{D(t)}q(t)\right\}.$$
Since $\deg(q) < \deg(D)$, the quantity $\binom{n}{t}(-1)^{D(t)}q(t)$ is positive for at most $n$ values of $t = 0,1,\ldots,n$. Therefore,
$$\mathrm{adv}(q, N, D) \le n\cdot\frac{\binom{n}{k}}{2^n}(-1)^{D(k)}q(k) \le n\cdot\frac{\binom{n}{k}}{2^n}.$$
Recalling that $\mathrm{adv}(q, N, D)\ge n2^{-n/6}$, we infer that $\frac{1}{4}n\le k\le\frac{3}{4}n$. Put
$$p(t) = \frac{1}{q(k)}\,q(t).$$
Taking $x_0 = k$, we have $\frac{1}{4}n\le x_0\le\frac{3}{4}n$ and $p(x_0) = 1$, as desired. It remains to find the points $x_{ij}$. For this, we need the following claim.

Claim 5.2.1. Let $a, b$ be integers with $a < b$ and $D(a)\ne D(b)$. Then $|p(\xi)|\le\sqrt n/N$ for some $\xi\in[a,b]$.

Proof. If $q$ vanishes at some point in $[a,b]$, we are done. In the contrary case, $q$ is nonzero and has the same sign at every point of $[a,b]$, which means that either $q(a)(-1)^{D(a)} < 0$ or $q(b)(-1)^{D(b)} < 0$. Since $\mathrm{adv}(q, N, D)\ge 0$, we have:
$$\min\{|q(a)|, |q(b)|\} \le \frac{n}{N}\max_{t=0,\ldots,n}\left\{\frac{\binom{n}{t}}{2^n}(-1)^{D(t)}q(t)\right\} = \frac{n}{N}\cdot\frac{\binom{n}{k}}{2^n}\cdot|q(k)| \le \frac{\sqrt n}{N}\,|q(k)|,$$
and hence $\min\{|p(a)|, |p(b)|\}\le\sqrt n/N$.

Fix an integer $i = 1, 2, \ldots, \lfloor\frac{n}{B}\rfloor$. Since $D$ is $(n, B, 2d+1)$-dense, $D$ changes value at least $2d$ times in $[(i-1)B+1,\ iB]$. As a result, there are at least $d$ pairs of integers $(a_1,b_1),\ldots,(a_d,b_d)$ with $D(a_1)\ne D(b_1)$, $D(a_2)\ne D(b_2)$, $\ldots$, $D(a_d)\ne D(b_d)$ and
$$(i-1)B+1 \le a_1 < b_1 < a_2 < b_2 < \cdots < a_d < b_d \le iB.$$

[...]

Theorem 6.1. Let $\alpha > 0$ be a sufficiently small absolute constant. Let $D$ be an $(m, b\lceil\log n\rceil, \frac{1}{700}b)$-dense
predicate, where $\frac{1}{350}n\le m\le n$ and $b = \lfloor\alpha n/\log^2 n\rfloor$. Then
$$U(D) \ge \Omega\left(\frac{n}{\log n}\right).$$
Proof. Throughout the proof we will, without mention, use the assumption that n is large enough. This will simplify the setting of parameters, the manipulation of floors and ceilings, and generally make the proof easier to follow.
Fix an integer $v\in[\frac{1}{8}m, \frac{1}{4}m]$ with $b\lceil\log n\rceil\mid v$. Clearly, $v\gg 3b\lceil\log n\rceil$. Define $D'\colon\{0,1,\ldots,v\}\to\{0,1\}$ by $D'(t)\equiv D(t)$. Since $D'$ is $(v, b\lceil\log n\rceil, \frac{1}{700}b)$-dense, Theorem 5.3 provides a distribution $\mu$ on $\{0,1\}^v$ with
$$\mu(z) \ge 2^{-v}\,2^{-\alpha n/(350\log n)} \qquad\text{for each } z\in\{0,1\}^v, \tag{6.1}$$
$$\left|\mathbf{E}_z\left[(-1)^{D(|z|)}\mu(z)\chi_S(z)\right]\right| \le 2^{-7v/6} \qquad\text{for } |S| < \frac{v}{6\cdot 1401\lceil\log n\rceil}. \tag{6.2}$$
Define $\varphi\colon\{0,1\}^v\to\mathbb{R}$ by $\varphi(z) = (-1)^{D(|z|)}\mu(z)$. Restating (6.2),
$$|\hat\varphi(S)| \le 2^{-7v/6} \qquad\text{for } |S| < \frac{v}{6\cdot 1401\lceil\log n\rceil}. \tag{6.3}$$
Furthermore, Proposition 2.2 reveals that
$$\max_{S\subseteq[v]}|\hat\varphi(S)| \le 2^{-v}. \tag{6.4}$$
Let $A$ be the $(2v, v, 8^{-v}\varphi)$-pattern matrix. By (6.3), (6.4), and Theorem 2.8,
$$\|A\| \le 4^{-v}\,2^{-v/(12\cdot 1401\lceil\log n\rceil)}. \tag{6.5}$$
By (6.1), every entry of $A$ has absolute value at least $16^{-v}\,2^{-\alpha n/(350\log n)}$. Combining this observation with (6.5) and Theorem 2.5,
$$\mathrm{rk}_\pm(A) \ge 2^{v/(12\cdot 1401\lceil\log n\rceil)}\,2^{-\alpha n/(350\log n)}.$$
Recall that $v\ge\frac{1}{8}m\ge\frac{1}{8\cdot 350}n$. Hence, for a suitably small constant $\alpha > 0$,
$$\mathrm{rk}_\pm(A) \ge 2^{\Omega(n/\log n)}.$$
It remains to relate the sign-rank of $A$ to the communication complexity of $D$. Let $F$ be the $(2v, v, f)$-pattern matrix, where $f(z) = (-1)^{D(|z|)}$. Then $\mathrm{rk}_\pm(A) = \mathrm{rk}_\pm(F)$ because $A$ and $F$ have the same sign pattern. But $F$ is a submatrix of the communication matrix of $D$, namely,
$$M = \bigl[(-1)^{D(|x\wedge y|)}\bigr]_{x\in\{0,1\}^m,\ y\in\{0,1\}^m}.$$
Thus,
$$\mathrm{rk}_\pm(M) \ge \mathrm{rk}_\pm(F) = \mathrm{rk}_\pm(A) \ge 2^{\Omega(n/\log n)}.$$
In view of Theorem 2.4, the proof is complete.

Corollary 6.2 (Communication complexity of high-degree predicates). Let $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ be a predicate with $\deg(D)\ge\frac{1}{4}n$. Then
$$U(D) \ge \Omega\left(\frac{n}{\log^4 n}\right).$$

Proof. Immediate from Lemma 3.5 and Theorem 6.1.

At last, we arrive at the main result of this paper, cf. Theorem 1.1 in the Introduction.

Theorem 6.3 (Main Result). Let $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ be a nonconstant predicate, $k = \deg(D)$. Then
$$\Omega\left(\frac{k}{\{1+\log(n/k)\}\log^4 n}\right) \le U(D) \le O\left(k\log\frac{2n}{k}\right).$$
Proof. The lower bound is immediate from Lemma 3.2 and Corollary 6.2. To prove the upper bound, fix $p\in P_k$ with $\mathrm{sgn}(p(t)) = (-1)^{D(t)}$ for $t = 0,1,\ldots,n$.
Put
$$M = \bigl[(-1)^{D(|x\wedge y|)}\bigr]_{x,y}, \qquad R = \bigl[p(x_1y_1 + \cdots + x_ny_n)\bigr]_{x,y},$$
where the indices run as usual: $x, y\in\{0,1\}^n$. Then $M_{xy}R_{xy} > 0$ for all $x$ and $y$. Expanding $p(x_1y_1+\cdots+x_ny_n)$ into monomials writes $R$ as a sum of at most $\sum_{i=0}^{k}\binom{n}{i}$ rank-1 matrices, one for each product $\prod_{i\in S}x_iy_i$ with $|S|\le k$. Thus, the sign-rank of $M$ does not exceed $\sum_{i=0}^{k}\binom{n}{i}$. In view of Theorem 2.4, this completes the proof.
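The upper-bound construction is easy to check numerically. The snippet below (ours) instantiates it for DISJOINTNESS with the degree-1 sign-representing polynomial $p(t) = 2t - 1$ (our choice; any $p$ with $\mathrm{sgn}(p(t)) = (-1)^{D(t)}$ works) and verifies both the sign agreement and the rank bound.

```python
# Check (ours) of the sign-rank upper bound for D = DISJ: R = [p(|x AND y|)]
# with p(t) = 2t - 1 agrees in sign with M and has rank <= 1 + n.
import numpy as np
from itertools import product

n = 4
xs = list(product((0, 1), repeat=n))
inter = np.array([[sum(a * b for a, b in zip(x, y)) for y in xs] for x in xs])
D = lambda t: 1 if t == 0 else 0                       # DISJOINTNESS predicate
M = np.array([[(-1) ** D(t) for t in row] for row in inter])
R = 2 * inter - 1                                      # p(|x AND y|), p(t) = 2t - 1
assert (M * R > 0).all()                               # sign-representation of M
assert np.linalg.matrix_rank(R) <= 1 + n               # sum_{i <= 1} binom(n, i)
```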
Remark 6.4. Immediate consequences of Theorem 6.3 are near-tight lower bounds on the size of threshold-of-majority and majority-of-threshold circuits for every function $f(x,y) = D(|x\wedge y|)$, where $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ is a given predicate. Similarly, Theorem 6.3 yields near-tight lower bounds on the dimension complexity of every concept class $C_D = \{f_{D,y} : y\in\{0,1\}^n\}$, where $f_{D,y}(x) = D(|x\wedge y|)$ for a fixed predicate $D\colon\{0,1,\ldots,n\}\to\{0,1\}$. These applications follow from well-known black-box arguments involving sign-rank [8, Lem. 5], and we do not spell them out here.
On Logarithmic Factors in Theorem 6.3
It is natural to wonder whether the logarithmic factors in Theorem 6.3 can be eliminated. The answer varies from one predicate to another. There are indeed predicates $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ for which $U(D) = \Theta(\deg(D))$. For example, the conjunction predicate, given by $\mathrm{AND}_n(t) = 1 \Leftrightarrow t = n$, has degree 1 and unbounded-error complexity $\Theta(1)$, as one can verify from the representation $\mathrm{AND}_n(|x\wedge y|) = \prod x_i\cdot\prod y_i$. Similarly, the familiar predicate $\mathrm{PARITY}_n(t) = t\bmod 2$ has degree $n$ and unbounded-error complexity $\Theta(n)$ by Forster's result [7]. At the same time, there are predicates $D$ for which a logarithmic gap exists between $\deg(D)$ and $U(D)$. One such predicate is disjointness, given by $\mathrm{DISJ}_n(t) = 1 \Leftrightarrow t = 0$, which has degree 1 and unbounded-error complexity $\Theta(\log n)$:
Proposition 6.5. $U(\mathrm{DISJ}_n) = \Theta(\log n)$.
Proof. The upper bound is immediate from Theorem 6.3. For the lower bound,
note that
$$\bigoplus_{i=1}^{n}(x_i\wedge y_i) = \bigvee_{i=1}^{4^n} f_i(x_1,\ldots,x_n)\wedge g_i(y_1,\ldots,y_n),$$
where $f_i, g_i$ are suitable Boolean functions (in fact, conjunctions of literals). This yields the inequality $U(\mathrm{PARITY}_n)\le U(\mathrm{DISJ}_{4^n})$, which completes the proof since $U(\mathrm{PARITY}_n) = \Theta(n)$ by Forster's result [7].

The lower bound of Proposition 6.5 is of course valid for any predicate D that contains disjointness or its negation as a subfunction. More precisely:
Proposition 6.6. Let $D\colon\{0,1,\ldots,n\}\to\{0,1\}$ be a predicate with flip vector $v$. If $v$ contains the subvector $(1,\underbrace{0,0,\ldots,0}_{m})$, then $U(D)\ge\Omega(\log m)$.
To illustrate, Proposition 6.6 shows that the majority predicate $\mathrm{MAJ}_n(t) = 1 \Leftrightarrow t > n/2$ has degree 1 and unbounded-error complexity $\Theta(\log n)$. Other threshold predicates can be handled analogously.
Acknowledgments
I would like to thank Anna Gál, Adam Klivans, and the anonymous reviewers for their useful comments on an earlier version of this manuscript. This research was supported by Adam Klivans' NSF CAREER Award and NSF Grant CCF-0728536.
References
[1] N. Alon, P. Frankl, and V. Rödl. Geometrical realization of set systems and probabilistic communication complexity. In Proc. of the 26th Symposium on Foundations of Computer Science (FOCS), pages 277–280, 1985.
[2] J. Aspnes, R. Beigel, M. L. Furst, and S. Rudich. The expressive power of voting polynomials. Combinatorica, 14(2):135–148, 1994.
[3] L. Babai, P. Frankl, and J. Simon. Complexity classes in communication complexity theory. In Proc. of the 27th Symposium on Foundations of Computer Science (FOCS), pages 337–347, 1986.
[4] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci., 68(4):702–732, 2004.
[5] H. Buhrman, N. K. Vereshchagin, and R. de Wolf. On computation and communication with small bias. In Proc. of the 22nd Conf. on Computational Complexity (CCC), pages 24–32, 2007.
[6] V. Feldman, P. Gopalan, S. Khot, and A. K. Ponnuswami. New results for learning noisy parities and halfspaces. In Proceedings of the 47th Annual Symposium on Foundations of Computer Science (FOCS), pages 563–574, 2006.
[7] J. Forster. A linear lower bound on the unbounded error probabilistic communication complexity. J. Comput. Syst. Sci., 65(4):612–625, 2002.

[8] J. Forster, M. Krause, S. V. Lokam, R. Mubarakzjanov, N. Schmitt, and H.-U. Simon. Relations between communication complexity, linear arrangements, and computational complexity. In Proc. of the 21st Conf. on Foundations of Software Technology and Theoretical Computer Science (FST TCS), pages 171–182, 2001.
[9] J. Forster and H. U. Simon. On the smallest possible dimension and the largest possible margin of linear arrangements representing given concept classes. Theor. Comput. Sci., 350(1):40–48, 2006.
[10] M. Goldmann, J. Håstad, and A. A. Razborov. Majority gates vs. general weighted threshold gates. Computational Complexity, 2:277–300, 1992.
[11] A. Hajnal, W. Maass, P. Pudlák, M. Szegedy, and G. Turán. Threshold circuits of bounded depth. J. Comput. Syst. Sci., 46(2):129–154, 1993.
[12] S. Jukna. Extremal Combinatorics with Applications in Computer Science. Springer-Verlag, Berlin, 2001.
[13] B. Kalyanasundaram and G. Schnitger. The probabilistic communication complexity of set intersection. SIAM J. Discrete Math., 5(4):545–557, 1992.
[14] B. S. Kashin and A. A. Razborov. Improved lower bounds on the rigidity of Hadamard matrices. Matematicheskie zametki, 63(4):535–540, 1998. In Russian.
[15] M. Kearns and L. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. J. ACM, 41(1):67–95, 1994.
[16] M. Kharitonov. Cryptographic hardness of distribution-specific learning. In Proc. of the 25th Symposium on Theory of Computing, pages 372–381, 1993.
[17] H. Klauck, R. Špalek, and R. de Wolf. Quantum and classical strong direct product theorems and optimal time-space tradeoffs. SIAM J. Comput., 36(5):1472–1493, 2007.
[18] A. R. Klivans and R. A. Servedio. Learning DNF in time $2^{\tilde O(n^{1/3})}$. J. Comput. Syst. Sci., 68(2):303–318, 2004.
[19] A. R. Klivans and A. A. Sherstov. A lower bound for agnostically learning disjunctions. In Proc. of the 20th Conf. on Learning Theory (COLT), pages 409–423, 2007.
[20] A. R. Klivans and A. A. Sherstov. Unconditional lower bounds for learning intersections of halfspaces. Machine Learning, 69(2–3):97–114, 2007.
[21] A. R. Klivans and A. A. Sherstov. Cryptographic hardness for learning intersections of halfspaces. J. Comput. Syst. Sci., 75(1):2–12, 2009.
[22] E. Kushilevitz and N. Nisan. Communication complexity. Cambridge University Press, New York, 1997.
[23] S. V. Lokam. Spectral methods for matrix rigidity with applications to size-depth trade-offs and communication complexity. J. Comput. Syst. Sci., 63(3):449–473, 2001.

[24] M. L. Minsky and S. A. Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, Mass., 1969.
[25] I. Newman. Private vs. common random bits in communication complexity. Inf. Process. Lett., 39(2):67–71, 1991.
[26] N. Nisan. The communication complexity of threshold gates. In Combinatorics, Paul Erdős is Eighty, pages 301–315, 1993.
[27] R. Paturi. On the degree of polynomials that approximate symmetric Boolean functions. In Proc. of the 24th Symposium on Theory of Computing (STOC), pages 468–474, 1992.
[28] R. Paturi and J. Simon. Probabilistic communication complexity. J. Comput. Syst. Sci., 33(1):106–123, 1986.
[29] R. Raz. Fourier analysis for probabilistic communication complexity. Comput. Complex., 5(3/4):205–221, 1995.
[30] A. A. Razborov. Bounded-depth formulae over the basis {&, ⊕} and some combinatorial problems. Complexity Theory and Applied Mathematical Logic, vol. "Problems of Cybernetics":146–166, 1988. In Russian.
[31] A. A. Razborov. On the distributional complexity of disjointness. Theor. Comput. Sci., 106(2):385–390, 1992.
[32] A. A. Razborov. Quantum communication complexity of symmetric predicates. Izvestiya: Mathematics, 67(1):145–159, 2003.
[33] A. A. Razborov and A. A. Sherstov. The sign-rank of AC0. In Proc. of the 49th Symposium on Foundations of Computer Science (FOCS), pages 57–66, 2008.
[34] T. J. Rivlin. An Introduction to the Approximation of Functions. Dover Publications, New York, 1981.
[35] A. A. Sherstov. Powering requires threshold depth 3. Inf. Process. Lett., 102(2– 3):104–107, 2007.
[36] A. A. Sherstov. Halfspace matrices. Comput. Complex., 17(2):149–178, 2008. Preliminary version in 22nd CCC, 2007.
[37] A. A. Sherstov. The pattern matrix method. SIAM J. Comput., 2010. To appear. Preliminary version in 40th STOC, 2008.
[38] A. A. Sherstov. Separating AC0 from depth-2 majority circuits. SIAM J. Comput., 38(6):2113–2129, 2009. Preliminary version in 39th STOC, 2007.
[39] Y. Shi and Y. Zhu. Quantum communication complexity of block-composed functions. Quantum Information & Computation, 9(5–6):444–460, 2009.
[40] L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984.
[41] R. de Wolf. Quantum Computing and Communication Complexity. PhD thesis,
University of Amsterdam, 2001.