CS代考计算机代写 AI decision tree discrete mathematics information theory algorithm ER ant scheme Foundations and Trends⃝R in Theoretical Computer Science Vol. 4, Nos. 1–2 (2008) 1–155 ⃝c 2009 S. V. Lokam

Foundations and Trends⃝R in Theoretical Computer Science Vol. 4, Nos. 1–2 (2008) 1–155 ⃝c 2009 S. V. Lokam
DOI: 10.1561/0400000011
Complexity Lower Bounds using Linear Algebra
By Satyanarayana V. Lokam
Contents
1 Introduction 2
1.1 Scope 2
1.2 Matrix Rigidity 3
1.3 Spectral Techniques 4
1.4 Sign-Rank 5
1.5 Communication Complexity 6
1.6 Graph Complexity 8
1.7 Span Programs 9
2 Matrix Rigidity 11
2.1 Motivation 11
2.2 Explicit Lower Bounds 15
2.3 Sparse Factorizations and Constant Depth Circuits 23
2.4 Rigidity Lower Bounds Using Algebraic Dimensions 24
2.5 Quadratic Lower Bounds on Linear Circuits 33
2.6 Paturi–Pudla ́k Dimensions 35
3 Spectral Methods to Study
Rank Robustness 40
3.1 Singular Value Decomposition 41

3.2 Variants of Rigidity 43
3.3 Bounded Coefficient Complexity of Linear
Transformations 51
3.4 Bounded Coefficient Complexity of some Bilinear
Transformations 56
4 Sign-Rank and Other Complexity Measures
of Sign Matrices 65
4.1 Sign-Rank 65
4.2 Forster’s Lower Bound 67
4.3 Margin Complexity, Factorization Norms,
and Discrepancy 72
4.4 Sign-Rank of an AC0 function 79
4.5 Applications of Complexity Measures
of Sign Matrices 85
5 Rank Robustness and
Two-Party Communication Complexity 94
5.1 The Log-Rank Conjecture in Communication
Complexity 94
5.2 Discrepancy and Randomized
and Quantum Communication Complexity 100
5.3 Relating Probabilistic Communication Complexity
to Sign-Rank and Margin Complexity 103
5.4 Matrix Rigidity and Quantifier Alternation
in Communication Complexity 107
6 Graph Complexity and Projective and Affine
Dimensions of Graphs 113
6.1 Graph Complexity 114
6.2 Pro jective and Affine Dimensions of Graphs 117
6.3 Lower Bounds on Depth-3 Graph Complexity 125

7
7.1 7.2
8
Span Programs: A Linear Algebraic Model
of Computation 132
Characterizing Span Program Size 133 Explicit Lower Bounds on Monotone Span
Program Size 136
Conclusions and Open Problems 141
Acknowledgments 148 References 149

Foundations and Trends⃝R in Theoretical Computer Science Vol. 4, Nos. 1–2 (2008) 1–155 ⃝c 2009 S. V. Lokam
DOI: 10.1561/0400000011
Complexity Lower Bounds using Linear Algebra
Satyanarayana V. Lokam
Microsoft Research India, Bangalore – 560080, India, satya@microsoft.com
Abstract
We survey several techniques for proving lower bounds in Boolean, algebraic, and communication complexity based on certain linear alge- braic approaches. The common theme among these approaches is to study robustness measures of matrix rank that capture the complex- ity in a given model. Suitably strong lower bounds on such robustness functions of explicit matrices lead to important consequences in the corresponding circuit or communication models. Many of the linear algebraic problems arising from these approaches are independently interesting mathematical challenges.

1
Introduction
1.1 Scope
Understanding the inherent computational complexity of problems is of fundamental importance in mathematics and theoretical com- puter science. While rapid progress has been made on upper bounds (algorithms), progress on lower bounds on the complexity of explicit problems has remained slow despite intense efforts over several decades. As is natural with typical impossibility results, lower bound questions are hard mathematical problems and hence are unlikely to be resolved by ad hoc attacks. Instead, techniques based on mathematical notions that capture computational complexity are necessary.
This paper surveys some approaches based on linear algebra to proving lower bounds on computational complexity. Linear algebraic methods are extensively used in the study of algorithms and complexity. Our focus here is on their applications to lower bounds in various models of circuits and communication complexity. We further consider mainly classical — as opposed to quantum — models of computation. Linear algebra plays an obviously pervasive role in the study of quan- tum complexity. Indeed, some of the techniques studied in this paper
2

1.2 Matrix Rigidity 3
have natural extensions to quantum models. However, to keep the scope of this survey narrow enough to limit its length, we restrict ourselves to classical models. Even within classical complexity theory, we do not touch upon several applications where linear algebra plays a critical role, most notably techniques related to spectra of graphs and coding theory. Our choice of topics is centered around the theme of deriving complexity lower bounds from lower bounds on ranks of matrices or dimensions of subspaces — often after the matrices or the subspaces are altered in various ways. Such a theme occurs in several contexts in complexity theory. The rough overall approach in this theme consists of (i) distilling a rank robustness or a dimension criterion to solve a lower bound problem in complexity, (ii) developing techniques to solve such linear algebraic problems, and (iii) exploring the consequent implica- tions to complexity lower bounds.
In the remaining sub-sections of this section, we give a brief intro- duction and preview of the material to be presented in detail in the later sections.
1.2 Matrix Rigidity
The most classical notion of rank robustness is matrix rigidity. The rigidity of a matrix A for target rank r is the minimum Hamming distance between A and a matrix of rank at most r. Valiant [98] introduced this notion to define a criterion for proving superlinear size lower bounds on linear circuits of logarithmic depth. Linear circuits are algebraic circuits consisting only of gates that compute linear combi- nations of their inputs. This is a natural model for computing linear transformations. Given the ubiquitous role linear transformations play in computing, understanding the inherent complexity of explicit linear transformations is important. For example, a fascinating open ques- tion is whether the Fourier transform requires a superlinear number of arithmetic operations. Furthermore, no superlinear lower bounds are known on the algebraic circuit complexity of any explicit function of constant degree, even when the circuit depth is restricted to be loga- rithmic. Thus, a superlinear lower bound on the size of a log-depth linear circuit computing an explicit linear transformation would be

4 Introduction
significant. Valiant showed that if the rigidity of an n × n matrix A
for target rank εn is at least n1+δ for positive constants ε and δ,
then a linear circuit of log-depth computing the linear transforma-
tion x 􏰶→ Ax must have superlinear number of edges. Hence, proving
sufficiently strong lower bounds on the rigidity of explicit matrices
would yield important consequences in complexity theory. However,
the best known lower bound on the rigidity of an explicit matrix is
only Ω􏰃n2 log n􏰄 [31, 56, 93] for target rank r. This bound is known rr
for several explicit matrices, including the Fourier transform matrix of a prime order. Using certain algebraic dimension arguments, rigidity lower bounds of Ω(n2) (for target rank r = εn for a constant ε > 0) are proved in [59] for the matrices whose entries are square roots of distinct primes and for matrices whose entries are primitive roots of unity of the first n2 prime orders. While these matrices are mathematically suc- cinct enough to describe, they are not explicit enough since their entries live in number fields of exponentially large dimensions. In Section 2, we study the notion of rigidity and its application to lower bounds on linear circuits. We will give several lower bound proofs on the rigidity of various matrices and the implied circuit lower bounds. We will also review two notions of a geometric flavor [70] that are similar to rigidity and have applications to circuit lower bounds.
1.3 Spectral Techniques
Several rank robustness functions similar to rigidity have been defined in the literature and applied to derive lower bounds in complexity the- ory. In Section 3, we discuss several such variations. The simplest of them considers the l2-norm of changes needed to reduce the rank of a given matrix to a target rank (notions considered in Section 3 are defined over R or C). This measure of rank robustness is effectively related to the singular values of the matrix and hence lower bounds are easily proved using spectral techniques [57]. Using spectral tech- niques, we can also prove lower bounds on the rigidity (in the sense of Valiant) of certain matrices. The most notable of these is an Hadamard matrix [26, 42], for which a lower bound of Ω(n2/r) is known. Spectrum of a matrix is also related to values of its sub-determinants (volumes).

1.4 Sign-Rank 5
Lower bounds on these measures imply lower bounds on linear circuits (over C) with bounded coefficients, i.e., the coefficients in the linear combinations computed by the gates in such a circuit are bounded by a constant. Algebraic circuits over C with bounded coefficients is a realistic restricted model of computation since real computers can only use arithmetic operations with bounded precision in a given step. Sev- eral lower bound results have been proved in the models of bounded coefficients [22, 24, 57, 69, 75, 83]. Indeed, a classical result in [65] gives an Ω(nlogn) lower bound on the size of a linear circuit with bounded coefficients (with no restriction on depth) computing the Fourier transform. In a more recent development, Raz [83] proved a remarkable lower bound of Ω(n2 log n) on n × n matrix multiplication in the model of bilinear circuits with bounded coefficients. Raz defines a geometric variant of l2-rigidity and uses spectral techniques to prove lower bounds on the linear circuits obtained when one of the matrices in the input to a bilinear circuit performing matrix multiplication is fixed. Subsequently, Bu ̈rgisser and Lotz [22] proved lower bounds on several bilinear transformations using spectral and volume techniques. We will describe these results as well in Section 3.
1.4 Sign-Rank
In Section 4, we study a rank robustness notion called the sign-rank of a matrix with ±1 entries. The sign-rank of a matrix A is the minimum rank of a real matrix each of whose entries agrees in sign with the corre- sponding entry of A. In other words, by making sign-preserving changes to A (changes are allowed to all entries of A), its rank cannot be brought down below its sign-rank. This notion was first introduced by Paturi and Simon [71] in the context of unbounded error probabilistic com- munication complexity. Proving nontrivial, i.e., superlogarithmic, lower bounds on sign ranks of explicit matrices remained a long-standing open question until Forster [28] achieved a breakthrough by proving that the sign-rank of an n × n Hadamard matrix is at least √n. Interestingly, Forster’s result and subsequent techniques for proving lower bounds on sign-rank rely on spectral techniques. Forster also considers the notion of margin complexity from learning theory and uses the same

6 Introduction
techniques to prove lower bounds on margin complexity. Recent results
in [53, 54, 55], give new insights into sign-rank, margin complexity, and
discrepancy of sign matrices by studying them all in the framework of
factorization norms of operators. This general approach reveals several
connections among the various complexity measures of sign matrices
and led to exciting new techniques to prove lower bounds on them. In
particular, they show that the discrepancy and the margin of a sign
matrix are within a constant factor of each other. Lower bounds on
sign-rank, margin complexity, and discrepancy are immensely useful in
proving lower bounds in a variety of models such as communication
complexity, circuit complexity, and learning theory. We will discuss
several such applications in Section 4. Very recently, Razborov and
Sherstov [88] proved a very interesting lower bound on the sign-rank
of a matrix constructed from a Boolean function in AC0. As an imme-
diate consequence, they show that Σcc (the communication complexity 2
analog of the complexity class Σ2) is not contained in the communica- tion complexity class UPP defined by [71]. This solves a long-standing open question [5] in two-party communication complexity. The sign- rank lower bound of [88] also has interesting consequences to lower bounds on circuit complexity and learning theory. Their result combines Forster’s main argument with a number of novel techniques including the use of the pattern matrix method [90]. These techniques usher in exciting new developments and are likely to find more applications.
1.5 Communication Complexity
Ever since Mehlhorn and Schmidt [63] proved the fundamental result that the log-rank of a 0–1 matrix is a lower bound on the two-party communication complexity of the associated Boolean function, the relation between rank, and more generally rank robustness, and com- munication complexity has been widely investigated and exploited. Yet, the most basic question of whether log-rank and communication com- plexity are polynomially related to each other is still open (this is also known as the log-rank conjecture). In this conjecture, rank over R is considered. We begin Section 5 by discussing what is known about this conjecture. Nisan and Wigderson [68] exhibit a Boolean matrix with

1.5 Communication Complexity 7
communication complexity at least (log-rank)α, where α ≈ log3 6. They
also show that, to prove that the communication complexity of every
{0,1}-matrix is bounded above by some polynomial function of log-rank
of the matrix, it suffices to show that every {0,1}-matrix of rank r
contains a sufficiently large submatrix of rank at most, say r/4. On
the other hand, Nisan and Wigderson [68] succeed in showing that
low rank matrices have high discrepancy (under the uniform distribu-
tion) using spectral arguments. Proving upper bounds on discrepancy
is a common and very useful method to prove lower bounds on prob-
abilistic communication complexity. In the result mentioned earlier,
Linial et al. [53] show that discrepancy (under any distribution) is at
least Ω(r−1/2) for any rank-r {0,1}-matrix. The proof of this bound
uses connections among discrepancy, rank, and factorization norms
of matrices discussed in Section 4. Strengthening these connections,
Linial and Shraibman [54] prove general lower bounds on the bounded
error probabilistic and quantum communication complexity of a sign
matrix in terms of a factorization norm, called the γ2α-norm, of the
matrix. As we noted before, the log-sign-rank of a matrix is essentially
equal to the unbounded error communication complexity of the matrix.
We will also see that the communication complexity analog of PP is
characterized by margin complexity. Thus rank robustness measures
such as sign-rank and γ2-norm of sign matrices yield lower bounds,
sometimes even characterizations, of probabilistic communication com-
plexity. Babai et al. [5] defined two-party communication complexity
analogs of traditional complexity classes such as Σk, PH, PSPACE, etc.
While analogs of low complexity classes such as P, NP, Co–NP, and
BPP were all separated from each other in two-party communication
complexity, questions such as PH versus PSPACE, Σ2 vs. PH are still
open. In [84] and [57], it was shown that sufficiently strong lower bounds
on rigidity (over finite fields) and a variant of rigidity (over reals) with
bounds on changes would separate PH and PSPACE in communica-
tion complexity. As mentioned before, a recent result in [88] separates
Σcc from UPP by proving a strong lower bound on the sign-rank of an 2
AC0-function. We conclude that lower bounds on rank and rank robust- ness have significant consequences to various lower bound questions in two-party communication complexity.

8 Introduction
1.6 Graph Complexity
Graph complexity was introduced by Pudla ́k et al. [79]. In this model, a graph — typically bipartite — on a vertex set V is constructed by starting from a family of basic graphs, e.g., complete bipartite graphs, on V and using the elementary operations of union and intersection on edge sets. The model of graph complexity is a common generalization of Boolean circuit and two-party communication complexity. In par- ticular, proving sufficiently strong lower bounds on the complexity of explicit bipartite graphs would imply lower bounds on formula size com- plexity, branching program size, and two-party communication com- plexity of Boolean functions. Naturally, proving lower bounds in models of graph complexity is even harder than proving lower bounds in cir- cuit and communication complexity models. However, graph–theoretic formulations of lower bound questions have the potential to lead to new insights. In particular, such formulations involving linear algebraic representations of graphs have led to new approaches to proving lower bounds on branching program size, formula size, and separation ques- tions in two-party communication complexity. In Section 6, we review such approaches. A linear algebraic representation of a graph places a vector, or more generally a subspace, at each vertex of the graph such that the adjacency relations among vertices can be expressed, or implied, in terms of orthogonality or intersection properties of the associated subspaces. The lowest dimension in which such a represen- tation can be realized for a given graph is the parameter of interest. Such representations have been extensively studied, e.g., in the context of the Shannon capacity of a graph [61], Colin de Verdi`ere’s invari- ant of graphs [45], etc. These and many similar combinatorial-linear algebraic problems are not only of inherent mathematical interest, but also have found numerous applications in algorithms and complexity. In Section 6, we define the affine and projective representations of graphs and pose questions about the lowest dimensions in which explicit graphs can be realized by such representations. Unfortunately, only weak lower bounds — Ω(log n) for n × n explicit bipartite graphs — are known on these dimensions. Lower bounds exceeding Ω(log3 n) on the affine dimension of explicit graphs are needed to derive new lower bounds on

1.7 Span Programs 9
the formula size of explicit Boolean functions. Pudla ́k and Ro ̈dl [76] showed that a lower bound on the projective dimension of a bipar- tite graph implies a lower bound on the branching program size of the associated Boolean function. In relating formula size of bipartite graphs (thereby deriving lower bounds on the associated Boolean func- tions) to affine dimension, Razborov [85] developed an elegant approach based on rectangle covers of matrices closely related to communication complexity. A rank robustness parameter (given a partially specified matrix, what is the minimum rank of a full matrix that completes it) plays a central role in establishing this connection. This same parameter and the underlying techniques are also used in characterizing the size of span programs in Section 7. Nontrivial lower bounds are known on graph complexity when we restrict the model to be of depth-3 graph for- mulas. In this case, building on polynomial approximations of the OR function and Forster’s lower bound on the sign-rank of an Hadamard matrix,weshow[58]Ω ̃(log3n)lowerboundsonthedepth-3complexity of explicit bipartite graphs.
1.7 Span Programs
Karchmer and Wigderson [41] introduced a linear algebraic model of computation called span programs. A span program associates a sub- space with each of the 2n literals of an n variable Boolean function. The result of its computation on a given input x is 1 if and only a fixed nonzero vector, e.g., the all 1’s vector, is in the span of the sub- spaces “activated” by x. The sum of the dimensions of the subspaces is the size of the span program. Proving lower bounds on span program size of explicit Boolean functions is a challenging research direction since such results imply lower bounds on Boolean formulas, symmet- ric branching programs, algebraic proof systems, and secret sharing schemes. The model of span programs realizes the fusion method for proving circuit lower bounds [103]. Currently, superpolynomial lower bounds are known only on monotone span programs. Monotone span programs are more powerful than monotone Boolean circuits [6]. Hence, proving lower bounds on monotone span programs is more challenging. Early results in this area include a combinatorial criterion on certain

10 Introduction
bipartite graphs that led to Ω(n5/2) monotone size lower bounds [9]. Subsequently, Babai et al. [6] proved the first superpolynomial mono- tone lower bound of nΩ(log n/ log log n) exploiting the criterion from [9] but using Paley-type graphs. The most striking result to date on span program size is by Ga ́l [32] who proves a characterization of span pro- gram size in terms of a rank robustness measure originally introduced by Razborov [85] and referred to above in Section 1.6 and discussed in Section 6. Specializing this characterization to the monotone situa- tion and using previously known lower bounds on the rank robustness measure of certain matrices derived from Paley-type bipartite graphs [85], Ga ́l proved the best known lower bound of nΩ(logn) on monotone span programs. We discuss Ga ́l’s characterization and her lower bound in Section 7.

2.1 Motivation
2
Matrix Rigidity
Matrix rigidity was introduced by Valiant [98]; a similar notion was used, independently, by Grigoriev [34].
Definition 2.1. The rigidity of a matrix A over a field F, denoted RFA(r), is the least number of entries of A that must be changed in order to reduce the rank of A to a value at most r:
RFA(r) := min{|C| : rankF(A + C) ≤ r},
where |C| denotes the number of nonzero entries of C.
When the field of definition is clear from the context, we often omit
the superscript F.
Intuitively, the rigidity of A for target rank r is the Hamming distance between A and the set of all matrices of rank at most r.
It is easy to see that for any n × n matrix A over any field, RFA(r) ≤ (n − r)2. Valiant [98] also showed that for “almost all” n × n matri- ces A, RFA(r) = (n − r)2 if F is infinite and RFA(r) = Ω((n − r)2/logn)
11

12 Matrix Rigidity
if F is finite. What do we mean by “almost all” matrices over an infi- nite field? It is a Zariski open set, i.e., the complement of the solution set of a finite system of polynomial equations. Over a finite field it is interpreted in the usual counting sense, i.e., the fraction of matrices in the set goes to 1 as n→∞.
The main challenge about rigidity is to prove nontrivial lower bounds for explicit matrices. Specifically, Valiant posed the following question:
Open Question 2.1. Find an explicit infinite sequence of matrices {An} such that RFAn (εn) ≥ n1+δ, where An ∈ Fn×n for all n in an infi- nite subset of N and ε, δ > 0 are constants.
A digression on explicitness: The challenge of constructing “explicit” objects with “nice” properties is a familiar one in several fields such as combinatorics, computer science, and coding theory. Examples include expander graphs, hard Boolean functions, and good error- correcting codes. Often a random object is easily shown to be nice and the challenge is to give explicit constructions of objects that are at least approximately nice. Informally, explicitness refers to an algorithmic description of an infinite sequence of nice objects of increasing size. In our context, we may define {An} to be explicit if for each n in the index set, given 1 ≤ i,j ≤ n, there is a Boolean circuit of size polynomial in1 n to construct An(i,j). If we want uniform constructibility, we may also insist that testing membership of a natural number in the index set and a construction of the relevant circuit itself be efficiently doable. We also remark that sometimes we say an explicit object (as in a hard function, a good code or a rigid matrix) when we really mean an infinite family of them; the usage refers to a “generic” object from an infinite sequence parameterized by “size” n. Also by abuse of notation, we sometimes abbreviate {An} by A. End of digression.
Why should we care about rigid matrices? Strong lower bounds on the rigidity of explicit matrices will lead to significant breakthroughs in complexity theory. In particular, Valiant proved that if RFA(εn) ≥ n1+δ, then the linear transformation defined by A cannot be computed by arithmetic circuits of O(n) size and O(logn) depth in which each gate computes a linear combination of its inputs. Computing linear
1 A stronger definition would be to require a circuit of size polylog(n). But poly(n) size explicitness is already a challenge.

2.1 Motivation 13
transformations is a basic computational task. Many fundamental prob- lems such as computing the Fourier transform, polynomial evaluation and interpolation, computing prefix sums and encoding/decoding linear error-correcting codes involve computation of linear transformations. Computation of bilinear transformations, such as matrix multiplica- tion, is closely related to the computation of linear transformations [8]. Proving a superlinear lower bound in a general model on any explicit linear transformation will thus be a big step forward in algebraic com- plexity. For example, a proof that RFA(εn) ≥ n1+δ, where A is the Fourier transform matrix (A = (ζij)0≤i,j≤n−1, where ζ is a primitive nth root of unity) would show that there can be no “ultra-fast Fourier transform” algorithm running in O(n) time. Recall that the current best algorithm — the Fast Fourier Transform (FFT) algorithm — uses linear circuits of size O(nlogn) and depth O(logn).
We now describe the model of linear circuits and Valiant’s result relating rigidity to linear circuit complexity.
Definition 2.2. A linear circuit over a field F is a directed acyclic graph L in which each directed edge is labeled by a nonzero element of F. If g is a gate with in-coming edges labeled by λ1,…,λk from gates g1,…,gk, then g computes v(g) := λ1v(g1) + ··· + λkv(gk), where v(gi) ∈ F is the value computed at gate gi.
Suppose L has n input gates (nodes with no in-coming edges) and m output gates (including nodes with no out-going edges). If we denote by y1,…,ym ∈ F the values computed at the output gates of L starting with the values x1,…,xn ∈ F at the input gates, then we will have y = ALx, where AL ∈ Fm×n; in other words, the circuit L computes the linear transformation given by the matrix AL.
The size of a linear circuit L is defined to be the number of edges in L. The depth of L is defined to be the length of a longest path from an input node to an output node in L. When depth of L is Ω(logn), we assume that each of its internal nodes (gates) has in-degree (fan-in) exactly 2; otherwise, the in-degree is at least 2.
The model of linear circuits is a natural model of computation for computing linear transformations. Furthermore, at the expense

14 Matrix Rigidity
of constant factors in size complexity, any arithmetic circuit com- puting a linear transformation over C can be turned into a linear circuit [20, Theorem 13.1]. It is trivial to see that any linear trans- formation Fn → Fn can be computed by a linear circuit of size O(n2) and depth O(logn). It is a major challenge in complexity theory to prove superlinear lower bounds on the size of linear circuits, even of logarithmic depth, computing explicit linear transformations. In his seminal result [98], Valiant proved a criterion for such a complexity lower bound in terms of matrix rigidity.
Theorem 2.1. Suppose the linear transformation x 􏰶→ Ax is computed by a linear circuit of size s and depth d in which each gate has fan-in two. Then, for any t > 1,
􏰈slogt􏰉
RA logd ≤ 2O(d/t) · n. (2.1)
In particular, if RA(εn) ≥ n1+δ for some constants ε,δ > 0, then any linear circuit of logarithmic depth computing x 􏰶→ Ax must have size Ω(n log log n).
Proof. Proof of this theorem depends on a classical graph–theoretic lemma [27].
Lemma 2.2. Let G = (V,E) be a directed acyclic graph in which all (directed) paths are of length at most d. Then by removing at most |E|/logd edges, we can ensure that all paths in the remaining graph have length at most d/2.
For simplicity, let us assume that d and t are powers of 2. By repeatedly applying the above lemma logt times, we can identify a set T of edges from the circuit graph G such that (i) |T|≤slogt/logd and (ii) after removing T from the circuit, all paths in the remaining circuit graph G′ have length at most d/t. Let V ′ be the “tails” of the edges in T and let b1,…,b|V ′| be the linear forms (in the inputs xi) computed at the gates in V′. In the left-over graph G′, any given output node is

2.2 Explicit Lower Bounds 15
reachable from at most 2d/t input nodes since in G′ all paths are of length at most d/t.
A linear form ai computed at an output node i is now a linear combination of the removed “intermediate” nodes bk, 1 ≤ k ≤ |V ′|, and at most 2d/t variables. Identifying a linear form with the corresponding (row) vector in Fn, we see
|V′| 􏰕
ai =
where βik ∈ F and ci ∈ Fn has at most 2d/t nonzero entries. In matrix
notation,
A=B1B2 +C=B+C, (2.2)
where B1 =(βik)∈Fn×|V′| and B2 has bk, 1≤k≤|V′|, as its rows. Clearly, rank(B) ≤ |V ′| ≤ |T| ≤ slogt/logd and each row of C has at most 2d/t entries and hence |C| ≤ n2d/t. It follows that by changing at most n2d/t entries, the rank of A can be reduced to a value at most slogt/logd. This proves the theorem.
In addition to the original motivation above, rigidity and many similar matrix functions that measure the “robustness” of the rank function have been discovered to have applications in several other models of computation such as communication complexity, branching programs, span programs, and threshold circuits. Since rigidity-like functions appear to be so fundamental, several researchers [28, 29, 31, 56, 58, 74, 77, 80, 81, 84, 93] studied this topic and explored its connections to other problems in complexity theory, algebra, and com- binatorics. By now, many such connections have been discovered.
2.2 Explicit Lower Bounds
Many candidate matrices are conjectured to have rigidity as high as in Valiant’s question. Examples include Fourier transform matri- ces, Hadamard matrices, Cauchy matrices, Vandermonde matrices, incidence matrices of projective planes, etc. Despite intense efforts by
k=1
βikbk +ci, 1≤i≤n,

16 Matrix Rigidity
several researchers, Valiant’s question still remains unsolved. The best
known lower bound is RF (r) = Ω􏰃 n2 log n 􏰄 proved for various matrices Arr
by Friedman [31] and Shokhrollahi et al. [93]; note that this bound reduces to a trivial Ω(n) when r = εn. We present below the proof from [93]. This proof yields the best known lower bounds for both finite and infinite fields. We will use ideas from the proof in [31], which seem to work only for finite fields, in Section 2.6 in the context of Paturi–Pudla ́k dimensions.
We will use the following lemma from extremal graph theory about the Zarankiewicz problem [16, Section IV.2, Theorem 10].
Lemma 2.3. Let G be an n × n bipartite graph. If G does not con- tain Ks,s (complete bipartite graph with s vertices on either side) as a subgraph, then the number of edges in G is at most
(s−1)1/s(n−s+1)n1−1/s +(s−1)n.
Lemma 2.4. Let r ≥ log2 n. If fewer than n(n − r) log n
2(r + 1) r
changes are made to an n×n matrix A, then some (r+1)×(r+1)
submatrix of A remains untouched.
Proof. Think of A as an n × n bipartite graph with unchanged entries
as edges. Then this bipartite graph has at least n2 − n(n−r)logn
2(r + 1) r
edges. It is easy to check that, for log2 n ≤ r, this quantity is more than the bound from Lemma 2.3 with s = r + 1. Thus, the graph must have Kr+1,r+1 as a subgraph. This means the corresponding (r + 1) × (r + 1) submatrix remains untouched.

2.2 Explicit Lower Bounds 17
This lemma can be applied to prove rigidity lower bounds for matrices in which al l submatrices of sufficiently large size are nonsingular. We prove some of these bounds below. For simplicity in these bounds, we assume r ≤ n/2 and replace n(n − r) in Lemma 2.4 by n2/2.
Theorem 2.5. Let F be a field containing at least 2n elements and let
x1,…,xn, and y1,…,yn be 2n distinct elements such that (xi + yj) ̸= 0
for any i,j. Then, the Cauchy matrix C, given by C := 􏰃1/x + y 􏰄n , i j i,j=1
has the rigidity lower bound
RC (r) ≥ 4(r + 1) log r
for log2n≤r≤n/2.
Proof. It is well-known and easy to prove that the determinant of the
n2 n
Cauchy matrix C is given by
det C =
􏰹 (xi −xj)􏰹 (yi −yj) i0.
Proof. Let G be a superconcentrator with cn edges, input nodes x1,…,xn and output nodes y1,…,yn. Let the vertices of G be v1,…,vm in topological order. The fan-in (in-degree) of each vi is 2. We will define the matrix A by defining a linear circuit computing the linear trans- formation x 􏰶→ Ax from the graph G. We will identify each vertex with the linear form in x1,…,xn computed at that vertex. It is helpful to think of the circuit as given by a (2n + m) × n matrix with the columns indexed by the xj, the first n rows labeled by the xi, the last n rows by the yi, and the remaining m rows by vi. Each row gives a vector in Cn associated to the linear form in the xi computed at the corresponding node. Thus, in the first n rows, we have the identity matrix whereas the last n rows give us the desired matrix A.
We define the rows of this matrix inductively. Suppose u1 and u2 are already defined and are inputs to a node v. We define the coefficients αandβatthenodevsothegatecomputesv=αu1 +βu2.Chooseα and β as follows: for all r-tuples of xj and all linear forms w1,…,wr−1 already constructed, whenever {u1,w1,…,wr−1} and {u2,w1,…,wr−1} are sets of linearly independent forms when restricted to xj1,…,xjr, {αu1 + βu2,w1,…,wr−1} is also a linearly independent set of linear forms when restricted to xj1,…,xjr.
Let det(vi1,…,vir;xj1,…,xjr) denote the determinant of the sub- matrix indexed by the rows vi1 ,…,vir and columns xj1 ,…,xjr . Then by linearity,
det(v,w1,…,wr−1;xj1,…,xjr)
= αdet(u1,w1,…,wr−1;xj1,…,xjr)
+βdet(u2,w1,…,wr−1;xj1,…,xjr).

2.3 Sparse Factorizations and Constant Depth Circuits 23
If the two determinants on the right-hand side are nonzero, there is a unique value for the ratio α : β that is forbidden by the requirement that the left-hand side determinant is nonzero. It follows that for every choice of w1,…,wr−1 and xj1 ,…,xjr , at most one choice of the ratio α : β is forbidden in defining the coefficients α and β in the gate v = αu1 + βu2. Since there are only finitely many choices for the wi’s and the xj’s, we conclude we can always find integer values for α and β so the condition above is satisfied at v.
By the superconcentrator property of G, from any set of r inputs xj1 ,…,xjr to any set of r outputs yi1 ,…,yir , there are r vertex-joint paths. By considering sets of r parallel nodes at successive levels, we can see by induction that the sub-determinant given by the corresponding rows and the xj ’s is nonsingular. We conclude that all submatrices of all orders are nonsingular in the matrix A. Since the graph of the circuit is of linear size, we can apply the graph–theoretic argument in the proof of Theorem 2.1 to express A as A=B+C such that for any ε>0, rank(B)≤εn and |C|≤n·2log1−δn for some δ=δ(ε)>0. It follows that RA(εn) = n1+o(1) for any ε > 0.
2.3 Sparse Factorizations and Constant Depth Circuits
Since strong lower bounds on rigidity seem to be difficult to obtain, a natural approach is to study restricted variants of the problem. We discuss one such combinatorial restriction in this section and some others later.
For a matrix A, define
w2(A) := min{|B| + |C| : A = BC},
where the minimum is taken over all matrix pairs B and C such that A = BC. Recall that |B| + |C| is the total number of nonzero entries in B and C.
Note that w2(A) captures the minimum size of a depth-2 linear circuit computing the linear transformation is given by A. We can naturally generalize this definition to wk and use it to prove lower bounds on depth-k linear circuits. In fact, we will do so in Section 3 (cf. Lemma 3.14 and Theorem 3.15) for the restricted model of bounded

24 Matrix Rigidity
coefficient linear circuits. However, explicit lower bounds on wk have not yet been used successfully to prove strong superlinear, i.e., n1+δ, lower bounds on constant depth linear circuits in general.
As usual, it is easy to see that for almost n × n matrices A, w2(A) = Ω(n2/logn), and the challenge is to find explicit A for which w2(A) ≥ n1+δ can be proved. The best known lower bound is Ω(n log2 n/ log log n) and follows from the tight lower bound on depth-2 superconcentrators [82]. This lower bound holds for any matrix in which every square submatrix is nonsingular: examples include the Cauchy matrix A = (1/(xi + yj )) where xi and yj are all distinct and Fourier transform matrices of prime order (cf. Theorems 2.5 and 2.8). Pudla ́k [74] shows how to derive a lower bound on w2(A) from a very strong lower bound on RA(r) for r in a suitable range.
We note that in the proof of the lower bound on w2(A) using the lower bound on depth-2 superconcentrators, only connectivity proper- ties of the factorization A = BC are used; associate a depth-2 graph with A = BC with inputs being column indices of A (= column indices of C), outputs being row indices of A (= row indices of B), and the middle being the column indices of B (= row indices of C) and note that nonsingularity of a square submatrix AST of A implies that |S| vertex-disjoint paths must exist between S and T (Menger’s theorem). Since the lower bound of [82] is tight for depth-2 superconcentrators, this approach cannot yield better lower bounds on w2(A). This is sim- ilar to the limitation that exists even for the original rigidity problem as discussed in Section 2.2.1.
2.4 Rigidity Lower Bounds Using Algebraic Dimensions
We observed in the last two subsections that some commonly used approaches to the rigidity problem face certain fundamental limi- tations: we call this the barrier of purely combinatorial approaches since the proof techniques essentially use only the connectivity (superconcentrator-like) properties and combinatorial structure of sub- matrices of the candidate matrices appealing very little to their alge- braic structure. We believe that any stronger lower bound proof on

2.4 Rigidity Lower Bounds Using Algebraic Dimensions 25
rigidity will have to penetrate this barrier by delving deeper into the algebraic structure of the candidate matrices.
In this subsection, we look at such an attempt. We can prove much stronger, even quadratic, lower bounds on the rigidity of some com- plex matrices by exploiting algebraic independence (in a limited sense) among their entries. In particular, we will prove a lower bound of RV (r) = Ω(n2) for r ≤ ε√n, on the rigidity of Vandermonde matrices
V = (xj−1)1≤i,j≤n with algebraically independent xi [56]. We will also i2√
prove [59] that RA(εn) = Ω(n ) for two matrix families: (i) A = ( pjk) and (ii) A = (e2πi/pjk ), where pjk are the first n2 primes. These bounds hold over C. These bounds break through the combinatorial barriers mentioned above. The result for Vandermondes provides a natural n- dimensional manifold in the space of all n × n matrices with Ω(n2) rigidity for nonconstant r. The results for (√pjk) and (e2πi/pjk ) are the first quadratic lower bounds known on the rigidity of a non-generic matrix over any field for rank bound Ω(n).
The proof for (√pjk) uses an algebraic dimension concept intro-
duced by Shoup and Smolensky [91] (SS-dimension, for short). To prove
the rigidity lower bound on (e2πi/pjk ), we use a higher degree generaliza-
tion of the SS-dimension. The SS-dimension was used in [91] to derive
superlinear lower bounds on linear circuits of depths up to poly(log n)
for linear transformations defined by a Vandermonde matrix and its
inverse. In their Vandermonde matrix V = (xj−1), the xi are either i
algebraically independent transcendentals or superincreasing integers (xi = 2ni ). We note that Shoup and Smolensky prove these superlin- ear lower bounds directly, without appealing to or proving any rigidity bounds on their matrices.
More generally, algebraic dimension arguments involving square roots of primes and roots of unity of prime orders have been used in the literature to replace generic or random elements to fool “low-degree” computations. They have been used to construct specific polynomials that are hard to compute [7, 20, 52]. Square roots of primes are also used to define hard instances (Swinnerton–Dyer polynomials) for cer- tain polynomial factorization algorithms [101, Section 15.3]. Rational approximations of square roots of primes are used in [25], to reduce ran- domness in polynomial identity testing based on the Schwartz–Zippel

26 Matrix Rigidity
Lemma. This approach is extended in [23] to prove that the square roots of any set of rationals independent over squares (i.e., linearly independent in the F2-space Q∗/Q∗2) can be used in place of square roots of primes. The approach in [25] is generalized in [51] using square roots of irreducible polynomials to be applicable over arbitrary fields — in particular over finite fields.
2.4.1 The Shoup–Smolensky Dimensions
In this subsection, we define the various SS-dimensions we use and prove some preliminary lemmas relating them to matrix rank. Shoup and Smolensky originally used Definition 2.3 in [91].
Let p be a nonnegative integer, P = (a1,…,ap) a sequence of elements (ai ∈ C), and t ≥ 0.
Definition 2.3. The restricted SS-dimension of degree t of P over Q, denoted by Dt(P), is the rank over Q of the set of all the 􏰃p􏰄 products
􏰹t t j=1aij, where 1≤i1 0 is an absolute constant.
In particular, there exists an ε > 0 such that for every r ≤ ε n,
RV (r) ≥ n2/4.
Proof. Let C be a matrix with the smallest number of nonzero entries such that V − C has rank r. Thus, RV (r) = |C|, number of nonzero entries of C.
Denote by si the number of changes in the ith row of V . Let s denote the average of the si, s:=(s1 +···+sn)/n. Thus, |C|=n·s.
There must be at least n/2 rows of V with no more than 2s changes in each of those rows. Fix n/2 such rows and call them “good” rows. (The factor of 2 here is not optimal; see Remark 2.3.)
We claim that
Dn/2(V − C) ≥ (n − 2s)n/2. (2.12)
To prove this claim, consider the products formed by taking uncha-
nged entries of good rows, one entry per row. They are of the form
xj1 xj2 ···xjt , where i1,…,it are good rows and each jk has at least i1i2 it
(n − 2s) possibilities. By the algebraic independence of the xi, these products are linearly independent over Q. Hence, the claim follows.
Since V − C has rank r, Corollary 2.15 gives an upper bound on DV−C:

􏰈nr + t􏰉2 Dt(V −C)≤ t
Combining (2.12) and (2.13), we get,
.
(2.13)
􏰈nr + n/2􏰉2 (n − 2s)n/2 ≤ n/2 .

30 Matrix Rigidity
􏰃 􏰄 􏰈(nr + n/2)e􏰉2·n/2
Since nk ≤(ne/k)k, (n−2s)n/2 ≤ n/2 . Hence, (n − 2s) ≤ ((1 + 2r)e)2
≤ c · r2, for some constant c>0. This gives us, s ≥ (n−c·r2).
2
Since RV (r) = |C| = n · s, we have proved the theorem.
Remark 2.3. The bound of Theorem 2.17 can be improved by a con- stant factor with a more careful choice of the parameters. Indeed, as pointed out by Tom Hayes (private communication), by selecting t ≈ (√2ern)2/3 in the proof above, it can be shown that RV (r) ≥ n2 − O(n5/3r2/3).
Theorem2.18. Let A be an n×n matrix over C and 0≤r≤n. Suppose, Dnr(A) = 􏰃n2􏰄, i. e., all products of nr distinct entries of A
nr
are linearly independent over Q. Then,
RA(r) ≥ n(n − 16r). (2.14)
Proof. Let C be a matrix such that RA(r) = wt(C) and rank (A − C) ≤ r. By Corollary 2.15, we have
􏰈nr + t􏰉2
Dt(A − C) ≤ Dt∗(A − C) ≤ t . (2.15)
In order to obtain a lower bound on Dt(A − C), let us consider all t-wise products of elements of A not affected by C. By assumption, these (as well as the products of all other t-tuples from A) are linearly independent over Q; therefore,
􏰈n2 − wt(C)􏰉
Dt(A − C) ≥ t . (2.16)

2.4 Rigidity Lower Bounds Using Algebraic Dimensions 31 Set t = nr and combine inequalities (2.15) and (2.16). We use the
inequality 􏰃n􏰄 ≥ (n/k)k. k
􏰈n2 − wt(C)􏰉 􏰈2nr􏰉2 nr ≤nr
􏰈n2 − wt(C)􏰉nr 􏰃 2nr􏰄2 nr nr ≤2 =16
wt(C) ≥ n2 − 16nr. We conclude that RA(r) ≥ n(n − 16r).
Corollary 2.19. Let P = 􏰃√p 􏰄, where p are distinct primes for ij ij
1 ≤ i,j ≤ n. Then, RP (r) ≥ n(n − 16r). In particular, RP (n/17) ≥ n2/17.
To obtain Corollary 2.19, we combine Theorem 2.18 with the fol- lowing result cf. ([12, 4, Ex. 2.1.41]). An integer is square-free if it is not divisible by the square of any prime number.
Theorem 2.20. The square roots of all positive square-free integers are linearly independent over Q. In particular, for distinct primes p1,…,pm, [Q(√p1,…,√pm) : Q] = 2m.
The next rigidity lower bound uses the Generalized SS-dimension. Theorem 2.21. Let
RZ (r) ≥ n(n − 9r),
assuming n is sufficiently large.
In particular, RZ(n/10) ≥ n2/10.
Z := 􏰃e2πi/pjk 􏰄
,
where pjk are the first n2 distinct primes. Then, for 0 ≤ r ≤ n, we have
1≤j,k≤n

32 Matrix Rigidity
Proof. Let C be a matrix such that wt(C) = RZ (r) and rank(Z − C) ≤ r.
Let m := n2 − wt(C) be the number of entries of Z “untouched” by C and p1,…,pm be the corresponding primes. Let
P := (e2πi/p1,…,e2πi/pm) and t := (p1 − 1,…,pm − 1).
We will consider Dt(P). To get a lower bound, we use the following facts.
Lemma 2.22. [Q(e2πi/n) : Q] = φ(n).
For a proof of this lemma, see, e.g., [50, Ch VI, Theorem 3.1].
Lemma 2.23. Q(e2πi/a1,…,e2πi/am) = Q(e2πi/lcm(a1,…,am)). This lemma is easy to prove.
It follows that Dt(P)=[Q(e
2πi/p1
,…,e
2πi/pm
):Q]=φ(p1 ·····pm)=
􏰖m
(pi −1).
i=1
(2.17)
On the other hand, the elements of P are entries of the matrix Z − C of rank at most r. Thus, by Lemma 2.16, we have the upper bound
􏰈nr + σ(t)􏰉2
Dt(P) ≤ nr . (2.18)
Since 􏰹m (pi −1)≥m! and σ(t)=􏰂m (pi −1)=O(mn2logn), i=1 􏰃a􏰄 b i=1
and using the inequality b ≤ a , we obtain from (2.17) and (2.18), m! ≤ Dt(P) ≤ (nr + O(mn2 logn))2nr. (2.19)
Since nr,m ≤ n2, after taking logarithms of the above inequality, we obtain mlogm ≤ c2nrlogn for some constant c2 > 0. Since RA(r) ≤ (n−r)2 ingeneral,wenotethatm=n2 −RZ(r)≥2nr−r2 ≥2n−1 for 1≤r≤n. Using this in the last inequality, we have m≤9nr assuming n is sufficiently large.

2.5 Quadratic Lower Bounds on Linear Circuits 33 2.5 Quadratic Lower Bounds on Linear Circuits
Combining Corollary 2.19 or Theorem 2.21 with Theorem 2.1, we get the following circuit lower bound.
Theorem 2.24. Let A = 􏰃√p 􏰄 or A = 􏰃e2πi/pjk 􏰄, where p are the jk jk
first n2 primes for 1 ≤ j, k ≤ n. Then, any linear circuit of logarithmic depth computing x 􏰶→ Ax must have size Ω(n log log n).
We will now see that using the SS-dimension, we can obtain quadratic lower bounds on linear circuits significantly improving those in Theorem 2.24 via Valiant’s criterion. Indeed, such lower bounds were already proved by Lickteig [52]. Nevertheless, we present a proof of the circuit lower bound using the generalized SS-dimension since we feel it is simple, intuitive, and fits well within the framework of rigidity.
The following lemma generalizes an inequality from [91] on SS- dimension.
Lemma 2.25. Let L be a linear circuit over C computing the linear
transformation x 􏰶→ Ax. Let s denote the size and d denote the depth
ofL.Defines ̄:=s/d.ForasetT ⊆Nn2,defineσ(T):=max t∈T
􏰂n2 t i=1 i
and let DT (A) be as in Definition 2.5. Then, 􏰈 s ̄ + σ ( T ) 􏰉 d
DT (A) ≤ s ̄ .
(2.20)
the longest path to g from an input has k edges; the input nodes are at level 0. For 1≤k≤d, let Lk denote the set of labels on the edges that feed into gates at level k. Our first observation is that an entry a := aij of the matrix A is equal to the sum of labels of all the paths from input j to output i, where the label of a path is defined to be
Proof. Let (a1,…,am) be the sequence of entries of the matrix A in some order, where m := n2. By abuse of notation, we use A to also denote this sequence. We want to estimate the Q-linear dimension spanned by monomials of the form ae1 ···aem, where e ∈ T.
1m
Arrange the circuit L into d levels, where a gate g is at level k if

34 Matrix Rigidity
the product of all the labels on its edges. Here and below, we suppress many subscripts for brevity. The intended summation ranges should be clear from the context. Thus, we have
􏰕
p
where the sum ranges over all paths p from input j to output i and λk ∈ Lk ∪ {1} are “labels” on edges of the path p. We include 1 here, since an edge may not go between two consecutive levels and hence 1 may be used as λk for the “skipped” k. Hence, a monomial in the a’s is given by
􏰖m 􏰖m􏰕 aei =
a =
λ1 ···λd,
(λi1 ···λid)ei
􏰕e1 em e1 em
i
i=1
=
Note that each monomial λe1 ···λem has labels from Lk ∪ {1} and 1k mk
may have repetitions. Since e ∈ T , we may thus view it as a monomial
i=1 pi
(λ11 ···λm1)···(λ1d ···λmd).
of total degree at most σ(T) on |Lk| variables. Let sk =|Lk|. There
are at most 􏰃sk+σ(T)􏰄 such monomials. Hence, each monomial 􏰹m aei sk i=1 i
in our space is an integer linear combination of products of d such monomials from Lk for 1 ≤ k ≤ d. It follows that
DT(A)≤􏰖d 􏰈sk+σ(T)􏰉 k=1 sk
sk +σ(T) d
 1􏰂d ≤d k=1
,
where the last inequality is a consequence of the log-concavity of f (x) =
􏰃x+c􏰄. Since s = size(L) = 􏰂d sk, we have proved the claim. x k=1
1􏰂d sk d k=1
Corollary 2.26. Let Z := 􏰃e2πi/pjk 􏰄nj,k=1 where pjk are the first n2 primes. Then, any arithmetic circuit computing the linear transforma- tion x 􏰶→ Zx must have size Ω(n2).

2.6 Paturi–Pudl ́ak Dimensions 35
Proof. Let m := n2. We will apply the special case Dt(Z) of DT (Z) with t = (p1 − 1,…,pm − 1). From the proof of Theorem 2.21, we know that DT (Z) ≥ m!. On the other hand, σ(T) = σ(t) = 􏰂mi=1(pi − 1) = Θ(m2 log m). Since s ≤ n2 always, we have s ̄ ≪ σ(T ) and the easy esti- mate 􏰃s ̄+σ(t)􏰄 ≤ (2σ(T ))s ̄. Using these in (2.20), we get
s ̄
m! ≤ (2σ(T))s ̄d ≤ (c · m2 logm)s.
Taking logarithms on both sides of this inequality, we obtain s =
Ω(m) = Ω(n2).
Corollary 2.27. Let P = 􏰃√p 􏰄, where p are distinct prime inte-
ij ij
gers. Then, any arithmetic circuit computing x 􏰶→ P x must have size Ω(n2 / log n).
Proof. Let m := n2. Let T := {(e1,…,em) : 0 ≤ ei ≤ 1}. From Theo- rem 2.20, DT (P) = 2m. Since σ(T) = m ≥ s, we obtain from (2.20),
2m ≤ (2m)s ̄d = (2m)s. Taking logarithms proves the claim.
The lower bounds in the corollaries above do not depend on the depth of the circuits. However, Shoup and Smolensky [91] exploit the dependence of (their special case of) inequality (2.20) on depth to derive lower bounds of Ω(dn1+1/d) for d≤logn, and Ω(nlogn/ (logd−loglogn)) for larger d, for Vandermonde matrices with alge- braically independent generators.
2.6 Paturi–Pudl ́ak Dimensions
Paturi and Pudla ́k [70] introduced certain dimensions of a subspace (which we call here inner and outer dimensions) that refine the notion of rigidity. They may give rise to new techniques to prove lower bounds on linear circuits.
A vector v ∈ Fn is said to be s-sparse if the number of nonzero coordinates in v (often denoted wt(v) or |v|) is at most s.
Let V ⊆Fn be a subspace, where dimV =m.

36 Matrix Rigidity
Definition 2.6. The inner dimension dV (s) for sparsity parameter s of V is the dimension of the largest subspace of V that can be generated by s-sparse vectors. In other words,
dV (s) := max{dim(U ∩ V ) :
U ⊆ Fn has an s-sparse basis anddimU ≤ dimV }.
Definition 2.7. The outer dimension DV (s) for sparsity parameter s of V is the dimension of the smallest subspace W that contains V and that can be generated by s-sparse vectors. In other words,
DV (s) := min{dimW : W has an s-sparse basis and V ⊆ W}.
Given a matrix A ∈ Fm×n, we let ⟨A⟩ to denote the row-space of A. Note that ⟨A⟩ is a subspace of Fn of dimension at most m. By abuse of notation, we denote d⟨A⟩(s) and D⟨A⟩(s) simply by dA(s) and DA(s).
To relate inner dimension to rigidity, we define the following version of rigidity of a matrix when the changes are counted on a per row basis.
ρA(s) := min{rank(A − C) : Each row of C is s-sparse}.
Clearly, if ρA(s) ≤ r, then RA(r) ≤ ns. But lower bounds on ρA(s) are sufficient to prove lower bounds on linear circuits via Valiant’s criterion.
Lemma 2.28. For A ∈ Fm×n,
ρA(s) ≥ rankA − dA(s).
Proof. Let C be a matrix achieving ρA(s), so rank(A − C) = ρA(s) and each row of C has at most s nonzero entries.
Trivially, ⟨C⟩ + ⟨A⟩ = ⟨C⟩ + ⟨A − C⟩. Considering the left-hand side,
dim(⟨C⟩ + ⟨A⟩) = dim⟨A⟩ + dim⟨C⟩ − dim(⟨A⟩ ∩ ⟨C⟩).

2.6 Paturi–Pudl ́ak Dimensions 37
Considering the right-hand side,
dim(⟨C⟩ + ⟨A − C⟩) ≤ dim⟨C⟩ + dim⟨A − C⟩ = dim⟨C⟩ + ρA(s).
Thus, we have dim⟨A⟩ − dim(⟨A⟩ ∩ ⟨C⟩) ≤ ρA(s). Since dim⟨A⟩ = rank(A) and dA(s) ≥ dim(⟨A⟩ ∩ ⟨C⟩), we are done.
Lemma 2.29. For any finite-dimensional subspace V , dV (s) + DV (s) ≥ 2dimV.
Proof. Let D:=DV(s) and d:=dV(s). Let W⊇V be an optimal subspace realizing DV (s) with a sparse basis {w1,…,wD}. Consider the subspace U ⊆ W with the basis {w1,…,wm}, where m = dimV . Clearly, U +V ⊆W and dim(U ∩V)≤d. Hence,
D≥dim(U +V)=dimU +dimV −dim(U ∩V)≥2m−d.
While there seems to be no obvious relation between DA(s) and rigidity, we can use DA(s) directly to prove lower bounds on log-depth linear circuits using a proof technique similar to that of Theorem 2.1.
Theorem 2.30. Suppose, the linear transformation x 􏰶→ Ax for A ∈ Fn×n is computed by a linear circuit of size L and depth k. Then, there exists a constant c > 1 such that for any integer t > 0,
DA(ck/t) ≤ n + Llogt. log k
In particular, if, for some constant ε > 0, DA(nε) ≥ n + ω(n/loglogn), then x􏰶→Ax cannot be computed by log-depth linear circuits of size O(n).
Paturi and Pudla ́k use a counting argument to show that for a random m-dimensional subspace V of Fn, where F is finite, |F| = q,
with high probability.
􏰈 slogq n􏰉 DV(s)≥n1−m ,

38 Matrix Rigidity
The proof of Theorem 2.17 actually yields a lower bound of ρV (s) = Ω(√n − s) for a generic Vandermonde matrix. However, no nontrivial upper (lower) bounds on d (D) of a generic Vandermonde matrix are currently known.
In the following theorem, we use codes over F2 to prove nontrivial lower (upper) bounds on DV (s) (dV (s)). It can be easily generalized to codes over an arbitrary finite field Fq in an obvious way (the constants depend on q).
Theorem 2.31. Let C be an [n,k,d] linear code over F2. Then, for s ≤ d/2,
d 􏰈2sk􏰉
DC (s) ≥ k + 2s log d , and
d 􏰈2sk􏰉 dC (s) ≤ k − 2s log d .
(2.21) (2.22)
Proof. The proof is based on the following main claim: there exists a [DC(s),k,d/s] code B. Consider a space W of dimension D := DC(s) containing C and having an s-sparse basis {w1,…,wD}. Every vector x ∈ C is a linear combination of vectors from W , i.e., there is a y ∈ FD (unique since wi are a basis) such that x = 􏰂Di=1 yiwi. Let B be the set of all such y for all x ∈ C. Clearly, B is a subspace of dimension k. We note that the minimum weight of a nonzero vector in B is at least d/s and this will prove the claim. Indeed, let y∗ be a minimum weight nonzero vector in B and let x∗ = 􏰂Di=1 yi∗wi. Since wi are linearly inde- pendent, x∗ ̸= 0 and by the property of C, x∗ is of weight at least d. Since each wi has weight at most s, at least d/s of the coordinates yi∗ must be nonzero.
We apply the sphere packing bound for the code B. The Hamming balls of radius d/2s around vectors in B must be nonintersecting and hence their union must contain at most 2D vectors:
d/2s􏰈 􏰉
2 k 􏰕 Dj ≤ 2 D . j=0

2.6 Paturi–Pudl ́ak Dimensions 39 Let Λ(D,d/2s):=􏰂d/2s􏰃D􏰄 denote the volume of a Hamming ball
k + log2 Λ(D, d/2s). (2.21) follows.
The bound on dC(s) in (2.22) is proved using similar argument. Let U be a k-dimensional subspace realizing dC (s). Thus, dim(U ∩ C) = dC(s). Similar to the claim above, we can show that C contains a [k,dC(s),d/s] code B′. Applying the sphere packing bound to B′, we have
2dC(s)Λ(k,d/2s) ≤ 2k. Simplifying as before, we obtain (2.22).
Friedman uses (2.22) to prove the first nontrivial lower bound on the rigidity of a matrix over a finite field. Note that we proved essentially the same lower bound in Theorem 2.7.
Corollary 2.32. Let A ∈ Fk×n be the generator matrix of an asymp- totically good error correcting code C. Then, for 0 < r < k/2, 􏰈n2 n􏰉 RA(r)=Ω rlogr . j=0 j of radius d/2s in FD. Then, we conclude from the above that D ≥ 2 􏰃D􏰄 􏰃k􏰄 d/2s Since Λ(D, d/2s) ≥ d/2s ≥ d/2s ≥ (2sk/d) , Proof. Since C is asymptotically good, we know that k = Ω(n) and the minimum distance d of C is also Ω(n). Thus, it follows from (2.22) that dC(s)≤k−(d/2s)log(2ks/d). From Lemma 2.28, ρA(s) ≥ k − dC(s) ≥ (d/2s)log(2ks/d) = Ω(nlogs/s). In other words, s = Ω􏰃n log n􏰄 changes must be made in each row to reduce rank of A r r 􏰃n2 n􏰄 tobeatmostr.Thus,RA(r)≥ks=Ω r logr . 3 Spectral Methods to Study Rank Robustness As mentioned before, many functions similar to rigidity have several applications in complexity theory. To encompass this broad range of applications, it is helpful to think of rank robustness in general. By changing the nature of changes or by changing the definition of distance between a given matrix and the set of low-rank matrices, we get various notions of rank robustness. For example, when the matrix is over R or C, we may consider l2-distance (instead of Hamming distance as in the original definition of rigidity) or we may consider only sign- preserving changes. Many results [21, 28, 29, 42, 55, 57, 58, 75, 83] successfully use such notions of rank robustness to derive lower bound results in various models. We discuss some of them in this section and the next. The “nonsmoothness” of Hamming distance seems to make it difficult to attack the general rigidity problem using well-studied math- ematical properties of matrices. On the other hand, several smoother variants of rigidity, e.g., l2-Rig,Rig,msv,Vol, studied in this section, seem to be easier to get a handle on using classical techniques from matrix analysis [13, 33, 95]. In particular, these robustness functions of matrix rank are easily bounded, or even characterized, in terms of 40 3.1 Singular Value Decomposition 41 the singular values of a matrix. More importantly, all these functions are very useful in deriving interesting lower bounds on computational complexity. Significant applications of this approach are lower bounds on the bounded coefficient complexity of linear and bilinear transfor- mations, e.g., Fourier transform and matrix multiplication. 3.1 Singular Value Decomposition We begin by recalling some basic facts about singular values of a com- plex matrix and state an important variational inequality about them. Definition 3.1. Let A ∈ Cm×n. Then, • The Frobenius norm of A is  1/2 ∥A∥F :=􏰕|aij|2 . i,j • The Spectral norm of A, ∥A∥2, usually denoted ∥A∥, is defined by ∥A∥ := max ∥Ax∥ , x̸=0 ∥x∥ where ∥v∥ = v∗v for v ∈ Cn. • The ith Singular value, σi(A), is defined by 􏰛 σi(A) = where λi(AA∗) denotes the ith largest eigenvalue of AA∗. Fact 3.1. Let A ∈ Cm×n. Then, there exist unitary matrices U ∈ Cm×m and V ∈ Cn×n such that U∗AV = diag(σ1,...,σp), where p = min{m,n}. λi(AA∗), 1 ≤ i ≤ min{m,n}, 42 Spectral Methods to Study Rank Robustness A proof of this fact can be found in [33, Section 2.5]. The Courant–Fischer minimax theorem gives a characterization of singular values. Fact 3.2. For i = 1,...,min{m,n}, σi(A)= max min ∥Ax∥ = min max ∥Ax∥, dim(S)=i 0̸=x∈S ∥x∥ dim(T )=n−i+1 0̸=x∈T ∥x∥ where S ranges over all i-dimensional subspaces and T ranges over all (n − i + 1)-dimensional subspaces of Cn. Clearly, rank, Frobenius norm, and spectral norm of A are invariant under unitary transformations. Thus, Facts 3.1 and 3.2 imply the fol- lowing. Fact 3.3. Let A ∈ Cn×n. Then, (1) rank(A) = r if and only if σ1(A) ≥ ··· ≥ σr(A) > σr+1(A) = ··· = σn(A) = 0. (2) ∥A∥2F = σ12(A) + ··· + σn2(A).
(3) ∥A∥ = σ1(A).
The following important inequality [38] is often useful.
Theorem 3.4 (Hoffman-Wielandt Inequality). Let A and B be matrices in Cn×n. Then,
􏰕n
[σi(A) − σi(B)]2 ≤ ∥A − B∥2F .
i=1
Hoffman and Wielandt [38] prove their result for eigenvalues of normal matrices using the Birkhoff–von Neumann characterization of doubly stochastic matrices. The theorem for singular values as stated here can be found in [33, Section 8.3].

3.2 Variants of Rigidity
We will first consider two variants of the rigidity function based on the l2-distance to low-rank matrices.
Definition 3.2. Let A ∈ Cn×n and θ ≥ 0.
• l2 variant of rigidity: This is a variant of rigidity that
measures the l2-norm of changes. 
􏰕 
l2-Rig2A(r) := min |aij − bij|2 : rank(B) ≤ r . B i,j 
• Bounded changes variant of rigidity: This is a restricted vari- ant of rigidity where the changes are bounded in absolute value by θ.
RA(r,θ) := min{|A − B| : rank(B) ≤ r, ∀i,j |ai,j − bij| ≤ θ}. B
Recall that |C| denotes the number of nonzero entries of the matrix C.
We can immediately characterize l2-Rig in terms of singular values. Lemma 3.5. For any matrix A ∈ Cn×n,
􏰕n i=r+1
Proof. The lower bound is immediate from the Hoffman-Wielandt inequality, but it can also be proved directly.
Let B be a matrix achieving the minimum in the definition of
l2-Rig2A(r). Let N0 denote the null space of B. Choose a unit vector ν0 ∈
N0 that achieves max0̸=x∈N ∥Ax∥ . By Fact 3.2, since dim(N0) = n − r, 0 ∥x∥
σr2+1(A) ≤ ∥Aν0∥2. Continuing for i = 1,…,(n − r − 1), let Ni := {x ∈ Ni−1 : x ⊥ ν0,…,x ⊥ νi−1}. Choose νi to be a unit vector in Ni
3.2 Variants of Rigidity 43
l2-Rig2A(r) =
σi(A)2.

44 Spectral Methods to Study Rank Robustness
that achieves max0̸=x∈N ∥Ax∥ . Since dim(Ni) = n − r − i, by Fact 3.2,
i ∥x∥
σr2+i+1(A) ≤ ∥Aνi∥2. By orthonormality of ν0,…,νn−r−1 and since
Bνi =0 for 0≤i≤(n−r−1), it follows that
n−r−1 n−r−1 n−r−1
2􏰕2􏰕2􏰕2
∥A − B∥F ≥ ∥(A − B)νi∥ = ∥Aνi∥ ≥ σr+1+i(A).
i=0 i=0 i=0
􏰂 Equality follows by considering the rank r matrix B := ri=1 σi(A)ui · vi∗, where ui and vi∗ are the i-th column and the i-th
row of U ∗ and V , respectively, in Fact 3.1.
We will prove a lower bound on RA(r,θ) for a specific matrix in the next subsection.
The following lemma gives a useful lower bound on the rank of a submatrix in terms of the spectral norm.
Lemma 3.6. For any submatrix B of a matrix A, rank(B) ≥ ∥B∥2F /∥A∥2.
Proof. From Fact 3.3, rank(B) ≥ ∥B∥2F /∥B∥2. Since B is a submatrix of A, ∥B∥ ≤ ∥A∥.
Using Hoffman–Wielandt inequality, we can prove a generalization of the above lemma.
Lemma 3.7. Let A,B ∈ Cn×n. Then, rank(B)≥ R⟨A,B⟩,
∥A∥∥B∥
where ⟨A,B⟩ := Tr(AB∗) and Rx denotes the real part of a complex
number x.
Proof. Using Theorem 3.4,
􏰕n i=1
(σi(A) − σi(B))2
∥A − B∥2F ≥

2 2 􏰕n = ∥A∥F + ∥B∥F − 2
σi(A)σi(B),
i=1
using Fact 3.3(ii)
≥ ∥A∥2F + ∥B∥2F − 2 rank(B)∥A∥∥B∥, using Fact 3.3(i) and (iii).
Observe that for any matrix M, ∥M∥2F = Tr(MM∗). Using this in the last inequality above, we get
2 rank(B)∥A∥∥B∥ ≥ ∥A∥2F + ∥B∥2F − ∥A − B∥2F = Tr(AB∗) + Tr(BA∗)
= 2 RTr(AB∗),
3.2.1 Lower Bounds for a Generalized Hadamard Matrix
We now prove lower bounds on the three variants of rigidity for a “Generalized Hadamard Matrix.”
Definition 3.3. An n × n complex matrix H is called a Generalized Hadamard matrix if (i) |hij| = 1 for all 1 ≤ i,j ≤ n, and (ii) HH∗ = nIn, where H∗ is the conjugate transpose of H and In is the n × n identity matrix.
Two important example of a generalized Hadamard matrix are (i) the Sylvester type Hadamard matrix (defined in (2.3)), and (ii) the
and the lemma is proved.
3.2 Variants of Rigidity 45
Fourier transform matrix (see Theorem 2.8). √ We note that for a generalized Hadamard matrix H, σi(H) =
n
for all i,1≤i≤n.
We first prove a lower bound on the rigidity of a generalized
Hadamard matrix. This proof is from [26].
Theorem 3.8. For any generalized n × n Hadamard matrix H,
n2 RH(r)≥ 4r.

46 Spectral Methods to Study Rank Robustness
Proof. Let R be the minimum number of changes that brought the rank of H down to r. By a simple averaging argument, we can find 2r rows of H that contain a total of at most 2rR/n changes. If n ≤ 2rR/n, then R ≥ n2/2r and we are done. Hence, we can assume that n − 2rR/n > 0. Consider the (n − 2rR/n) columns that contain no changes in the above set of rows. We thus get a 2r × (n − 2rR/n) submatrix B that contains no changes and hence is a submatrix of H. By definition of R, this submatrix must have rank at most r. Applying Lemma 3.6, we get r ≥ rank(B) ≥ 2r(n − 2rR/n)/n, since ∥B∥2F is exactly the num- ber entries in B. Rearranging this inequality, we get R ≥ n2/4r.
Remark 3.1. Kashin and Razborov [42] also use spectral methods to prove an Ω(n2/r) lower bound on the rigidity of a generalized Hadamard matrix. The essential claim in their paper is that a random k × k submatrix of a generalized Hadamard matrix has rank Ω(k).
Corollary3.9. (to Lemma 3.5) For a generalized Hadamard matrix H, l2-Rig2H (r) = n(n − r).
Lemma 3.10. For a generalized Hadamard matrix H, RH(r,θ) ≥ n2(n − r) .
(θ + 1)(2n + r(θ − 1))
(1) Ifr(θ−1)≤n,thenRH(r,θ)≥n(n−r)/3(θ+1).
In particular,
(2) If r(θ − 1) ≥ n, then RH(r,θ) ≥ n2(n − r)/3r(θ2 − 1).
Proof. We will apply the Hoffman–Wielandt inequality (or the argu- ment from the proof of Lemma 3.5) to H and a scaled version βB of the altered matrix B. Since rank(B) ≤ r, σi(B) = 0 for all i > r. Thus,
􏰕n i=r+1
∥H−βB∥2F ≥
σi2(H)=n(n−r).

3.2 Variants of Rigidity 47 On the other hand, denoting the number of changes, i.e., wt(H − B),
by R,
where the inequality above follows from β ≥ 0 (which we can assume w.l.o.g.) and |bij − hij | ≤ θ.
Combining this upper bound with the lower bound ∥H − βB∥2F ≥ n(n − r),
n(n − r) − (1 − β)2n2 R≥ (1+βθ)2 −(1−β)2 .
Choosing β = r/n and manipulating, we get
R ≥ n2(n − r) .
(θ + 1)(2n + r(θ − 1))
Now, (1) and (2) simply follow from this inequality using r(θ − 1) ≤ n
and r(θ − 1) ≥ n, respectively. Remark 3.2.
• de Wolf [26] proves a lower bound very similar to the one in the lemma above using quantum arguments. As he observes, this bound captures the bound from [57] as (1) and the bound from [42] as (2).
• Pudla ́k [75] uses determinantal arguments to give a general lower bound, for r ≤ n/2,
􏰈 | det(A)| 􏰉2/(n−r) RA(r,θ) ≥ (n − r) rr/2 θ−O(1).
In particular, he gets a bound of RH(r,θ) ≥ θ−O(1)n(n − r) for a generalized Hadamard matrix. However, Razborov [86] explains how to obtain Pudla ́k’s bound using the Hoffman– Wielandt inequality.
∥H−βB∥2F =∥(1−β)H−β(B−H)∥2F
≤ (1−β)2(n2 −R)+(1+βθ)2R
= (1−β)2n2 +R􏰥(1+βθ)2 −(1−β)2􏰦,

48 Spectral Methods to Study Rank Robustness 3.2.2 Geometric Variants of Rigidity
Raz [83] defines a geometric rigidity, given below, that is somewhat similar to l2-rigidity (Definition 3.2). He uses bounds on this function in his remarkable superlinear lower bound on matrix multiplication in the model of bilinear circuits with bounded coefficients. Subsequently, Bu ̈rgisser and Lotz [21] use the same function to prove superlinear lower bounds for cyclic convolution, polynomial multiplication, and polyno- mial division with remainder in the same model.
Definition 3.4. For a matrix A ∈ Cm×n,
Rigr(A) := min max dist(ai,V ),
dim V =r 1≤i≤n
where ai ∈ Cm denotes the ith column of A and dist(x, V ) := minv∈V ∥x − v∥ is the l2-norm in Cm. The minimum is taken over all r-dimensional subspaces of Cm.
Geometrically, if we think of A as a set of n points in Cm, then Rigr(A) seeks to minimize the maximum distance of a point in A from an r-dimensional subspace. For this reason, we often refer to Rigr(A) as the geometric rigidity of A.
We first observe that Rigr(A) and l2-rigidity we discussed before are related as follows.
Lemma 3.11. For A∈Cm×n and for 1≤r≤m, l2-RigA(r)
l2-RigA(r) ≥ Rigr(A) ≥ √n .
Proof. First, we prove the right-hand side inequality.
Let V be an r-dimensional subspace achieving Rigr(A). Let P∗P
the projection matrix for V , i.e., for a ∈ Cm, P ∗P a gives the projec- tionofaontoV.NotethatP isanr×mmatrixandthatdist(a,V)= ∥a − P∗Pa∥. Now, define B := P∗PA, i.e., columns of B are projec- tions of columns of A onto V . Clearly, B ∈ Cm×n and, furthermore,

rank B ≤ r. Hence,
l2-Rig2A(r) ≤ ∥A − B∥2F 􏰕n
l2-Rig2A(r) = ≥
∥ai − bi∥2 dist2(ai,V )
∥ai −P∗Pai∥2 i
= nRig2r(A).
=
≤ nmaxdist2(ai,V )
i=1
To prove the left-hand side inequality, let B be a matrix achieving l2-RigA(r), i.e., l2-Rig2A(r) = ∥A − B∥2F and rankB ≤ r. Now, let V be an r-dimensional subspace containing the column space of B. Note that dist(ai,V ) ≤ dist(ai,bi) = ∥ai − bi∥. Hence, we have
􏰕n i=1
􏰕n i=1
≥ maxdist2(ai,V ) i
≥ Rig2r(A).
Corollary 3.12. For a generalized Hadamard matrix H , Rigr (H ) ≥
The “volume” of a matrix was earlier used by Morgenstern [65] for proving lower bounds for the Fourier transform. The following definition from [21] is useful in generalizing Morgenstern’s technique.
Definition 3.5. For a matrix A ∈ Cm×n and an integer r, 1 ≤ r ≤ m, (1) The r-volume Volr(A) of A is defined by
Volr(A) := max det(A∗IAI)1/2, |I |=r
where the maximum is taken over all subsets I ⊆ [n] of r columns of A and AI denotes the submatrix of A consisting of
3.2 Variants of Rigidity 49
√n − r.

50
Spectral Methods to Study Rank Robustness
columns indexed by I. Note that det(A∗IAI)1/2 is the volume
of the parallelepiped defined by the columns of AI.
(2) The rth mean square volume of A, msvr(A) is defined by
 1/2 􏰕∗
msvr (A) :=
where I and AI are as in (1) above.
|I |=r
det(AI AI ) ,
Remark 3.3. The nice property of msvr(A) defined in (2) above is that it is a unitarily invariant norm of a matrix A. In particular, it is well-known [13] that
msv2r(A) = er(σ12(A),…,σm2 (A)),
where er is the rth elementary symmetric polynomial in m variables.
The next lemma connects Volr(A) and Rigr(A): Lemma 3.13.
Volr(A) ≥ (Rigr(A))r.
Proof. We will pick columns v1 , . . . , vr from A as follows. Pick v1 to be a column with the largest length, i.e., that maximizes |a∗i ai |. In general, pick vi that maximizes the distance, among the columns of A, to the subspace ⟨v1,…,vi−1⟩. Clearly, a lower bound on Volr(A) is obtained by considering the submatrix of A given by v1,…,vr. The corresponding determinant is at least ∥v1′ ∥2 · ∥v2′ ∥2 ···∥vr′ ∥2, where vi − vi′ is the projection of vi onto ⟨v1,…,vi−1⟩ and hence ∥vi′∥ = dist(vi,⟨v1,…,vi−1⟩). By our choice of vi, ∥vi′∥ ≥ dist(vi+1,⟨v1,…,vi−1⟩) ≥ dist(vi+1,⟨v1,…,vi⟩) = ∥vi′+1∥. We conclude that Volr(A) ≥ (∥vr′ ∥)r.
On the other hand, consider the r-dimensional subspace V = ⟨v1,…,vr⟩. Clearly, Rigr(A) ≤ dist(ai,V ). By the choice of vi again,

3.3 Bounded Coefficient Complexity of Linear Transformations 51
and imagining continuing the selection for one more step, we note that dist(ai,V ) ≤ ∥vr′ +1∥ ≤ ∥vr′ ∥. Hence, we have Rigr(A) ≤ ∥vr′ ∥.
It follows that Volr(A) ≥ (∥vr′ ∥)r ≥ (Rigr(A))r proving the lemma.
3.3 Bounded Coefficient Complexity of Linear Transformations
In studying the complexity of algebraic problems such as computing linear/bilinear transformations, polynomial evaluation/interpolation, counting the number of arithmetic steps (addition, multiplication, etc.) is a natural mathematical measure. In particular, this model allows multiplications by arbitrary scalars in the ground field (e.g., num- bers of arbitrary value and precision if working over R or C). But, as Morgenstern [65] and Chazelle [24] have observed, many practical algorithms for computing linear transformations such as FFT do not use multiplications by scalars of unbounded value. This implies that in the underlying linear circuits, the coefficients used at each gate com- puting a linear transformation are bounded by a constant. Morgenstern introduced the model of bounded coefficients and showed that any lin- ear circuit with bounded coefficients computing the Fourier transform must have size Ω(nlogn). Since then, several authors [24, 57, 67, 75] have studied bounded coefficient complexity of linear transformations. In the same vein, bounded coefficient complexity of bilinear transfor- mations has been studied in [21] (cf. Section 3.4) and [83].
Morgenstern used determinantal arguments to prove an Ω(nlogn) lower bound on the bounded coefficient complexity of the Fourier transform. We will give a more general proof of the statement based on [83]. We will also give a proof due to Pudla ́k which also uses bounds on the determinant, but in a different way from Morgenstern. His proof has the nice feature that it gives the strongest bounds known to date on constant depth circuits as well. The main lemma for his result is the following.
Lemma 3.14. Let A ∈ Cn×n. Suppose, A = A1 ···Ak (note: we place no restrictions on the dimensions of the Ai except for the obvious ones

52 Spectral Methods to Study Rank Robustness for the multiplications to make sense). Then,
􏰈∥A1∥2 􏰉n/2 􏰈∥Ak∥2 􏰉n/2 |det(A)| ≤ F ··· F
Proof. The proof is by induction on k. The base case k = 1 fol- lows from Hadamard inequality and the AM-GM inequality. So, con- sider k > 1 and assume that A1 ∈ Cn×l and A2 ∈ Cl×l′ , where w.l.o.g. l ≥ n (if l < n, then A is singular and there is nothing to prove). Furthermore, the rows of A1 must be linearly independent. There exists a unitary matrix Q ∈ Cl×l that maps the span of rows of A1 into the span ⟨e1,...,en⟩. Let B1 := A1Q and B2 := Q∗A2. Clearly, A = B1 · B2 ···Ak. Since the columns n + 1,...,l of B1 are all 0’s (by definition of Q), we can retain the above equality if we replace rows n + 1,...,l of B2 by all 0’s rows. It follows that A = B1′ · B2′ ···Ak, where B1′ (B2′ ) is the n × n (n × l′) matrix obtained by omitting the last l − n columns (rows) from B1 (B2). By applying the induction hypothesis to B2′ ···Ak, we obtain the upper bound 􏰈∥B′ ∥2 􏰉n/2 􏰈∥A ∥2 􏰉n/2 |det(B2′···Ak)|≤ 2 F ··· k F . Also from Hadamard inequality and the AM-GM inequality, |det(B1′ )| ≤ 􏰃∥B1′ ∥2F /n􏰄n/2. Combining these inequalities with the facts that det(A) = det(B1′ )det(B2′ ···Ak), ∥A1∥2F = ∥B1∥2F = ∥B1′ ∥2F , and ∥A2∥2F = ∥B2∥2F ≤ ∥B2′ ∥2F , we complete the proof of the lemma. In a synchronous circuit, edges go from a given level only to the next, i.e., all paths from inputs to outputs have the same length. Thus, a synchronous linear circuit may be expressed as a product of as many matrices as its depth. Pudla ́k [75] proves the following. Theorem 3.15. The number of edges S in any synchronous depth-d linear circuit over C using coefficients bounded in absolute value by c computing the linear transformation x 􏰶→ Ax given by A ∈ Cn×n must nn . nn satisfy 3.3 Bounded Coefficient Complexity of Linear Transformations 53 S ≥ c−2dn|det(A)|2/dn. In particular, for a generalized Hadamard matrix H, we get alower bound of c−2dn1+1/d. Remark 3.4. When the depth is O(log n), this gives an Ω(n log n) lower bound, when restricted to synchronous circuits, for a general- ized Hadamard matrix and in particular for the Fourier transform. This matches, asymptotically when c is a constant, the best known algorithm for computing the Fourier transform in the bounded coeffi- cient model of linear circuits of logarithmic depth. The proof of Mor- genstern, discussed later, however, gives a much better lower bound of logdet(A)/log2c with no assumptions on the depth. Proof. A depth-d synchronous circuit can be viewed as consisting of d layers of intermediate nodes with li nodes on level i (inputs at level 0, outputs at level d). The edges from level i to level i + 1 define an li ×li+1 matrix Ai+1, where l0 =ld =n. We thus have A=A1···Ad. We apply Lemma 3.14 to this factorization of A. Now, note that if si is the number of edges from level i − 1 to level i, then, ∥Ai∥2F ≤ sic2. Using this in Lemma 3.14, we get 􏰈c2ds ···s 􏰉n/2 |det(A)| ≤ 1 d nd 􏰈c2􏰉dn/2􏰈s +···+s 􏰉dn/2 ≤1d nd 􏰈c2S􏰉dn/2 ≤dn . Here, we used the that S = s1 + ··· + sd. The theorem readily follows from this last inequality. The bound from the theorem is minimized (as a function of d) for d = 2 ln det A/n and we get the following. 54 Spectral Methods to Study Rank Robustness Corollary 3.16. Any synchronous linear circuit with coefficients bounded by c computing the linear transformation x 􏰶→ Ax must have at least 2elndetA/c2 edges. Both Raz [83] and Bu ̈rgisser and Lotz [21] also generalize the classi- cal determinant-based technique [65] for proving lower bounds on linear circuits with bounded coefficients and express the lower bound in terms of the rigidity functions given in Definition 3.5. Theorem 3.17. Let C be a linear circuit with coefficients bounded by θ ≥ 1, computing a linear transformation x 􏰶→ A∗x. Then, for 1 ≤ r ≤ n, (1) size(C) = Ω(log2θ Volr(A)). (2) size(C) = log2θ msvr(A) − O(n). Proof. Let s := size(C). Sort the gates of C in a topological order and let gi denote the ith gate in this order where g−n+1,...,g0 are the input nodes and g1,...,gs are the linear gates such that gi = λi1gi1 + λi2gi2 with i1,i2 < i in this order. Each gi clearly computes a linear combina- tion zi := vi∗x of the inputs x ∈ Cn. We will slightly abuse the notation and let C denote the matrix in Cn×(s+n) whose ith column is vi. By induction on j, we prove that Volr(Cj) ≤ (2θ)j for the submatrix Cj given by the first n + j columns of C. Since A is a submatrix of C, we have Volr(C) ≥ Volr(A) and we are done. In the base case, j = 0, we have the identity matrix and the claim is trivial. For j>0, let us observe that vj =λkvk +λlvl for k,l 0. Raz uses this to show the existence of a matrix Y such that: (i) Rigr(Y ) = Ω(√n) for r = Ω(n), and (ii) ∀j, |L′j(Y )| ≤ O(nε log n) for 0 < ε < 1. To see the crucial role of this matrix, observe that if we fix Y , the circuit B com- putes a linear transformation on the matrix X and in fact becomes a linear circuit BY . Moreover, it is easy to see that the linear transforma- tion computed by this circuit is given by I ⊗ Y and a lower bound on Rigr(Y)from(i)impliesoneonRignr(I⊗Y).So,ifBY weretoinvolve only bounded coefficients, then we could use the results of the previous subsection to derive a lower bound on size(BY ) and hence on size(B). However,eventhoughBusesboundedcoefficients,itsrestrictionBY in general need not use bounded coefficients since the values L′j (Y ) used by the multiplication gates in B (which become the scalars used in BY when we fix Y ) can be unbounded. Here, Raz employs a neat trick using (ii). First, note that multiplication by a scalar θ can be replaced by O(logθ) additions followed by multiplication by a scalar of absolute value at most 1. This by itself, however, cannot be directly applied since we will have to do this for all multiplication gates which might be as many as the size of B itself. Instead, Raz does the following: Let θ be the maximum (in absolute value) among all the scalars |L′j(Y)| to be multiplied. Replace the scalar multiplication by L′j (Y ) at the inputs to the third part with a multiplication by L′j (Y )/θ, which is now bounded by 1 in absolute value. Now, (effectively) multiply each of the outputs (there are only n2 of them) by the scalar θ using O(logθ) additions and a multiplication by a scalar of value at most 1. Since θ = O(nε log n), this increases the size of the resulting circuit by at most c1n2 logn for 58 Spectral Methods to Study Rank Robustness some constant c1 > 0. If we can show that the linear circuit BY is of size at least c2n2 logn for some constant c2 > c1, then we can conclude that the circuit B must have size at least (c2 − c1)n2 logn, thus proving the main lower bound. By choosing the parameters judiciously, we can achieve this goal.
Theorem3.19. Let C be a bounded coefficient bilinear cir- cuit computing the multiplication of two matrices in Cn×n. Then, size(C) = Ω(n2 logn).
Proof. We use the notation L and L′ as above. For some constants ε1 and ε to be determined later, if Rigε1n2 (L′) ≥ nε, then by Corollary 3.18, we obtain that the linear part on Y itself is of size at least εε1n2 logn and we are done. Hence, we may henceforth assume that Rigε1n2 (L′) < nε. Let c1 := ε1ε. Referring ahead to Lemma 3.20, we know that in this case, there exists a Y ∈ Cn×n such that (1) Forall1≤j≤k,|L′j(Y)|≤δnεlognforsomeδthatdepends on ε, ε1. √ (2) For constants ε3 and α, Rigε3n(Y ) ≥ α n. Fix the second input to C to be the matrix Y given by Lemma 3.20. The multiplication gates in the middle layer now become scalar mul- tiplications by L′j(Y ) of the linear forms Li(X) of the variable matrix X. Let θj :=L′j(Y) for 1≤j≤k and θ:=maxj|θj|. Replace each scalar multiplication by θj with one by θj/θ. Then, multiply each output of the circuit by θ where this is accomplished by at most logθ + 1 repeated additions and a final multiplication by a scalar of absolute value at most 1. Let C′ be the resulting circuit. Clearly, C′ is a linear circuit equivalent to C restricted to the fixed second matrix Y . Moreover, C′ uses scalar multiplications bounded in abso- lute value by 1. It is easy to see that C′ computes a linear trans- formation on X ∈ Cn2 given by the matrix I ⊗ Y ∈ Cn2 ×n2 . Finally, since there are n2 outputs and, by (1), θ ≤ δnε log n, size(C′) is bounded above by size(C) + n2(2logθ + 3) ≤ size(C) + c3n2 logn for a constant c3. 3.4 Bounded Coefficient Complexity of some Bilinear Transformations 59 Our goal is now reduced to proving a lower bound on the linear circuit C′. It is easy to see that Volr·n(I ⊗ Y ) ≥ (Volr(Y ))n. By (2), and Lemma3.13,Volr(Y)≥(α√n)r forr=ε3n.Hence,Volε n2(I⊗Y)≥ √23 (α n)ε3n . Finally, by Theorem 3.17, size(C′)≥logVolε3n2(I ⊗Y)≥ε3n2log(α√n)≥c4n2logn, for a constant c4. By choosing the parameters, we can ensure that c4 > c3. It follows that size(C) ≥ size(C′) − c3n2 log n ≥ cn2 log n.
Lemma 3.20. Let L′1,…,L′k be linear forms on Cn2 as above and L′ ∈ Cn2×k be the matrix with L′j as its columns. Fix parameters ε, ε1, c1 and c2. Then, there exists a matrix Y ∈ Cn×n (also viewed as a vector in Cn2 ) such that
(1) Foreachj,1≤j≤k,
|L′j(Y )| ≤ c1 Rigεn2(L′)√2lnk + 4.
(2)
Rig (Y)≥c√n. ε1n 2
Proof. Let R := Rigεn2 (L′). This means there exists a subspace V ⊂ Cn2 such that for all j, dist(L′j,V ) ≤ R. Let V ⊥ be the orthogonal complement of V . Let L′j,V and L′j,V ⊥ be the projections of the vector L′j on V and V ⊥, respectively. It follows that ∥L′j,V ⊥ ∥ ≤ R for all j.
We prove the existence of Y through a probabilistic argument. Choose a matrix W ∈ Cn×n by picking each entry independently from the Gaussian distribution with expectation 0 and variance 1, denoted N(0,1). Let W = WV + WV ⊥ be the decomposition of W by project- ingalongV andV⊥.WedefineY :=WV⊥,tobetheprojectionofW on V ⊥.
Note that L′j(Y)=⟨L′j,Y⟩ when treating L′j as a vector in Cn2. Since Y ∈ V ⊥,
⟨ L ′j , Y ⟩ = ⟨ L ′j , V ⊥ , Y ⟩ = ⟨ L ′j , V ⊥ , W ⟩ .

60 Spectral Methods to Study Rank Robustness
It is well-known that if independent random variables Xi are dis-
tributed according to N(μi,σi2), then 􏰂ti=1 λiXi is distributed accord-
ing to N(􏰂 λiμi,􏰂 λ2σ2) and that if X is distributed according to i iii
N(μ,σ2), then
􏰽2 e−θ2/2 Pr[|X −μ|>θσ]< π θ . Since entries of W are independently distributed according to N (0, 1), it follows that ⟨L′j,V ⊥ , W ⟩ is distributed according to N(0,∥L′j,V ⊥∥2). Recall that ∥L′j,V ⊥∥ ≤ R. Hence, 􏰝 ′ √ 1e−2 Pr[|⟨Lj,V⊥,W⟩|>R 2lnk+4]< π(lnk+2) k . = 1 − o(1). Thus, a matrix Y satisfying (1) exists with probability 1 − o(1). Toward proving (2), we first recall the well-known fact [92] that σ1(W) converges to 2√n almost surely. Hence, with probability 1 − o(1), σ1(W) ≤ (2 + ε)√n, for a small enough ε > 0.
The above holds for each j, 1 ≤ j ≤ k. Hence, we will have
|⟨L′j,V ⊥,W⟩| ≤ R√2lnk + 4 for all j with probability 1 − √ e−2 π(ln k+2)
Secondly, we recall that if Xi ∼ N (0, 1), 1 ≤ i ≤ m, are i.i.d. random
variables, then Q = 􏰂mi Xi2 is distributed according to the chi-square
density with m degrees of freedom: f(x,m) = 2−m/2 xm/2−1 exp(−x/2). Γ(m/2)
In particular, Q has expectation m, variance 2m, and moment generat- ing function MQ(t) = E[etQ](1 − 2t)−m/2 for 2t < 1. Moreover, in the limit, Q approaches normal distribution . Hence, Pr[Q < (1 − δ)m] < ε for suitable ε and δ. It follows that ∥W ∥2F ≥ (1 − δ)n2 with high probability. By Lemma 3.5 and Fact 3.3 􏰕n l2-Rig2r(W) ≥ i=r+1 σi2(W) = ∥W∥2F − 􏰕r i=1 σi2(W). By the above estimates, with high probability ∥W ∥2F ≥ (1 − δ)n2 and σ1(W) ≤ (2 + ε)√n. For r ≤ cn for sufficiently small c, we then have, with high probability, l2-Rig2r(W) ≥ (1 − δ)n2 − c(2 + ε)2n2 ≥ c′1n2. 3.4 Bounded Coefficient Complexity of some Bilinear Transformations 61 Now, we use Lemma 3.11 to conclude that Rigr (W ) ≥ We still have to prove a lower bound on Rigr(Y). Recalling W =Y +WV, we note that for any rank-r matrix D, ∥Y −D∥F ≥ ∥W −D∥F −∥W −Y∥F =∥W −D∥F −∥WV∥F. We will show that, with high probability, ∥WV ∥2F ≤ (1 + ε)dim(V )n ≤ c′2n2 when l -Rig (W)/√n ≥ c √n. 2r1 dim(V ) ≤ r = cn for sufficiently small r. This will prove that ∥Y −D∥ ≥ c n−c n=c n and by Lemma 3.11, Rig (Y)≥c √n. F123 r3 Since (1) and (2) each holds with high probability(say, more than 3/4 each), we have shown that a matrix Y satisfying both (1) and (2) exists. It remains to observe that ∥WV ∥2F is at most (1 + ε)rn with high probability. Note that WV is a projection of the Gaussian matrix W onto V . By choosing an orthonormal basis for V , we can ensure that each entry of WV is a linear combination of Gaussian variables according to N (0, 1) by a unit vector, i.e., by λi where 􏰂 λ2 = 1. ii Thus, WV is a collection of rn variables according to N(0,1). By the same arguments that we used for W , we conclude that ∥WV ∥2F has the chi-square distribution with rn degrees of freedom with mean rn and variance 2rn, and hence with high probability is at most (1 + ε)rn. 3.4.2 Bounded Coefficient Complexity of Convolution Bu ̈rgisser and Lotz [22] prove superlinear lower bounds on the bounded coefficient complexity of the bilinear forms defined by (cyclic) con- volution, polynomial multiplication, and polynomial division with remainder. Definition 3.7. Let a = (a0,...,an−1) and b = (b0,...,bn−1) be vectors in Cn. The cyclic convolution c of a and b is defined by 􏰕 i+j=k mod n ck = aibj, 0≤k≤n−1. Clearly, cyclic convolution is a bilinear form Cn × Cn → Cn. Convolu- tion of a and b is often denoted by a∗b. 62 Spectral Methods to Study Rank Robustness If f = 􏰂n−1 aixi and g = 􏰂n−1 bjxj are polynomials with a and b i=0 j=0 as coefficient vectors, then c gives the coefficients of the product h (also called convolution) of f and g in the ring C[X]/(Xn − 1). The lower bounds on polynomial multiplication and division will be consequences, via well-known reductions, from those on convolution. The overall structure of the proof for convolution is similar to the one in Section 3.4.1 for matrix multiplication, so we will be terse. Note that if we fix a, the convolution a ∗ b is just the linear trans- formation Ab given by the circulant matrix  a0 a1···an−1 A := circ(a) := an−1 a0 ··· an−2 ··· ··· ··· ··· a1 a2 ··· a0 Consider a bilinear circuit computing a ∗ b. Let the first part (cf. Definition 3.6) compute linear forms Li(a), 1 ≤ i ≤ k, in a. Fix r, 1 ≤ r ≤ n. Let Rign−r(L1,...,Lk) =: R. We will fix a (by a probabilistic argument) to satisfy the following two properties: (1) ∀i,1≤i≤k,,|Li(a)|≤R√2lnk+4. (2) For this a, the bounded coefficient linear circuit complexity of x 􏰶→ Ax, where A is the circulant matrix defined above is at least (1/2)(n − r) log n − cn for some constant c. Given such an a, we can argue as in the proof of Theorem 3.19 and obtain the following: Theorem 3.21. The bounded coefficient bilinear complexity of con- volution is at least (1/12)n log n − O(n log log n). To prove the existence of a satisfying (1) and (2), let V ⊆ Cn be an n/2-dimensional subspace achieving Rign/2(L1,...,Lk) = R and V ⊥ be its orthogonal complement. As before we will pick a to be a standard Gaussian random vector in V ⊥. 3.4 Bounded Coefficient Complexity of some Bilinear Transformations 63 Lemma 3.22. Let a ∈ Cn be a standard Gaussian vector from V ⊥ of dimension r chosen as above. Let C be a linear circuit with bounded coefficients computing x 􏰶→ Ax, where A is the circulant matrix given by a. Then, 􏰊1􏰋1 Pr size(C) ≥ 2rlogr − cr > 2,
where c ≈ 3.73.1
Proof. We will derive a lower bound on the mean square r-volume, msvr(A), of the circulant matrix A and use (ii) of Theorem 3.17.
Let D = (ζij)n−1 , where ζ = exp(2πi/n) is a primitive nth root of i,j =0
unity, be the Discrete Fourier Transform (DFT) matrix. Recall that the eigenvalues of A are given by the Fourier transform Da of the sequence a. In particular, AD = diag(λ0,…,λn−1)D, where λ = Da.
Since 1 D is a unitary matrix, AA∗ = diag(λ2,…,λ2 √n 0 n−1
that |λi| are the singular values of A. Hence 2􏰕􏰖2
) and it follows
(3.1)
|α |2. i
Lemma 3.23. Let Z = (Z1,…,Zr) be a centered Gaussian vector in Cr. Let Σr := (E[ZiZj])1≤i,j≤r be the covariance matrix of Z. Let
1 The value of c is computed from the expectations of logX2 and log2(X2 + Y 2), where X and Y are independent standard normal random variables. We refer to [22] for details.
msvr(A) =
Let α:=(1/√n)λ = (1/√n)Da. So, msv2(A) = nr 􏰂
|I|=r i∈I
r |I|=r
Since a is a standard Gaussian vector from V ⊥ and (1/√n)D is uni- tary, α is also a standard Gaussian vector in some subspace U of the same dimension r. We now use a deviation inequality for the products of squares of Gaussian random variables. We skip the proof and refer
the reader to [22].
|λi| .
􏰹
i∈I

64 Spectral Methods to Study Rank Robustness
δ (≈ 0.02) be a fixed a constant.2 Then,
E[|Z1|2 ···|Zr|2] ≥ detΣr, (3.2)
Pr[|Z1|2 ···|Zr|2 ≥ δr detΣr] > 1. (3.3) 2
Since α is a standard Gaussian vector in the subspace U, there exists an orthonormal basis b1,…,br ∈ Cn such that α = Bβ, where B ∈ Cn×r with bi as columns and β ∈ Cr is a vector of independent standard Gaussian random variables. For I ⊆ [n], |I| = r, let us denote αI = (αi)i∈I . We then have αI = BI β, where BI is the (square) submatrix given by rows with indices in I. Furthermore, the covari- ance matrix ΣI = E[αIαI∗] = BIBI∗ by the independence of βi. Thus, detΣI = |detBI|2.
By the Binet–Cauchy formula,
But BB∗ = (⟨bi,b∗j⟩)1≤i,j≤r and bi are detBB∗ = det(⟨bi,b∗⟩) = 1. It follows that 􏰂
∗􏰕∗􏰕2
|detBI|. orthonormal.
In particular, we now obtain from (3.1), that the circulant matrix A has, with probability at least 1/2,
2 r r􏰆n􏰇−1 􏰈δr􏰉r msvr(A)≥nδ r ≥ e ,
using the inequality 􏰃n􏰄 ≤ (en/r)r. r
Finally, using Theorem 3.17 and that δ is a constant, we con- clude that size(C) ≥ (1/2)rlogr − cr for a suitable constant c with probability at least 1/2.
2 The value of δ again arises from considerations alluded to in footnote 1.
detBB = detBIdetBI=
|I |=r |I |=r
j |I|=r 􏰃 􏰄−1 there must exist an I, |I| = r such that detΣI = |detB|2 ≥ nr . We apply Lemma 3.23 to αI for such an I and conclude that
Pr[􏰹 |αi|2 ≥ δr􏰃n􏰄−1] > 1/2. i∈I r
Hence, |detBI|2 = 1. Hence,

4
Sign-Rank and Other Complexity Measures of Sign Matrices
In Sections 2 and 3, we considered rigidity and other robustness measures for arbitrary matrices over a field. But in applications to several lower bound problems, we need to consider robustness mea- sures of Boolean matrices (entries being 0–1 or ±1). By considering a ±1-matrix (also called a sign matrix) over R, several complexity measures have been defined in the literature and lower/upper bounds on such measures are used to derive interesting consequences in com- plexity theory. Such measures include sign-rank, margin complexity, discrepancy, etc., and the applications include communication com- plexity, learning complexity, Boolean and threshold circuit complex- ity and so on. In this section, we study complexity measures of sign matrices and several of their applications. In Section 5, we focus on applications of various measures of Boolean matrices to communication complexity.
4.1 Sign-Rank
Let us begin with the definition.
65

66 Sign-Rank and Other Complexity Measures of Sign Matrices
Definition 4.1. For A ∈ {−1,+1}m×n,
sign-rank(A) := min{rank(B) : ∀i,j sign(bij) = aij}.
Thus sign-rank(A) measures the robustness of the rank of A under sign-preserving changes (every entry is allowed to be changed).
The sign-rank of a matrix has the following elegant geometric interpretation.
Definition 4.2. Two sets of vectors X = {x1,…,xm} and Y = {y1,…,yn} with xi,yj ∈ Rd are said to realize the matrix A ∈ {−1,+1}m×n in dimension d if ∀i,j sign(⟨xi,yj⟩) = aij.
The pair (X,Y) is also called a d-dimensional linear arrangement for matrix A.
It is easy to see that sign-rank(A) is exactly equal to the minimum dimension d of a realization of A. By thinking of A as the incidence matrix of a set system (rows correspond to elements of a universe, columns to subsets, and an element is in a subset iff the corresponding entry is a −1), this is very closely related to a problem about geometric realizations of set systems as defined in [71] and [2]. In such a real- ization, the elements of the universe are represented by points xi ∈ Rd and the sets in the set system by hyperplanes through the origin given by their normal vectors yj ∈ Rd. The ith element is in the jth set if and only if the corresponding vector xi is on the “positive” side of the hyperplane with normal yj. The goal is to realize a given set system in as small a dimension d as possible.
The definition of sign-rank first appeared (with the geometric inter- pretation) in the paper by Paturi and Simon [71] who introduced the model of unbounded error probabilistic communication complexity and showed that the amount of communication to compute a func- tion f : {0, 1}t × {0, 1}t → {0, 1} in this model is exactly captured by sign-rank(Af ), where Af (x,y) = (−1)f(x,y).
Unlike in many cases, it is not even clear that almost all matri- ces have high sign-rank. Alon et al. [2] prove the non-trivial result that

4.2 Forster’s Lower Bound 67
sign-rank(A) = Θ(n) for almost all A ∈ {−1,+1}n×n. Their proof relies on results from real algebraic geometry, e.g., by Warren [102], on the number of sign patterns of a set of polynomials; a simple linear alge- braic proof of Warren’s theorem can be found in [89]. Since then, find- ing an explicit matrix with sign-rank more than logarithmic remained a challenge. Then, Forster [28] achieved a breakthrough by proving that any n × n Hadamard matrix must have sign-rank at least Ω(√n). Forster’s lower bound implies asymptotically optimal lower bounds on the communication complexity of the inner product mod-2 function in the Paturi–Simon model and also upper bounds on maximal mar- gins of hyperplane classifiers in learning theory. His lower bound and some of its generalizations are also applied [29] to derive lower bounds on threshold circuits of depth-2 and OBDD’s. These applications are discussed in detail in Section 4.5.
4.2 Forster’s Lower Bound
Let xi,yj ∈ Rd be a minimal dimensional realization of the matrix A. We begin with two simple observations: (i) we can assume w.l.o.g. that xi and yj are unit vectors in Rd, and (ii) since ∀i,j sign(⟨xi,yj⟩) = aij is an open condition, we can assume w.l.o.g. that the vectors xi (and yj) are in general position, i.e., any d of them are linearly independent.
We will make use of the following linear algebraic lemma. We defer its proof to Section 4.2.1.
Lemma 4.1. Let X = {x1,…,xm} be vectors in Rd in general posi-
tion. Then, there is a nonsingular transformation B of Rd such that
x􏰲 := Bx /∥Bx ∥ satisfy iii
sign⟨x􏰲,y􏰲⟩=sign⟨Bx,(BT)−1y ⟩=sign⟨x,y ⟩. ijijij
􏰕m m x􏰲x􏰲T= I.
ii dd i=1
We observe that in any realization of A by xi,yj, we can replace them w.l.o.g. by x􏰲 as given in Lemma 4.1 and y􏰲 defined by y􏰲 :=
ijj (BT )−1yj/∥(BT )−1yj∥. Indeed,

68 Sign-Rank and Other Complexity Measures of Sign Matrices
Hence, we can assume, w.l.o.g., that the xi are nicely balanced in the sense that they satisfy
􏰕m m
xixiT = d Id. (4.1) i=1
Theorem 4.2. For A ∈ {−1,+1}m×n, sign-rank(A) ≥ √mn/∥A∥. In other words, the minimum dimension A can be realized in is at least √mn/∥A∥.
Proof. Let xi,yj ∈ Rd be vectors in a minimal dimensional realization of A. Let Hj be the hyperplane passing through the origin in Rd with yj as its normal vector. Let Pi be the point in Rd given by xi.
We are interested in bounding the quantity
􏰕n􏰎􏰕m 􏰏2􏰕n􏰎􏰕m 􏰏2
D := dist(Pi,Hj) = j=1 i=1
|⟨xi,yj⟩| . j=1 i=1
As discussed above, w.l.o.g., we can assume that the xi and the yj are unit vectors in general position and that the xi satisfy (4.1).
Fix a j. Now,
􏰕m 􏰕m
|⟨xi,yj⟩| ≥
i=1 i=1 􏰎􏰏
=
􏰕m i=1
􏰕m i=1
(⟨xi,yj⟩)2 since xi,yj are unit vectors.
y jT x i x Ti y j = y jT = yjT mId yj by (4.1)
d
= m y jT I d y j = m . dd
x i x Ti
y j
It follows that D ≥ nm2/d2. We will next show that D ≤ m∥A∥2. Combining these two bounds, we obtain that d ≥ mn/∥A∥ and the theorem is proved.
Since xi and yj realize A, |⟨xi,yj⟩| = aij⟨xi,yj⟩. Hence, for any j,
􏰼 􏰾􏰾􏰾􏰕m 􏰾􏰾􏰾
􏰕m i=1
􏰕m i=1
􏰻 􏰕m
yj, aijxi
≤ 􏰾􏰾
|⟨xi,yj⟩| =
by Cauchy–Schwartz, since yj is a unit vector.
aij⟨xi,yj⟩ =
aijxi􏰾􏰾,
i=1
i=1

Thus,
􏰕n􏰾􏰾􏰾􏰕m 􏰾􏰾􏰾2􏰕n􏰎􏰕m 􏰏􏰎􏰕m 􏰏 D≤ 􏰾􏰾 aijxi􏰾􏰾= akjxTk aljxl
j=1 i=1 j=1 k=1 l=1 􏰕T􏰕n 􏰕 T
= (xk xl) akjalj = ⟨xk,xl⟩ · (AA )kl. 1≤k,l≤m j=1 1≤k,l≤m
Observe that ∥A∥2Im − AAT and (⟨xk,xl⟩)1≤k,l≤m are both positive-semidefinite matrices. We now use the following theorem (see, e.g., [39, Corollary 7.5.4]) characterizing positive-semidefinite matrices.
Fact 4.3 (Fejer’s Theorem). A matrix P ∈ Rm×m is positive-
4.2 Forster’s Lower Bound 69
semidefinite if and only if for all positive-semidefinite matrices Q ∈
Rm×m, 􏰂 PklQkl ≥ 0. 1≤k,l≤m
Hence, 􏰂 1≤k,l≤m
⟨xk,xl⟩ · (∥A∥2Im − AAT )kl ≥ 0. Using this we continue with our upper bound on D:
D ≤ 􏰕 ⟨xk,xl⟩ · ∥A∥2(Im)kl = ∥A∥2 􏰕 ⟨xk,xk⟩ 1≤k,l≤m 1≤k≤m
= ∥A∥2m since xk are unit vectors. This completes the proof.
Corollary 4.4. For an n × n Hadamard matrix H , sign-rank(H ) ≥ √n.
4.2.1 Proof of Lemma 4.1
For a vector x ∈ Rd (written as a column vector), let M(x) denote the rank-1 matrix xxT in Rd×d. For a set of vectors X = {x1,…,xm} ⊆ Rd, let M(X) denote the d × d matrix 􏰂mi=1 M(xi) = 􏰂mi=1 xixTi . Define a set of vectors X to be nice if M(X) = (m/d)Id.
For a set of vectors X, let N(X) denote the normalizations of vectors in X, i.e., N(X) = {xi/∥xi∥ : xi ∈ X}. For a linear transformation B

70 Sign-Rank and Other Complexity Measures of Sign Matrices
on Rd, B(X) denotes the set of images of elements of X under B. Then, the lemma can be restated as follows: If X is a set of vectors in Rd in general position, then there exists a nonsingular linear transformation B on Rd such that N(B(X)) is nice.
The proof is based on the following two lemmas.
Lemma 4.5. Let X ⊆ Rd be a set of m ≥ d vectors in general position. Either the matrix M(X) is equal to (m/d)Id or there is a nonsingular linear transformation B on Rd such that the matrix M(N(B(X))) has its smallest eigenvalue strictly greater than that of M(X).
Proof. If |X| = d, then by applying a nonsingular linear transforma- tion B, we can ensure that M(X) = Id and the lemma is proved. So, let us assume that |X| > d.
Let λ be the smallest eigenvalue of M(X). Note that M(X) is a pos-
itive semidefinite matrix and hence all its eigenvalues are nonnegative.
In fact, it is easy to see that λ is at least 1 (for instance, map, say,
the first d vectors onto the canonical basis vectors ei of Rd). Fur-
ther, the sum of eigenvalues of M(X) is tr(M(X)) = 􏰂m 􏰂d x2 = 􏰂m 2 i=1 j=1 i,j
i=1 ∥xi∥ = m. It follows that if λ ≥ m/d, then all its eigenvalues are m/d. Hence, by applying a nonsingular linear transformation, we can write M(X) as (m/d)Id and we are again done.
So, we can assume that λ < m/d. Let s be the multiplicity of λ. We will show that by applying a nonsingular linear transformation B, we can ensure that the multiplicity of λ as the smallest eigenvalue of M(N(B(X))) is strictly smaller than s. By repeatedly applying this step, we arrive at a matrix having λ as its smallest eigenvalue with multiplicity zero and no smaller eigenvalues; thus proving the lemma. We can assume without loss of generality that M(X) is diagonal- ized: M(X) = diag(λ1,λ2,...,λd), where 1 ≤ λ = λ1 = λ2 = ··· = λs < λs+1 ≤ ··· ≤ λd. Let us define B := diag(λ−1/2,...,λ−1/2) and consider 1d the multiplicity of λ as an eigenvalue of C:=M(N(B(X)))= 􏰕m1 T i=1 ∥Bxi∥2(Bxi)(Bxi) . 4.2 Forster’s Lower Bound 71 Note that ∥Bxi∥2 = 􏰂dj=1 x2i,j/λj ≤ 1/λ since λj ≥ λ and ∥xi∥2 = 1. Also, 􏰂mi=1(Bxi)(Bxi)T =Id by our assumption on M(X) and definition of B. It follows that C − λId is positive semidefinite. Hence, all eigenvalues of C are at least λ. We need to show that the multiplicity of λ as an eigenvalue of C is strictly smaller than s. We will do this by showing that the rank of C − λId is strictly larger than d − s. To this end, we note that in the inequality ∥Bxi∥2 ≤ 1/λ observed above, equality holds if and only if xi is an eigenvector of M(X) for eigenvalue λ. Since λ has multiplicity s, there are at most s vectors xi ∈ X such that this equality holds. For the remaining m − s vectors from X, this inequality is strict. Note that s < m since λ < m/d and the sum of the eigenvalues of M(X) is m. Since 􏰕m􏰈1 􏰉 ∥Bxi∥2 − λ (Bxi)(Bxi)T , for at least m − s terms in the sum, the coefficients are strictly positive. Moreover, the vectors xi (and hence Bxi) corresponding to these terms are either linearly independent (if there are fewer than d of them) or in general position (if there are more than d of them). Hence, C − λId is a positive linear combination of at least min{m − s,d} rank-1 matrices of the form yiyiT . It follows that the rank of C − λId is at least min{m − s, d}. Since m > d, we conclude that the rank of C − λId is strictly larger than d − s, unless it is already d.
Lemma 4.6. For a given finite set X ⊆ Rd of vectors in general posi- tion and an ε > 0, the set of all matrices A ∈ Rd×d such that (1) and (2) below hold is compact.
(1) ∥B∥ = 1.
(2) The smallest eigenvalue of M(N(B(X))) is least 1 + ε.
The proof of this lemma is by a simple compactness argument and we refer the reader to [28] for the details.
C − λId =
i=1

72 Sign-Rank and Other Complexity Measures of Sign Matrices
Given Lemmas 4.5 and 4.6, we can now complete the proof of Lemma 4.1.
By reasoning as in the proof of Lemma 4.5, we can see that the smallest eigenvalue of M(X) is at least 1. If M(X) is already not of the form (m/d)Id, then Lemma 4.5 shows that we can increase its smallest eigenvalue to at least 1 + ε for some ε > 0 after applying a nonsingular linear transformation B and normalizing.
The nonsingular transformation B above can be assumed to have ∥B∥ = 1 since replacing B by B/∥B∥ does not change M(N(B(X))). By Lemma 4.6, the set of all such linear transformations is compact. The smallest eigenvalue of a positive semidefinite matrix is a continuous function of the matrix. Hence, the smallest eigenvalue of M(N(B(X))) is a continuous function of B for the given X. It follows that the maximal value of the smallest eigenvalue is achieved by some B in this set. For this B, M(N(B(X))) must be of the form (m/d)Id. Otherwise, we can apply Lemma 4.5 again and find another B in this set to increase the smallest eigenvalue. This contradicts the maximality at B of the smallest eigenvalue of M(N(B(X)).
4.3 Margin Complexity, Factorization Norms, and Discrepancy
The margin of an arrangement of hyperplanes and points arises in learn- ing theory. We will discuss this application in more detail in Section 4.5.
Definition 4.3. Let X = {xi,…,xm} and Y = {y1,…,yn} be a sign realization by unit vectors of the matrix A ∈ {−1, +1}m×n as in Definition 4.2. The margin of A is then defined to be the maximum margin among all its realizations:
􏰌􏰍
margin(A) := max min|⟨xi,yj⟩| : X,Y realize A . i,j
We also define, following [55], the margin complexity mc(A) to be margin(A)−1 .
When we view xi as points and yj as normals to hyperplanes through the origin, margin(A) is the minimum distance of any point from any

4.3 Margin Complexity, Factorization Norms, and Discrepancy 73
hyperplane in the arrangement. Note that the dimension of the R-space in which the xi and yj live is not relevant for this definition.
It is easy to see that for any A ∈ {−1,+1}m×n, mc(A) is at most min{√m, √n}.
Forster’s result from Section 4.2 also proves a lower bound on the margin complexity of a sign matrix.
Theorem 4.7. For A ∈ {−1,+1}m×n, mc(A) ≥ √mn/∥A∥.
Proof. Recall the definition of D from the proof of Theorem 4.2: 􏰕n􏰎􏰕m 􏰏2􏰕n􏰎􏰕m 􏰏2
D := dist(Pi,Hj) = j=1 i=1
|⟨xi,yj⟩| . j=1 i=1
Let us consider D in a realization by xi and yj achieving the maximal margin for A.
Clearly, D ≥ n(mmargin(A))2. In that proof, we also proved the upper bound D≤m∥A∥2. Combining these, we have margin(A)≤ ∥A∥/√mn. Since mc(A) = margin(A)−1, we are done.
Linial et al. [53] relate margin complexity to some factorization norms of sign matrices and discrepancy.
Given an operator A : U → V , its factorization norm through the normed space W is obtained by writing A=XY, where Y :U →W and X : W → V , that minimizes the product of the norms of operators X and Y . An example of such a factorization norm of interest to us is when U = lm1 , V = ln∞, and W = l2.
Definition 4.4. Let A : lm1 → ln∞. Then, we define γ2(A) to be γ2(A) = min ∥Y ∥1→2∥X∥2→∞.
A=X Y
It is easy to see that for a matrix R ∈ Rm×r, ∥R∥1→2 is the largest l2-norm of a row of R and that for a matrix C ∈ Rs×n, ∥C∥2→∞ is the largest l2-norm of a column of C. Hence, we have
γ2(A) = min max∥xi∥∥yj∥, (4.2) A=XY ij

74 Sign-Rank and Other Complexity Measures of Sign Matrices
where xi, 1≤i≤m, are the rows of X and yj, 1≤j≤n, are the
columns of Y .
Definition 4.5. For a matrix A ∈ {+1,−1}m×n, the set SP(A) is
defined by
SP(A) := {B ∈ Rm×n such that ∀i,j aijbij ≥ 1},
i.e., the set of real matrices B that agree in sign with A entry-wise and have each entry of absolute value at least 1.
Lemma 4.8. mc(A) = minB∈SP(A) γ2(B).
Proof. (sketch) Given a sign realization as in Definition 4.3 achieving mc(A), identify (by abuse of notation) the set of vectors X and Y with matrices X and Y with rows xi and columns yj, respectively. Let B := mc(A)XY . Note that B ∈ SP(A) and γ2(B) ≤ mc(A). This shows that minB∈SP(A) γ2(B) ≤ mc(A). For the other direction, let B achieve the minimum on the right-hand side. Let X and Y be the matrices realizing the minimum in γ2(B). Since B ∈ SP(A), it can be seen that the rows and columns of X and Y , respectively give, after suitable normalization, a sign realization of A with a margin complexity at most γ2(B).
Corollary 4.9. mc(A) ≤ γ2(A).
We recall the definition of a dual norm.
Definition 4.6. Let A ∈ Cm×n. Let ∥ · ∥ be a norm on a vector space V over R or C. The dual norm ∥·∥∗ is defined by
∥y∥∗ := sup{|⟨x,y⟩| : ∥x∥ = 1}.
In particular, for a matrix norm ∥ · ∥ on Cm×n the dual norm ∥ · ∥∗ is defined by
∥A∥∗ := sup{|trAC∗| : C ∈ Cm×n,∥C∥ = 1},

4.3 Margin Complexity, Factorization Norms, and Discrepancy 75 where the dual is defined with respect to the Frobenius inner product,
⟨A,C⟩ = tr(AC∗), on Cm×n.
Using the dual norm γ2∗ of γ2, Linial et al. [53] obtain a better lower
bound on mc(A) than Forster’s (Theorem 4.7). Theorem 4.10. mc(A) ≥ mn/γ2∗(A).
Proof. Let B ∈ Rm×n be a matrix
Lemma 4.8. Define C := B/γ2(B) so
have the same sign pattern, ⟨C,A⟩ = 􏰂 ij
cijaij = (􏰂 ij
bijaij)/γ2(B) ≥ mn/γ2(B) since B ∈ SP(A). Hence, by definition, γ2∗(A) ≥ mn/γ2(B).
Since mc(A) = γ2(B), we are done.
The following inequality shows that the bound in Theorem 4.10 can
be better than the bound in Theorem 4.7.
Lemma 4.11. For A ∈ Rm×n, γ2∗(A) ≤ √mn∥A∥.
Proof. Let B ∈ Rm×n be such that γ2(B) = 1. Using (4.2), let X ∈ Rm×t and Y ∈ Rt×n (the value of t is not important) be such that B = XY and γ2(B) = 1 = maxij ∥xi∥∥yj∥, where xi (yj) is the ith row (jthcolumn)ofX(Y),1≤i≤m,1≤j≤n.Letx(j) bethejthcolumn of X and y(i) be the ith row of Y for 1≤i,j≤t and note that B= XY = 􏰂tj=1 x(j)y(j). Now,
with mc(A) = γ2(B) as in γ2(C) = 1. Since B and C
􏰿 􏰕 (j) (j)􏱀 ⟨A,B⟩= A, x y =
􏰕 (j) (j) = y Ax
􏰕
(j) (j) ⟨A,x y ⟩
≤ ∥A∥
by definition of ∥A∥   
j 􏰕 (j) (j)
j
􏰭􏰬􏰬 
∥x ∥∥y ∥
j
≤ ∥A∥􏰬􏰫􏰕∥x(j)∥2􏰕∥y(j)∥2 by Cauchy–Schwartz jj

76
In fact, Linial et al. exhibit a (random-like) matrix A ∈ {−1, +1}n×n for which γ2∗(A) = O(n3/2), whereas ∥A∥ = Ω(n3/4). For this A, then, Forster’s bound can give at most a lower bound on mc(A) of O(n1/4) whereas the bound from Theorem 4.10 gives Ω(n1/2).
Finally, Linial et al. use a famous inequality due to Grothendieck to relate γ2∗(A) to an operator norm of A.
Theorem 4.12 (Grothendieck’s Inequality). Let A = (aij ) be a real matrix such that |􏰂 aijsitj| ≤ 1 for every choice of reals that sat-
Sign-Rank and Other Complexity Measures of Sign Matrices
􏰭􏰬􏰬 􏰎 􏰏   = ∥A∥􏰬􏰫 􏰕∥xi∥2 􏰕∥yj∥2
ij
≤ ∥A∥√mmax∥xi∥√nmax∥yj∥ ij
= ∥A∥√mnγ2(B) = ∥A∥√mn.
ij
isfy |si|,|tj| ≤ 1 for all i,j. Then, there exists an absolute constant KG,
where 1.5 < KG < 1.8, such that 􏰅􏰅 􏰅􏰅􏰕 􏰅􏰅 􏰅􏰅 For an elementary proof of this inequality, see [14]. Lemma 4.13. For every A ∈ Rm×n, ∥A∥∞→1 ≤ γ2∗(A) ≤ KG∥A∥∞→1. Proof. Let x be such that ∥x∥∞ = 1 and ∥Ax∥1 = ∥A∥1. Consider the matrix B = sign(x)xt, where sign(x) is the column vector of signs of the coordinates of x. It is easy to see that γ2(B) = ∥x∥∞ = 1 and that ⟨A,B⟩ = sign(x)tAx = ∥Ax∥1. Hence, γ2∗(A) ≥ ⟨A,B⟩ = ∥A∥∞→1. The other direction is a restatement of Grothendieck’s inequality. aij⟨xi,yj⟩􏰅􏰅 ≤ KG, for all unit vectors xi and yj in a real Hilbert space. 􏰅􏰅 ij 4.3 Margin Complexity, Factorization Norms, and Discrepancy 77 Corollary 4.14. mc(A) ≥ mn . KG ∥A∥∞→1 Using methods similar to the foregoing, Linial et al. also prove lower bounds on sign-rank. Theorem 4.15. For any A ∈ {−1,+1}m×n, sign-rank(A) ≥ mn , and γ 2∗ ( A ) sign-rank(A) ≥ mn . Using the well-known John’s theorem [17, Section 4, Theorem 15] (also known as John’s Lemma and related to the so-called Lo ̈wner– John ellipsoids), Linial et al. also show the following relation between rank and γ2. Lemma 4.16. For any A ∈ Rm×n, γ2(A) ≤ ∥A∥2∞→1 rank(A). In particular, for a sign matrix A, 􏰛 γ2(A) ≤ rank(A). Combined with Corollary 4.9 we have the following for any m × n KG ∥A∥∞→1 sign matrix A: 􏰛 mc(A) ≤ γ2(A) ≤ rank(A). (4.3) The results for an Hadamard matrix show that these inequalities are in general tight. As we note from the above, we have the same lower bounds on sign- rank(A) and mc(A) and they are proved using very similar techniques. Hence, one wonders if there is some relation between sign-rank(A) and mc(A). The best known relation is the following based on the Johnson–Lindenstrauss lemma. 78 Sign-Rank and Other Complexity Measures of Sign Matrices Lemma 4.17. For A ∈ {−1,+1}m×n, sign-rank(A) = O(mc(A)2 log(m + n)). We will see a proof of this lemma in the context of probabilistic communication complexity in Section 5.3 (cf. Lemma 5.15). Finally, Linial and Shraibman [55] show a close relationship between margin complexity and discrepancy. Definition 4.7. Let A ∈ {−1, +1}m×n and μ be a probability distri- bution on entries of A. For a combinatorial rectangle R in A, let R+ and R− are the positive and negative entries in R, respectively. Then, discμ(A) := max|μ(R+) − μ(R−)|, where the maximum is taken over all combinatorial rectangles R in A. The discrepancy of A, denoted disc(A), is then defined as the minimum of discμ over all distributions μ. Theorem 4.18. For any A ∈ {−1,+1}m×n, 1 margin(A) ≤ disc(A) ≤ 8 margin(A). 8 Proof. (outline) The proof consists of the three main steps below. For details, see [55]. (1) Show that the margin changes by at most a factor of KG, where KG is the Grothendieck’s constant, if we replace the arbitrary unit vectors in Definition 4.3 by normalized sign vectors, i.e., xi/∥xi∥,yj/∥yj∥, where xi,yj ∈ {−1,+1}k for some k. Denote this restricted variant of margin by margin± (A). 4.4 4.4 Sign-Rank of an AC0 function 79 (2) Next, observe that disc(A) ≤ min ∥P ◦ A∥∞→1 ≤ 4 disc(A), P∈P where ◦ denotes the Hadamard product of two matrices and P denotes the set of matrices with nonnegative entries that sum up to 1. (3) Finally, express margin± in the first step as the optimum of a linear program and use Linear Programming (LP) duality to show that margin±(A) = min∥P ◦ A∥∞→1. P∈P Sign-Rank of an AC0 function Recently, Razborov and Sherstov [88] proved a lower bound of exp(Ω((logn)1/3)) on the sign-rank of an n × n matrix Af obtained by evaluating an AC0 function f on all (x,y). This is a remarkable result both due to the new techniques it uses and several interesting conse- quences. In this subsection, we review the proof of this lower bound. We discuss its applications in the next subsection. The matrix Af is a 2t3 × 2t3 ±1 matrix given by evaluating the AC0 function t t2 􏰗􏰘 i=1j=1 on all pairs (x,y) ∈ {0,1}t3 × {0,1}t3 , i.e., Af (x,y) := (−1)f(x,y). The main result from [88] is that sign-rank(Af ) = 2Ω(t). The first ingredient in the proof is a generalization of Forster’s argu- ment to matrices that may have a small number of zero entries and there is a lower bound on the magnitude of the remaining entries. The sec- ond new ingredient is the construction of the so-called smooth orthog- onalizing distributions for Boolean functions. By “masking” a Boolean function with such a distribution, the Fourier coefficients can be con- trolled. The final ingredient is the derivation of an expression for the spectral norm of a pattern matrix in terms of the Fourier transform of the Boolean function defining that matrix. We will discuss these ingredients before proving the main theorem. f(x,y) := (xij ∧ yij), (4.4) 80 Sign-Rank and Other Complexity Measures of Sign Matrices 4.4.1 A Generalization of Forster’s Argument We define the sign-rank of an arbitrary real matrix A as the minimum rank of a real matrix B such that all nonzero entries of the entry-wise product A ◦ B are strictly positive: sign-rank(A) := min{rank(B) : ∀i,j aij ̸= 0 ⇒ aijbij > 0}.
Clearly, we can assume that all the entries of A are bounded in absolute value by at most 1.
A simple generalization of Forster’s bound from Theorem 4.2 was proved in [29]: for any matrix A ∈ Rm×n, sign-rank(A) ≥ minxy |Axy| · √mn/∥A∥. Another generalization in [30] considers preserving signs for all but h of the entries of A: if A ̃ is obtained from A ∈ {−1, +1}m×n by changing at most h entries in an arbitrary fashion, then sign-rank(A ̃) ≥ √mn/∥A∥ + 2√h. Razborov and Sherstov [88] prove the following hybrid generalization.
Theorem 4.19. Let A ∈ Rm×n be such that for all but h of (i,j), |aij| ≥ γ. Then,
γmn sign-rank(A) ≥ ∥A∥√mn + γh.
Proof. Let sign-rank(A) = d. Let B be a matrix of rank d such that for all i,j, where aij ̸= 0, aijbij > 0. We can further assume that
(1) ∥B∥∞ ≤ 1.
(2) ∥B∥2F = mn/d.
Indeed, write B = XY , where X ∈ Rm×d and Y ∈ Rd×n. Using argu-
ments similar to those in Section 4.2, we can assume that the rows
of X, x1,…,xm ∈ Rd, are unit vectors in general position and that
the columns of Y , y1,…,yn ∈ Rd are all unit vectors. Then, B is
defined by bij := ⟨xi,yj⟩. Since xi and yj are unit vectors, for any i,j,
|bij| = |⟨xi,yj⟩| ≤ ∥xi∥∥yj∥ = 1 by Cauchy–Schwartz. Thus, (1) holds.
The calculation in the proof of Theorem 4.2 shows that for any j,
􏰂m (⟨xi,yj⟩)2 = m/d. Thus, ∥B∥2 = 􏰂n 􏰂m (⟨xi,yj⟩)2 = mn/d. i=1 F j=1 i=1
This shows (2).

4.4 Sign-Rank of an AC0 function 81 􏰂 The proof compares an upper bound and a lower bound on ⟨A,B⟩ :=
ij aij bij (cf. Lemma 3.7). We have,
􏰕
aijbij since aijbij ≥ 0 for all i,j 
⟨A,B⟩ ≥
≥ γ􏰕|bij| − h since ∀i,j |bij| ≤ 1 and
|aij |≥γ
ij
aijbij = |bij| for all but h of (i,j) ≥ γ(∥B∥2F − h) by (1) above
􏰆mn 􏰇
≥γ d −h by(2)above.
On the other hand,
􏰕
⟨A,B⟩ =
≤ ∥A∥
i
≤ ∥A∥∥B∥F
σi(A)σi(B)
from the proof of Lemma 3.7
by Cauchy–Schwartz since
i
􏰕
σi(B)

d
􏰕22
σi(B)=∥B∥F and σi(B)=0 fori>rank(B)
√i
= ∥A∥ mn by (2) above.
Comparing the two bounds,
Hence,
􏰆mn􏰇 √
γ d −h ≤⟨A,B⟩≤∥A∥ mn.
γmn
d ≥ ∥A∥√mn + γh.
4.4.2 Smooth Orthogonalizing Distributions
We recall some basic facts about Fourier analysis of Boolean functions. Consider the space of functions F := {f : {0, 1}n → C} with the inner

82 Sign-Rank and Other Complexity Measures of Sign Matrices product:
−n 􏰕 (f,g) := 2
x∈{0,1}n 􏰂
f(x)g(x).
The functions χS (x) = (−1) i∈S xi , S ⊆ [n], form an orthonormal basis
for the space F with the above inner product. Hence, every function
f ∈ F can be uniquely expressed as a linear combination of χS . The
coefficients in this linear combination are called the Fourier Coefficients 􏰱
of f. In particular, for S ⊆ [n], the Fourier coefficient f(S) is given by
􏰱 −n􏰕 f(S) := 2
f(x)χS(x). Thus, we also have the inverse transform:
function f : {0, 1}n → {−1, +1} if
|S| < k ⇒ Ex∼μ[f(x)χS(x)] = 0. A smooth distribution places substantial weight on all but a tiny fraction of the sample points. The second ingredient of the main result constructs smooth orthog- onalizing distributions for the function: m 4m2 􏰗􏰘 i=1 j=1 Theorem 4.20. There is an (m/3)-orthogonalizing distribution μ for MPm such that μ(x) ≥ 8−m 2−4m3 /2 for all x ∈ {0, 1}4m3 such that MPm(x) = −1. 􏰱2 2 x∈{0,1}n |f(x)| ; this is known as Parseval’s Identity. x∈{0,1}n 􏰕􏰱 f(x) = By orthogonality of the basis, we also have S⊆[n] |f(S)| = (f,f) = S ⊆[n] f(S)χS(x). 􏰂 −n􏰂 2 Definition 4.8. A distribution μ on {0,1}n is k-orthogonalizing for a MPm(x) := xij. 4.4 Sign-Rank of an AC0 function 83 4.4.3 The Pattern Matrix Method for Lower Bounds on Sign-Rank The pattern matrix method, introduced by Sherstov [90], is a power- ful tool for proving lower bounds in communication complexity. This method is used in obtaining a formula (exact!) for the spectral norm of a pattern matrix defined by an AC0 function f in terms of Fourier coefficients of f. Let n and N be positive integers such that n|N. Partition [N] into n intervals of length N/n each: 􏰌 N 􏰍 􏰌 (n − 1)N 􏰍 [N] = 1,..., n ∪ ̇ ···∪ ̇ n + 1,...,N . Let V := V(N,n) denote the family of n-subsets V ⊆ [N] that contain exactly one element from each of the above intervals. For x ∈ {0, 1}N and V = {i1,...,in} ∈ V, where ij is in the jth interval in the above partition of [N ], let x􏱉V := (xi1 , . . . , xin ) ∈ {0, 1}n denote the projection of x onto V . Definition 4.9. Let φ : {0,1}n → R. The (N,n,φ)-pattern matrix A has rows indexed by x ∈ {0, 1}N and columns indexed by pairs (V, w) where V ∈ V(N,n) and w ∈ {0,1}n. Thus, A is a 2N × (N/n)n2n real matrix. The entry indexed by (x,(V,w)) is given by evaluating the function φ at x􏱉V ⊕ w, i.e., the mod-2 sum of the projection of x onto V and w. That is, A=(φ(x􏱉V ⊕w))x∈{0,1}N,(V,w)∈V×{0,1}n. The final ingredient we need is a formula for the spectral norm of the pattern matrix A in terms of the Fourier transform of φ, [90]. Theorem 4.21. Let A be the (N,n,φ)-matrix as defined above. Then, 􏰝 􏰈2N 􏰉n ∥A∥= 2N n 􏰌 max |φ􏰱(S)| S⊆[n] 􏰆 n 􏰇|S|/2􏰍 N . We now have all the ingredients to prove the main result of this section. 84 Sign-Rank and Other Complexity Measures of Sign Matrices Theorem 4.22. Let f be defined as in (4.4) and let Af = (f(x,y))x,y ∈ 2t3 ×2t3 {−1,+1} , be the matrix obtained by evaluating f on all pairs of inputs (x,y). Then, sign-rank(Af ) = 2Ω(t). Proof. Let M be the (N,n,MPm) pattern matrix. We first observe that M is a submatrix of Af , by letting t = cm for some large enough constant c. It suffices to take c = 4N/n (later we will set N = 106n). To see that M is a submatrix of Af, note that (z􏱉V )ij ⊕ wij = ((z􏱉V )ij ∧ wij ) ∨ ((z􏱉V )ij ∧ wij ). Now, consider two disjoint subsets of N bits each among the 2t bits of x and similarly for y. Map z ∈ {0, 1}N onto an x that agrees with z on one of these two disjoint subsets and with z on the other. Next, map (V,w) ∈ V × {0,1}n onto a y, i.e., zero everywhere except the positions indexed by elements of V ⊆ [N] in the two subsets. In the positions indexed by V set the bits in one of the subsets to those of w and in the other subset by w. For such an (x,y) it is easy to see that MPm(z􏱉V ⊕ w) = f(x,y). Thus, M is a submatrix of Af . Hence, it suffices to prove that sign-rank(M) = 2Ω(m). Let P be the (N,n,μ) pattern matrix, where μ is as in Theorem 4.20. Since P is a nonnegative matrix, sign-rank(M ) ≥ sign-rank(M ◦ P ). Instead of on M, we prove a lower bound of 2Ω(m) on sign-rank(M ◦ P) since Mμ := M ◦ P has a better-behaved Fourier transform than M, thanks to the smooth orthogonalizing aspect of μ for the function MPm. We will apply Theorem 4.19 to the matrix Mμ. Note that all but an exp(−m2)-fraction of entries of Mμ have absolute value at least 8−m2−n/2 by the smoothness property of μ and since |M(x,y)| = 1. Hence, we take γ := 8−m2−n/2 and h = |M|exp(−m2), where |Mμ| = |M| denotes the total number of entries in M, i.e., 2N · (2N/n)n. Hence, we have 􏰀􏰛􏱁 γ|M| γ |M| Ω(m2) sign-rank(Mμ) ≥ ∥Mμ∥􏰛|M| + γh ≥ min 2∥Mμ∥ ,2 . (4.5) It remains to prove an upper bound on ∥Mμ∥. Let φ(x):= MPm(x)μ(x) and note that Mμ is the (N,n,φ) pattern matrix. From 􏰱−n􏰂 −n definition, for any S ⊆ [n], |φ(S)| ≤ 2 x∈{0,1}n |φ(x)| = 2 . Also, 4.5 Applications of Complexity Measures of Sign Matrices 85 by the orthogonalizing nature of μ, φ􏰱(S) = 0 for |S| ≤ m/3. Hence, in the expression for ∥Mμ∥ in Theorem 4.21, the max is at most 2−n(n/N)m/6. It follows that ∥Mμ∥ ≤ 􏰛|M|2−n(N/n)−m/6. Substituting this upper bound on ∥Mμ∥ and the value of γ in (4.5), we obtain sign-rank(Mμ) ≥ 8−m(N/n)m/6. We take N = 106n, so we get sign-rank(Mμ) = exp(m) and the theorem is proved. 4.5 Applications of Complexity Measures of Sign Matrices 4.5.1 Communication Complexity The notion of sign-rank of a matrix was first introduced by Paturi and Simon [71] in the context of unbounded error probabilistic communica- tion complexity. We will focus on communication complexity models in detail in Section 5. Hence, we postpone this application to Sections 5.3 and 5.4. 4.5.2 Learning Theory In learning theory, a basic task is to design algorithms that take a training sample of input–output pairs ((ξ1,f(ξ1)),...,(ξm,f(ξm))) of a function f : D → {−1, +1}, ξi ∈ D, and learn some (approximate) rep- resentation f∗ of f. The function f is often assumed to come from a concept class F. Properties of F are exploited to make the learn- ing algorithm efficient and the classifier f∗ accurate. The domain D is often mapped, via a kernel, into Rd and the classifier f∗ is repre- sented by a hyperplane in Rd. The sample inputs ξi are then mapped to points xi ∈ Rd and an exact1 classifier f∗ would then correspond to a hyperplane with a normal y ∈ Rd such that xi is on the positive side of the hyperplane, i.e., sign⟨xi,y⟩ > 0 if and only of f(ξi) = +1. A future input x ∈ Rd is then classified by the learned hyperplane according to the sign of ⟨x,y⟩.
In large margin classifiers (e.g., Support Vector Machines (SVMs)), the objective of the classifier is to find a hyperplane that maximizes the margin, i.e., the distance to the hyperplane from any point from the
1 Here we consider only exact classifiers.

86 Sign-Rank and Other Complexity Measures of Sign Matrices
training sample. Since scaling can artificially inflate this distance, it is natural to assume that the samples xi and the normal y are all unit vectors in Rd. Furthermore, we can assume that the hyperplane passes through the origin by translating. Then, for a given sample X = {xi} and a hyperplane with normal y, it is easy to see that the margin is given by mini|⟨xi,y⟩|. Since the unknown function f comes from the concept class F = {f1,…,fn}, we want to consider the overall mini- mum margin (worst case among all functions from F) in an arrange- ment of hyperplanes (with normals yj) for all the functions from F and for the sample xi. By increasing the dimension by 1 and losing a factor of 2 in the margin, we can assume that hyperplanes all pass through the origin. The minimum margin of an arrangement (X, Y ) is then given by
margin(X,Y ) = min|⟨xi,yj⟩|. ij
Note that the dimension d of the space in which the arrangement is realized is not relevant for this complexity measure.
A concept class F on a given sample Ξ of inputs is naturally given by a sign matrix AΞ,F ∈ {−1,+1}m×n, where AΞ,F(ξ,f) = f(ξ), |Ξ| = m, and |F| = n. The performance of a learning algorithm with a given mapping κ : D → Rd for this concept class on this sample is margin(X,Y ) = minij |⟨xi,yj⟩|, where X = κ(Ξ) and Y is the set of nor- mals to the homogeneous hyperplanes learned by the algorithm. Hence, the performance of the best learning algorithm under the best mapping is given by
margin(AΞ,F) = supmargin(X,Y ) = supmin|⟨xi,yj⟩|, X,Y X,Y ij
where the sup is over all unit vectors x1,…,xm and y1,…,yn in Rd. This is the margin of the sign matrix A and defines, in some sense, the inherent learning complexity of A.
It is easy to show that there is a trivial arrangement of homoge- neous halfspaces that realizes any m × n matrix A with a margin of at least max{m−1/2,n−1/2}: if m ≤ n, let xi be the canonical unit vectors and yj be the normalized column vectors of the matrix A. Ben-David et al. [11] show that a random n × n sign matrix has a margin of at

4.5 Applications of Complexity Measures of Sign Matrices 87
most O(n−1/2 log1/2 n). Thus, for almost all n × n sign matrices the margin given by the trivial embedding is essentially optimal. Forster’s theorem (Theorem 4.7) gives an explicit matrix for which the trivial embedding indeed gives the best possible margin.
Corollary 4.23 (to Theorem 4.7). Suppose, AΞ,F is given by an n × n Hadamard matrix Hn. Then, margin(H) ≤ n−1/2.
There’s another parameter, the VC-dimension of a concept class that also plays a prominent role in learning theory. Vapnik [99], in particular, showed that if the VC-dimension of a concept class is d, then the margin of any arrangement realizing the concept class is at most d−1/2. The VC-dimension of the concept class given by an n × n Hadamard matrix is easily seen to be at most logn. Thus, the upper bound based on VC-dimension is at most (logn)−1/2, whereas the above corollary gives the optimal n−1/2.
4.5.3 Threshold Circuits
Our third application is to lower bounds on depth-2 threshold circuits in terms of minimal dimension.
Recall that a (linear) threshold gate with weights w1 , . . . , wn and threshold w0 outputs 1 on inputs x1,…,xn if and only if w1x1 + ··· + wnxn ≥ w0. Proving superpolynomial lower bounds on the size of depth-2 threshold circuits for explicit functions when the thresh- old gates are allowed to use arbitrary weights is an interesting open question. Hajnal et al. [35] show an exponential lower bound for the inner product mod-2 function when all the weights used in a depth-2 threshold circuit are polynomially bounded. Here, we will see a stronger result by showing that the restriction on the weights of the top gate can be removed. We use lower bounds on sign-ranks of explicit matrices to derive exponential lower bounds for some explicit functions includ- ing inner product mod-2. In fact, these lower bounds are exponen- tial when the depth-2 circuit has a threshold gate (with unrestricted weights) at the top and either threshold gates with polynomial weights or gates computing arbitrary symmetric functions at the bottom.

88 Sign-Rank and Other Complexity Measures of Sign Matrices
The following theorem states that (loosely speaking) Boolean func- tions with “high” minimal dimension cannot be computed by “small” (somewhat technically restricted) depth-2 threshold circuits. Part (a) of this theorem strengthens the lower bound of [35] and Part (b) gener- alizes and strengthens the results of [19, 46, 47]. Note that for technical reasons we assume that the top threshold gate is ±1-valued whereas the threshold gates on the bottom level are {0,1}-valued.
Theorem 4.24. Let (fn) be a family of Boolean functions, where fn : {0, 1}n × {0, 1}n → {−1, +1} and let Af (x, y) = f (x, y) be the cor- responding 2n × 2n sign matrix. Suppose (fn) is computed by depth-2 threshold circuits in which the top gate is a linear threshold gate (with unrestricted weights). Then the following holds:
(a) If the bottom level has s linear threshold gates using
integer weights of absolute value at most W, then s=
Ω􏰃 sign-rank(Af ) 􏰄. In particular, s = Ω􏰃 2n/2 􏰄 for fn = ipn nW nW
(inner product mod-2).
(b) If the bottom level has s gates computing symmetric func-
tions, then s = Ω􏰃 sign-rank(Af ) 􏰄. In particular, s = Ω􏰃 2n/2 􏰄 nn
for fn = ipn.
Note that the specific bounds for ipn follow immediately from the general bound for fn by applying Corollary 4.4.
Using Theorem 4.22, we obtain a separation result between depth-3 AC0 and depth-2 threshold circuits.
Corollary 4.25. Let fn be the depth-3 AC0-function given by (4.4) with n = t3. Then, depth-2 threshold circuits of the kind in (a) com- puting fn must have size 2Ω(n1/3)/W and depth-2 threshold circuits of the kind in (b) computing fn must have size 2Ω(n1/3).
The proof of this theorem is based on Theorem 4.2 and the two lemmas below.

4.5 Applications of Complexity Measures of Sign Matrices 89
Lemma 4.26. Let G : {0, 1}n × {0, 1}n → {0, 1} be a threshold func- tion where for x,y ∈ {0,1}n, G(x,y) = 1 if and only if 􏰂ni=1 αixi + 􏰂ni=1 βiyi ≥ μ for weights αi,βi,μ ∈ Z. Then, G (viewed as a matrix) has rank at most min{􏰂ni=1 |αi|,􏰂ni=1 |βi|} + 1.
Proof. W.l.o.g. let 􏰂ni=1 |αi| ≤ 􏰂ni=1 |βi|. Let αmin (αmax) be the min- imal (maximal) value taken by 􏰂ni=1 αixi as x ranges over all possible inputs in {0,1}n. As the weights are integers, this sum takes at most αmax − αmin + 1 distinct values. We partition the rows of G according to the weight contributed by x and, within each block of these rows, we partition the columns into two groups depending on whether or not the weight contributed by y together with that of any row of that block exceeds the threshold μ or not. Specifically, define the following sets of entries of G for all α such that αmin ≤ α ≤ αmax:
􏰀􏰕n 􏰕n 􏱁
βiyi <μ−α , and 􏱁 􏰂 Gα,1 is an all-1 matrix for any α. Furthermore, G = α(Gα,0 + Gα,1). Hence, by subadditivity of rank we see that the rank of G is at most the number of distinct values taken by α. The latter is bounded above byαmax −αmin +1≤􏰂ni=1|αi|+1. Note that the same proof goes through even if we generalize the definition of G by setting G(x,y) = 1 if and only if 􏰂ni=1 αixi + 􏰂ni=1 βiyi ∈ T for an arbitrary subset T of Z. Specifically, we have the following corollary of the proof. Corollary 4.27. Let G : {0, 1}n × {0, 1}n → {0, 1} be a symmetric function in the sense that its value depends only on 􏰂ni=1(xi + yi). Then, its matrix G as defined above has rank at most n + 1. Sα,0 := (x,y): 􏰀 αixi =αand i=1 􏰕n i=1 􏰕n βiyi ≥μ−α . Sα,1 := (x,y): Let Gα,0 and Gα,1 be (disjoint) submatrices of G defined by the entries αixi =αand Sα,0 and Sα,1, respectively. It is clear that Gα,0 is an all-0 matrix and i=1 i=1 90 Sign-Rank and Other Complexity Measures of Sign Matrices Lemma 4.28. Let f : {0, 1}n × {0, 1}n → {−1, +1} be a Boolean func- tion computed by a depth-2 threshold circuit C with the top gate using unrestricted weights and each of the bottom gates using weights of absolute value at most W. Then, there is a real matrix F such that sign(F(x,y)) = f(x,y) for all x,y ∈ {0,1}n and rank(F) = O(snW), where s is the number of bottom gates. Proof. Let the top gate of C have weights φ1,...,φs and threshold φ0. Hence, we can write f(x,y) = sign(􏰂si=1 φiGi(x,y) − φ0), where Gi are the functions computed by the bottom gates. Define the matrix F :=φ1G1 +···+φsGs −φ0J, where J is the all 1’s 2n×2n matrix. It is clear that f(x,y) = sign(F(x,y)) for all x,y ∈ {0,1}n. Moreover, rank(F) ≤ 1 + 􏰂si=1 rank(Gi ) ≤ 1 + s(1 + nW ) using Lemma 4.26. Using Corollary 4.27, one can similarly prove that if f is computed by a depth-2 circuit with a threshold gate at the top and s symmetric gates at the bottom, then there is a matrix F of rank O(sn) that sign- represents f. Proof. [of Theorem 4.24]. By Lemma 4.28, if a depth-2 threshold circuit computes fn(x,y), then there is a matrix Fn such that sign(Fn(x,y)) = sign(fn(x,y)) and rank(Fn) = O(snW). On the other hand, rank(Fn) ≥ sign-rank(Af). Comparing the upper and lower bounds on rank(F), we get snW = Ω(sign-rank(Af )). This proves Part (a) of the theorem. Part (b) is proved similarly by means of Corollary 4.27. 4.5.4 Probabilistic OBDD’s We briefly recall the technical definitions concerning OBDD’s. An ordered binary decision diagram (OBDD) over Boolean variables x1,...,xn is given by a directed acyclic graph G = (V,E). The graph G contains exactly one source (node without a proper predecessor) and two sinks (nodes without a proper successor). One of the sinks is labeled 4.5 Applications of Complexity Measures of Sign Matrices 91 −1 and the other-one is labeled +1. Each nonsink node is labeled by a Boolean variable such that, on each path from the source to a sink, each variable occurs exactly once and they always occur in the same order.2 Each nonsink has two outgoing edges. One of them is labeled 0, the other one is labeled 1. Each OBDD G represents a Boolean function fG which is evaluated on a ∈ {0, 1}n as follows. Vector a enters G at the source and is then routed through G until it reaches a sink. More specifically, at a node labeled xi, vector a is routed through the outgo- ing edge labeled ai. According to this rule, a will be routed either to the (−1)-sink or to the (+1)-sink. In the former case, we have fG(a) = −1; in the latter case fG(a) = 1. The size of an OBDD is the number of its nodes. A probabilistic ordered binary decision diagram (POBDD) G is defined analogously, except that the routing procedure becomes prob- abilistic. Loosely speaking, the deterministic transitions along edges labeled either 0 or 1 are replaced by random transitions. Formally, this can be achieved as follows. With each node v on layer l of G, we associate two probability distributions D0 and D1 that assign probabilities to all nodes on layer l + 1. If v is labeled xi and vec- tor a is currently located at v, then a is routed to node w on layer l + 1 with probability Dai (w). Each POBDD G represents a collec- tion (BG(x)) of Bernoulli random variables in the obvious sense. We say that G realizes Boolean function f : {0, 1}n → {−1, +1} if f (x) = 1 implies that Pr(BG(x) = 1) > 1/2, whereas f(x) = −1 implies that Pr(BG(x) = 1) < 1/2. The size of a POBDD is the number of its nodes. POBDD’s are quite powerful because they need only a slight advan- tage over random guessing. We present here exponential lower bounds on the size of POBDD’s for concrete functions. In the sequel, we present a function IPn that is provably hard to realize with POBDD’s. Our function IPn embodies ipn (the inner product modulo 2 function) as a subfunction. Note however that ipn itself is easy to compute by OBDD’s 2Since we say that each variable occurs exactly once (as opposed to at most once) on each path, we implicitly make the assumption that G is layered, where all nodes in the same layer are labeled by the same variable and edges go from one layer to the next one according to the fixed order of the variables. This assumption is made only for the sake of simple technical exposition. 92 Sign-Rank and Other Complexity Measures of Sign Matrices (even for deterministic ones). We will define IPn in such a way, that, no matter how the variable order in the OBDD is chosen, there will be a subfunction of the form ipn such that the chosen variable order is actually “weird.” We move on to the definition of IPn. We use 2n Boolean variables z1,...,z2n and, in addition, 2n auxiliary Boolean variables h1,...,h2n. Assume that the Hamming weight of h is n and that the 0-entries occur in positions i(1) < ··· < i(n), whereas the 1-entries occur in positions j(1) < ··· < j(n). Then, we define 􏰂n z z IPn(z,h) := (−1) k=1 i(k) j(k) . (4.6) If the Hamming weight of h differs from n we define IPn(z,h) := 1 by default. Theorem 4.29. Any POBDD G that computes IPn has size at least 2n/2. Proof. Assume that the variables z1,...,z2n appear in G in the order zσ(1),...,zσ(2n). Let i(1) < ··· < i(n) be the sorted sequence of the ele- ments of {σ(1),...,σ(n)}. Likewise, j(1) < ··· < j(n) denotes the sorted sequence of all the elements of {σ(n + 1),...,σ(2n)}. Note that in G all zi(k)-variables occur before all zj(k)-variables. Fix a vector h of Hamming weight n such that hi(k) = 0 and hj(k) = 1 for k = 1,...,n. This leads to a subfunction of the form (4.6). Let V ′ be the set of nodes labeled zj(1). In what follows, we write input z in the form (x,y), where x = (zi(1),...,zi(n)) and y = (zj(1),...,zj(n)). For each v ∈ V ′, denote by pv(x) the probability that z = (x,y) is routed from the source to v. Denote by qv(y), the probability that z is routed from v to the (+1)- sink. We conclude that 􏰕1 pv(x)qv(y) > 2. (4.7)
Note that, for our fixed choice of h, the subfunction IPn(x,y,h) is the inner product modulo 2 of x and y. Equation (4.7) realizes this subfunc- tion by a |V ′|-dimensional arrangement that uses half spaces induced
IPn(x,y,h) = 1 ⇔
v∈V′

4.5 Applications of Complexity Measures of Sign Matrices 93
by inhomogeneous hyperplanes. Adding one dimension, the hyperplanes can be made homogeneous (and we arrive at a linear arrangement). By Forster’s Theorem 4.2, |V | ≥ |V ′| + 1 ≥ 2n/2.
Our lower bound can be extended to k-POBDD’s: k-POBDD’s gen- eralize POBDD’s in the following way. On each path from the source to a sink the variables appear at most k times in an order, i.e., the same permutation of variables repeated k times, i.e., a k-POBDD can be partitioned into k layers such that each layer is read-once ordered according to the same variable ordering. From Example 5.11, we con- clude that every probabilistic two-way protocol for IPn has length at least n/2 − 1. Using this we obtain the following.
Corollary 4.30. Each k-POBDD, k ≥ 2, computing IPn has size at 4k−2
least 2 n−2 −1.
Indeed, if there exists a k-POBDD P of size S then one can find a probabilistic communication protocol of 2k rounds that computes the same function as P and has length (2k − 1)⌈log S ⌉. In this communi- cation game, the input variables of processor Π0 are the variables that are read in the first half of each layer of P and the input variables of processor Π1 are the variables read in the second half of each layer, respectively. The computation can be performed in the way that the processors follow a path in P in turn such that each processor sends to the other one the number of the first node on the path not belonging to its sublayer. This well-known technique (e.g., [15]) yields the desired lower bound.

5
Rank Robustness and Two-Party Communication Complexity
5.1 The Log-Rank Conjecture in Communication Complexity
For every graph G, is it true that
log χ(G) ≤ (log rank(G))O(1),
where χ(G) is the chromatic number of the complement of G and rank(G) is the rank of the adjacency matrix of G over R? This combi- natorial question is equivalent to a complexity question [62]:
Let f : {0,1}t × {0,1}t → {0,1} and let cc(f) denote the two-party communication complexity of f. Let Mf (x,y) = f(x,y). Then, is it true that for all f
cc(f) ≤ (logrank(Mf))O(1)? Here also the rank of Mf is taken over R.
It is well-known [48, Section 1.4] that cc(f ) ≥ log rank(Mf ) (in this inequality, rank over any field would do). Thus, answering the above
94

5.1 The Log-Rank Conjecture in Communication Complexity 95
question positively would resolve the famous log-rank conjecture in communication complexity. This is by far the most challenging open question in two-party communication complexity.
The most significant progress so far on this conjecture has been made by Nisan and Wigderson [68]. They show that (i) to resolve the log-rank conjecture, it suffices to show that every 0–1 matrix A of rank r contains a “large” submatrix of rank at most (1 − ε)r for some constant ε; here, large means having a fraction at least exp(−(logrankA)O(1)) of entries of A, and (ii) there is an f such that cc(f)≥(logrank(Mf))log36. We will discuss these results below.
In what follows, |M| denotes the total number of entries of M (including zero entries).
Theorem 5.1. If,
for every 0–1 matrix M of rank r and for some δ ≥ exp(− logc r) for some constant c > 0, there exists a submatrix S with at least δ|M| entries such that rank(S) ≤ 0.25r,
then M has a protocol tree with at most exp(O(logc+1 r)) leaves. In particular, M can be covered by at most exp(O(logc+1r)) disjoint monochromatic rectangles.
Remark 5.1.
• Note that instead of 0.25 in the assumption above, any con- stant < 1 would do since by repeatedly taking submatrices a constant number of times, we can still get a submatrix of the claimed size (with a different c) of rank ≤ 0.25r. • If |M| ≤ exp(logc+1 r), then the trivial protocol already gives a communication complexity of O(log |M |) = O(logc+1 r). Hence, we can assume w.l.o.g., that |M| ≥ exp(logc+1 r). This guarantees that a submatrix with δ|M| entries is not too small. 96 Rank Robustness and Two-Party Communication Complexity Proof. Let L(m,r) denote the maximum, over all 0–1 matrices M with m entries and rank r, of the minimum number of leaves in a two-party protocol tree for M. Clearly, L(m,1) ≤ 2 for all m. We wish to prove that L(m, r) ≤ exp(logc+1 r) for all m assuming the hypothesis of the theorem. Let M be indexed by X × Y , i.e., rows by x ∈ X and columns by y ∈ Y,andletSbeindexedbyX0×Y0forX0⊆XandY0⊆Y. Consider the partition of X × Y into the four parts Xα × Yβ for α,β ∈ {0,1}, where X1 = X \X0 and Y1 = Y \Y0. The corresponding submatrices of M are denoted Sαβ, where S = S00: Observe that 􏰧􏰨 M= S00 S01 . S10 S11 rankS01 + rankS10 ≤ rankM + rankS00. rank S01 ≤ rank S10. Then rank S01 ≤ (rank M + W.l.o.g., rank S00)/2 ≤ 5 rank M since rank S00 ≤ 1 rank M. It follows that 84 rank[S00|S01]≤ 1rankM + 5rankM = 7rankM. By definition, then, the top part [S00 |S01] has a protocol tree with most L(m,7r/8) leaves, while the bottom part [S10 |S11] has a protocol tree with at most L((1 − δ)m,r) leaves. The row player will send one bit to the column player to indicate whether his input x ∈ X is in the top part or the bottom part. (If rank S01 > rank S10, the players would consider the partition into the left and the right parts and then the column player would send a bit.) They will then proceed recursively on either part of the matrix.
Thus, we have the recurrence relation
􏰈
L(m,r) ≤ L
The above recurrence exp(O(logc+1 r)).
7r􏰉
m, 8 + L((1 − δ)m,r).
relation has the solution
(5.1) L(m, r) =
488

5.1 The Log-Rank Conjecture in Communication Complexity 97
Corollary 5.2. If, for every 0–1 matrix M of rank r and for some δ ≥ exp(− logc r) with some constant c > 0, there exists a submatrix S with at least δ|M| entries such that rankS ≤ 0.25r, then for any Boolean function f : {0, 1}t × {0, 1}t → {0, 1}, cc(f ) ≤ (logrank(Mf))c+1. In other words, the log-rank conjecture holds under the hypothesis stated in Theorem 5.1.
Proof. It is known that if the {0,1}-matrix Mf has a communication protocol with L leaves, then f has communication complexity at most O(logL) (see [48, Lemma 2.8].).
To show the best known gap between the rank and communication complexity of a matrix, we need to review the notions of sensitivity and degree of a Boolean function. We will construct a function with maximum sensitivity and relatively low degree as a polynomial over R. A communication matrix defined in terms of this function will have low rank (owning to the low degree) and high communication complexity (owing to the maximum sensitivity).
Definition 5.1. For a Boolean function g : {0,1}t → {0,1}, define the sensitivity of g, sens(g), as
sens(g) := max |{y ∈ {0, 1}t : g(y) ̸= g(x), dist(x, y) = 1}|, x∈{0,1}t
where dist(x,y) denotes the Hamming distance between x and y. In other words, define the sensitivity of g at input x to be the number of Hamming neighbors of x where g takes a value different from g(x) itself. Now, sensitivity of g is defined as the maximum sensitivity of g at any input.
For example, the OR, the AND, and the PARITY functions on n variables all have sensitivity n. The MAJORITY function, on the other hand, has sensitivity (n + 1)/2 for odd n.
Definition 5.2. For a Boolean function g : {0,1}t → {0,1}, the degree of g, deg(g) is defined as the degree of the unique polynomial over R

98 Rank Robustness and Two-Party Communication Complexity
that represents g. Note that the deg(g) is also the largest |S|, S ⊆ [t],
such that the Fourier coefficient gˆ(S) ̸= 0.
The exact relation between degree and sensitivity of Boolean func- tions is an intriguing open question in Boolean function complexity. What is currently known is summarized in the next two theorems by Nisan and Szegedy [66] and Kushilevitz (stated in [66]).
􏰛
Theorem 5.3. For every Boolean function g, deg(g) ≥ c sens(g), for some absolute constant c > 0.
Theorem 5.4. There exists a t-variable Boolean function g such that sens(g) = t, but deg(g) ≤ tlog6 3 ≈ t0.6131.
Proof. Let g0 be the following 6-variable function (defined as a real polynomial of degree 3):
􏰕6 􏰕 􏰕
g0(x1,…,x6) = xi − i=1
xixj + xixjxk, {i,j,k}∈F
i̸=j
where F is the following set system on {1,…,6}:
{{1,2,5},{1,2,6},{1,3,4},{1,3,5},{1,4,6},{2,3,4}, {2,3,6},{2,4,5},{3,5,6},{4,5,6}}.
It’s easy to verify that in F,
• Every point is contained in exactly five sets.
• Every pair of points is contained in exactly two sets.
Using these properties it is easy to verify that g0 is indeed a Boolean function on {0,1}6. Its degree is clearly 3. It is also clear that g(0) = 0 and that g(x) = 1 for any x with exactly one 1. Thus, g has sensitivity 6.
We now recursively define gk on t := 6k variables as follows: gk(x1,…,xt) = g0(gk−1(x1,…,xt/6),…,gk−1(x5t/6+1,…,xt)),
i.e, g0 is applied on 6 copies of gk−1 on 6k−1 variables each.

5.1 The Log-Rank Conjecture in Communication Complexity 99 It is easy to see by induction on k that sens(gk)=t and that
deg(gk−1) = 3k = tlog6 3.
Definition 5.3. For a Boolean function g : {0,1}t → {0,1}, define the
function g∧ : {0, 1}t × {0, 1}t → {0, 1} as follows: g∧(x,y) := g(x1 ∧ y1,…,xt ∧ yt).
Lemma 5.5. If g : {0, 1}t → {0, 1} has sensitivity t, then cc(g∧ ) = Ω(t).
Proof. W.l.o.g. (by shifting g if necessary), we can assume that g(0) = 0 and ∀i, g(ei) = 1, where ei has zeros in all coordinates except the i’th, where it has a 1. Thinking of x and y as characteristic vectors of sets X,Y ⊆[t], note that g∧(x,y)=0 if X ∩Y =∅ and that g∧(x,y)=1 if |X ∩ Y | = 1.
Now, a well-known lower bound [48, Section 4.6] on the set disjoint- ness problem shows that even the randomized communication complex- ity of the problem of deciding whether the intersection between two sets is the empty set or a singleton is linear. For a more recent proof of this result, see [37].
This immediately gives the required reduction.
Lemma 5.6. If g : {0,1}t → {0,1} has degree d over R, then the
2t × 2t 0–1 matrix M(g∧) corresponding to g∧ has rank at most
􏰂d 􏰃t􏰄 = O(td). i=0 d
Proof. Consider the polynomial representation of g: 􏰕􏰖
g(z1,…,zt) = αs zi, S ⊆[t],|S |≤d i∈S
and note that g∧ is similarly expressed as a polynomial: 􏰕􏰖􏰖
g∧(x1,…,xt,y1,…,yt) = αs S ⊆[t],|S |≤d
i∈S
xi yj. j ∈S

100 Rank Robustness and Two-Party Communication Complexity
Now, to construct the matrix M(g∧), it suffices to construct matri-
ces corresponding to each monomial and add them up (with the corre-
sponding coefficients αS) entry-wise. On the other hand, the 0–1 matrix 􏰹􏰹
corresponding to each monomial i∈S xi j∈S yj is easily seen to be a rank-1 matrix (it’s a rectangle of 1’s given by the rows and columns
i∈S xi and j∈S yj, respectively). It follows that the rank of M(g∧) is bounded by the number of monomials in the polynomial representing the function
g and the proof is complete.
Theorem 5.7. There is a Boolean function f such that cc(f) ≥ (logrankMf)c, where c=log36−o(1)≈1.631.
Proof. Let f be g∧ where g is as in Theorem 5.4. Then cc(f)= Ω(t) from Lemma 5.5 since g is fully sensitive. On the other hand, Lemma 5.6 combined with the degree bound from Theorem 5.4 shows that rank(Mf ) ≤ O(td), where d ≤ tlog6 3. Hence,
logrankMf = (dlogt) + O(1) = tlog6 3+o(1) ≤ cc(f)log6 3+o(1).
Note that the method we used to prove the above theorem cannot yield more than a quadratic gap between log-rank and communication complexity, in view of Theorem 5.3.
5.2 Discrepancy and Randomized
and Quantum Communication Complexity
Just as rank of a sign matrix is a natural lower bound on the determin- istic communication complexity of the corresponding Boolean function, discrepancy turns out to be a natural lower bound on both the ran- domized and quantum communication complexity. In this subsection, we present some of these results.
Recall the definition of discrepancy (cf. Definition 4.7) disc(M) of a sign matrix M. Let discU(M) denote the discrepancy of M under the uniform distribution, i.e., discU(M)=maxR||R+|−|R−||/|M|, where the maximum is taken over all combinatorial rectangles R contained
corresponding to the subcubes defined by the products
􏰹
􏰹

5.2 Discrepancy and Randomized and Quantum Communication Complexity 101
in M . Nisan and Wigderson [68] also prove1 that for a rank-r matrix M ,
discU (M ) ≥ εr−3/2 for some constant ε > 0. However, from the results
of [53], we get an even stronger lower bound on general discrepancy
itself, i.e., disc(M) instead of discU(M): disc(M) ≥ 1r−1/2. To prove 8
this lower bound and for use in later results, we use a generalization of the γ2-norm of a matrix, defined in Definition 4.4 and introduced by Linial et al. [53].
Definition 5.4. For a sign matrix M = (aij), i.e., M ∈ {−1,+1}m×n, and a real number α ≥ 1,
γ2α(M) := min{γ2(B) : 1 ≤ aijbij ≤ α ∀ i,j}.
Clearly, if α ≤ β, then γ2α(M) ≥ γ2β(M) for any sign matrix M. In particular, γ2∞(M) ≤ γ2(M). Now, use Corollary 4.3 and Theorem 4.18 to conclude
8disc(M)≥margin(M)=(γ2∞(M))−1 ≥(γ2(M))−1 ≥(rank(M))−1/2.
For a sign matrix M, let Rε(M) be the randomized communica- tion complexity of the corresponding Boolean function fM with success probability ≥ 1/2 + ε. Similarly, let Qε be the quantum communica- tion complexity (with prior entanglement) of fM . It is well-known that both these communication complexity measures are lower bounded by Ω(log((1 − 2ε)/disc(M))). Linial and Shraibman [54] prove more gen- eral lower bounds in terms of γ2α-norms.
Theorem 5.8. With the above notation, 􏰆􏰇
Rε(M) ≥ 2log γ1/2ε(M) + 2log2ε, and 2
􏰆􏰇
Qε(M) ≥ log γ1/2ε(M) + log2ε. 2
1Their result is more generally expressible in terms of the spectral norm: discU(M)≥ ε(∥M∥/n)3.

102 Rank Robustness and Two-Party Communication Complexity
Proof. By definition, a randomized communication protocol for fM is a probability distribution μ on deterministic protocols for fM such that, for each input, the probability mass on the protocols comput- ing the correct answer is at least 1/2 + ε. Let Dπ be the ±1-matrix corresponding to a deterministic protocol π (making some errors) for fM and let μ(π) be the probability with which π is chosen. Recall that Dπ can be partitioned into at most 2c monochromatic rectangles, where c := Rε(M) is an upper bound on the communication complexity of Dπ. Hence, rank(Dπ) ≤ 2c. By Lemma 4.16, γ2(Dπ) ≤ 􏰛rank(Dπ) ≤ 2c/2. Consider the matrix
1􏰕
μ(π)Dπ.
Clearly, |Nij| ≤ 1/2ε for all i,j. When Mij = +1 (−1), 􏰂 μ(π)(Dπ)ij ≥
N := 2ε
2ε (≤ −2ε, respectively). Hence, 1 ≤ MijNij ≤ 1/2ε. Finally, since γ2 is
a norm,
γ2(N) ≤ 2ε
2 2ε
π
1 􏰕 1 c/2
μ(π)γ2(Dπ) ≤ 2ε2 .
It follows that γ1/2ε(M) ≤ 1 2c/2. This proves the first part of the
π
π
theorem.
The second part of the theorem is based on the following lemma. A
proof of this lemma can be found in [54].
Lemma 5.9. For a sign-matrix M, let P := (pij) be the acceptance probabilities of a quantum communication protocol with prior entan- glement for fM and complexity c. Then, P = XY , where the matrices X and Y satisfy ∥X∥2→∞,∥Y ∥1→2 ≤ 2c/2.
If prior entanglement is not used, the matrices X and Y can be taken to have rank at most 22c.
Given this lemma, the proof of the second part of the theorem is
similar to the first part by taking N to be 1 (2P − J), where J is the 2ε
all 1’s matrix.

5.3 Relating Probabilistic Communication Complexity 103 5.3 Relating Probabilistic Communication Complexity
to Sign-Rank and Margin Complexity
We say that a probabilistic communication protocol computes the func- tion f : {0, 1}t × {0, 1}t → {−1, 1} with unbounded error if for all inputs (x, y) ∈ {0, 1}t × {0, 1}t the correct output is calculated with probabil- ity greater than 1/2. Since the slightest advantage over random guessing is already sufficient, this model is called the unbounded error model. Let n := 2t. The length of a communication protocol is ⌈log2 K⌉, where K is the number of distinct message sequences that can occur in the pro- tocol. The unbounded error probabilistic communication complexity [71] UPP-CC(f ) of a function f : {0, 1}t × {0, 1}t → {−1, 1} is the smallest length of a communication protocol that computes f with unbounded error (in the sense defined above).
We briefly note that Paturi and Simon [71] have shown that, in the unbounded error model, any probabilistic two-way protocol of length k can be converted into a one-way protocol with length at most k + 1.
Furthermore, the minimal dimension or the sign-rank of a realiza- tion (cf. Definitions 4.1 and 4.2) of the sign-matrix Af of a Boolean function f(x,y) is closely related to its probabilistic communication complexity UPP-CC(f). Paturi and Simon [71] have proved the follow- ing relation:
⌈log sign-rank(Af )⌉ ≤ UPP-CC(f ) ≤ ⌈log sign-rank(Af )⌉ + 1 (5.2) Using Theorem 4.2, we conclude the following.
Theorem 5.10. For any function f : {0, 1}t × {0, 1}t → {−1, 1}, UPP-CC(f ) ≥ n/∥Af ∥.
Example 5.11. Let ipt(x,y) := (−1)x⊤y, where x,y ∈ Zt2, be the inner product mod-2 function. It is well known that the corresponding matrix H such that Hx,y = ipn(x,y) is an Hadamard matrix and easy to see that H has operator norm 2t/2. Hence, from Theorem 5.10, UPP-CC(ipt) ≥ t/2.

104 Rank Robustness and Two-Party Communication Complexity
Given a probabilistic communication protocol, let ε(x,y) denote the difference between the probability that (x,y) is accepted and the proba- bility that (x,y) is rejected by the protocol. Thus, f(x,y) = sign(ε(x,y)) for any probabilistic protocol that correctly computes f. The error bound of the protocol is defined as minx∈X,y∈Y |ε(x,y)|. Hence, we only require that |ε(x,y)| > 0 for all (x,y) in the Paturi–Simon model. We say a function family f = (ft) is in the communication complexity class UPP if each ft has an unbounded error probabilistic protocol with 2polylog(t) messages.
A weakening of the above definition gives a communication complex- ity analog of the class PP in the Turning Machine world. Here, we define that a family f = (ft) of Boolean functions ft : {0,1}t × {0,1}t → {−1,+1} belongs to PPcc if there exists a probabilistic one-way pro- tocol that transmits at most polylog(t) bits (uses at most 2polylog(t) messages) and achieves error bound 2−polylog(t).
Halstenberg and Reischuk [36] have shown that the class PPcc does not change if we allow only one-way protocols. The class PPcc is one of the main complexity classes in the bounded-error model for proba- bilistic communication complexity.
Interestingly, it is possible to express membership in PPcc in terms of only one parameter: the maximal margin (cf. Sections 4.3 and 4.5.2). Here is the connection.
Theorem 5.12. (ft) ∈ PPcc ⇔ margin(Af ) ≥ 2−polylog(t).
Combining this theorem with Theorem 4.18, we also get a charac- terization of PPcc in terms of discrepancy. This characterization was originally discovered by Klauck [43] using Fourier transform techniques.
Corollary 5.13. (ft) ∈ PPcc ⇔ disc(Af ) ≥ 2−polylog(t).
Theorem 5.12 will follow from Lemmas 5.14, 5.15, and 5.16 below. Lemma 5.15 makes use of a random projection technique from [3]. Lemmas 5.14 and 5.16 are implicitly proven in [71] on probabilistic communication complexity with unbounded error.

5.3 Relating Probabilistic Communication Complexity 105
Lemma 5.14. Each probabilistic one-way protocol for f that uses
at most K messages and achieves error-bound ε can be converted
into a K-dimensional linear arrangement that realizes f with margin √
μ≥ε/ K.
Proof. Let pi(x) be the probability that Player 1 sends the i’th message on input x. Let qi(y) be the probability that Player 2 outputs a 1 upon receiving the i’th message on input y. It is clear that ε(x,y) = 􏰂Ki=1 pi(x)(2qi(y) − 1). Define
ux := (pi(x))Ki=1, vy = (2qi(y) − 1)Ki=1.
It is easy to see that {ux/∥ux∥ : x ∈ {0,1}t} and {vy/∥vy∥ : y ∈ {0,1}t}
define a realization of the matrix of f in K-dimensions. Furthermore,

∥ux∥ ≤ ∥ux∥1 = 1 since it is a vector of probabilities and ∥vy∥ ≤ since each of its entries is at most 1 in absolute value. It follows that the margin of this realization is

μ ≥ min|⟨ux/∥ux∥,vy/∥vy∥⟩| ≥ √ min|ε(x,y)| = √ .
x,y K x,y K
This proves one direction of Theorem 5.12. The other direction fol-
lows by combining the next two lemmas.
Lemma5.15. Each linear arrangement (of arbitrarily high dimension) that realizes f with margin μ can be converted into an O(t/μ2)-dimensional linear arrangement that realizes f with margin μ/2.
Proof. (sketch) The following result (whose proof can be found in [11]), is based on the technique of random projections using the Johnson– Lindenstrauss lemma (see [3]):
Let w, x ∈ Rr be arbitrary but fixed. Let R = (Ri,j ) be a random (k × r)-matrix such that the entries Ri,j are i.i.d. according to the normal distribution N(0,1).
K

106 Rank Robustness and Two-Party Communication Complexity Consider the random projection u := 1 (Ru) ∈ Rk for
R √k
all u ∈ Rr. Then, the following holds for every μ > 0:
􏰙 μ􏰃 2 2􏰄􏰚 −μ2k/32
This result can be used in an obvious way to guarantee the existence of a random projection that maps an r-dimensional linear arrangement that realizes f with margin μ to a k-dimensional linear arrangement that realizes f with margin μ/2 by choosing k := ct/μ2 for a suitable positive constant c.
Pr |⟨wR,xR⟩−⟨w,x⟩|≥ R4
∥w∥ +∥x∥ ≤4e
.
Lemma 5.16. Each K-dimensional linear arrangement that realizes f with margin μ can be converted into a probabilistic one-way protocol
√ that uses at most 2K messages and achieves error-bound ε ≥ μ/ K.
Proof. (sketch) Let ux and vy be unit vectors in RK for x, y ∈ {0, 1}t that realize the matrix Af with margin μ. Player 1 sends a message (i,sign(ux(i)), where ux(i) denotes the i’th coordinate of ux, on input x with probability:
pi(x) := |ux(i)|. ∥ux ∥1
Clearly, the number of messages is 2K. Now, Player 2 accepts on input y and receiving the i’th message from Player 1 with probability:
qi(y) := sign(ux(i)) · vy(i) + 1. 2
By a calculation similar to that in the proof of Lemma 5.14, we can see that
⟨ux,vy⟩ μ ε(x,y)=∥u∥ ≥√ .
x1K
Here, the inequality follows by the assumed bound on the margin given
√√ by ux and vy and by the inequality ∥ux∥1 ≤ K∥ux∥2 = K.

5.4 Matrix Rigidity and Quantifier Alternation Communication Complexity 107 5.4 Matrix Rigidity and Quantifier Alternation
in Communication Complexity
Taking a complexity theoretic view of Yao’s [104] model of two-party
communication complexity, Babai et al. [5] defined analogs of var-
ious Turing Machine complexity classes. For example, they defined
PPcc ,PHcc , and PSPACEcc. A class unique to communication com-
plexity, namely, UPP, arising from the Paturi–Simon model was
discussed in Section 5.3. Separating the complexity classes in the two-
party communication model does not appear to be nearly as chal-
lenging as separating the corresponding classes in the Turing Machine
model. Specifically, we know that Pcc, NPcc, and BPPcc are all differ-
ent from each other and that NPcc ∩ Co–NPcc = Pcc [48, Section 4.5].
We will also see later that UPP and Σcc are separated in a recent 2
result by Razborov and Sherstov [88], resolving a long-standing open question in this area, and improving the result that PSPACEcc ̸⊆ UPP [28]. We note that both these separation results follow from lower bounds on sign-rank we presented in Section 4. On the other hand, it is still a major challenge to separate PHcc and PSPACEcc. We now relate this question to the notion RA(r,θ) of matrix rigidity (cf. Definition 3.2).
First, we review some of the definitions.
To define communication complexity classes, we consider languages consisting of pairs of strings (x,y) such that |x| = |y|. Denote by Σ2∗ the universe {(x,y) : x,y ∈ {0,1}∗ and |x| = |y|}. For a language L ⊆ Σ2∗, we denote its characteristic function on pairs of strings of length m by Ln, where n := 2m. Ln is naturally represented as an n × n matrix with 0–1 or ±1 entries (with −1 for true and +1 for false).
Conversely, if A={An} is an infinite sequence of ±1-matrices (where An is n × n), then we can associate a language LA with A and talk about its communication complexity. LA is not necessarily unique (since the n’s may be different from powers of two), but for the purposes of lower bounds we will fix one such language and refer to it as the language LA corresponding to A.
We recall the following definitions from [5].

108 Rank Robustness and Two-Party Communication Complexity
Definition 5.5. Let nonnegative integers l1(m),…,lk(m) be such that l(m) := 􏰂ki=1 li(m) ≤ (logm)c for a fixed constant c ≥ 0.
A language L is in Σcc if there exist l1(m),…,lk(m) as above and k
Boolean functions φ, ψ : {0, 1}m+l(m) → {0, 1} such that (x, y) ∈ Ln if and only if
∃u1 ∀u2 ···Qkuk (φ(x,u)♦ψ(y,u)),
where |ui|=li(m),u=u1···uk, Qk is ∀ for k even and is ∃ for k odd,
and, ♦ stands for ∨ if k is even and for ∧ if k is odd. Definition 5.6.
• By allowing a bounded number of alternating quantifiers in
Definition 5.5, we get an analog of the polynomial time hier-
archy: PHcc = 􏰒 Σcc. k≥0 k
• We get an analog of PSPACE, by allowing an unbounded, but no more than polylog(m), number of alternating quantifiers
cc􏰒􏰒 cc in Definition 5.5: PSPACE = c>0 k≤(logm)c Σk .
Theorem 5.17. Let L be a language in PHcc and An be its n × n ±1-matrix, where n := 2m. Then, for all constants c ≥ 0, there exist constants c1,c2 ≥ 0 such that
(log log n)c1 (log log n)c2 n2 RAn(2 ,2 ) ≤ 2(loglogn)c .
Corollary 5.18. Let A = {An} be an infinite sequence of ±1-matrices and LA be the associated language. For some constant c > 0 and any r and θ in 2(loglogn)ω(1), if RA(r,θ) ≥ n2/2(loglogn)c, then LA ̸∈ PHcc.
In particular, if A can be chosen such that LA ∈ PSPACEcc, then PHcc 􏱈 PSPACEcc.

5.4 Matrix Rigidity and Quantifier Alternation Communication Complexity 109
Proof. (of Theorem 5.17) This theorem is proved using Tarui’s [97] low-degree polynomials (over integers) that approximate AC0-circuits. The theorem will follow from the following: for all c > 0, there exist
c1,c2 > 0, and integer matrices {Bn}, where Bn is n × n, such that
(1) rank(Bn) ≤ 2(loglogn)c1 .
(2) ∀(x,y), 1 ≤ |Bn(x,y)| ≤ 2(loglogn)c2 . (3) wt(An − Bn) ≤ n2/2(log log n)c .
For simplicity of notation, let L ∈ Σcc, where k is odd. In Defini- k
tion 5.5 of Σcc, for any fixed sequence of moves u = u1,…,uk, φ is a k
function of x and ψ is a function of y. Define fu(·) ≡ φ(·,u) and simi-
larly gu(·) ≡ ψ(·,u). Replacing ∃ move by an OR-gate and ∀ move by
an AND-gate, we see that L has a Σcc protocol iff it can be expressed as k
the output of an {AND,OR} circuit C of depth k and size 2polylog(m), where the inputs of C are fu(x) ∧ gu(y) for 1 ≤ u ≤ 2polylog(m). Hence, for all (x,y) ∈ {0,1}m × {0,1}m,
L(x,y) = C(f1(x) ∧ g1(y),…,ft(y) ∧ gt(y)), (5.3)
where t ≤ 2polylog(m) is the number of possible u’s.
Considering fi as the characteristic function of a subset Ui of rows
and gi as that of a subset Vi of columns of the {0, 1}m × {0, 1}m matrix, we observe that fi(x) ∧ gi(y) is a “rectangle” Ui × Vi in the matrix. We will denote this rectangle Ri and identify it with the corresponding n × n {0,1}-matrix of rank-1:
􏰌1, if fi(x) ∧ gi(y) = 1, Ri(x,y) = 0, otherwise.
From Equation (5.3), it follows that L is in Σcc iff its matrix is express- k
ible by an AC0-circuit (of quasi-polynomial size) acting on a set of rank-1 matrices.
We now use the fact that an AC0-circuit is well-approximated by a low-degree polynomial over Z. Tarui [97] constructs such polynomials.
Theorem 5.19 (Tarui). Let C be an AC0-circuit of size 2polylog(t) and φ1,…,φt : {0,1}s → {0,1} be arbitrary Boolean functions. Fix

110 Rank Robustness and Two-Party Communication Complexity −(log t)c′ ′
0 < δ = 2 , for some constant c ≥ 0. Then, there exist constants c′1,c′2 ≥ 0, and a polynomial Φ(φ1,...,φt) such that • Low degree: The degree of Φ in φ1,...,φt is at most (logt)c′1. • Small error: The fraction of inputs x ∈ {0,1}s, where C(φ1,...,φt)(x) ̸= Φ(φ1,...,φt)(x) is at most δ. • Small norm: The sum of the absolute values of the coefficients of Φ is at most 2(log t)c′2 . • Boolean guarantee: When Φ differs from C, the value of Φ(φ1,...,φt)(x) is ≥ 2. Let Ln be the {0,1}-matrix of L , i.e., Ln = (Jn − An)/2, where Jn is the n × n all 1’s matrix. From Equation (5.3), Ln is computed by an AC0-circuit C(z1,...,zt) of size 2polylog(m), where zi = fi(x) ∧ gi(y) = fi(x)gi(y) since fi,gi are {0,1}-functions. Using Theorem 5.19 for C, there is polynomial Φ of degree d ≤ polylog(t) such that L(x,y) = Φ(x,y), except for a δ = 2−(log m)c (choose c′ such that (log m)c = (log t)c′ ) fraction of (x, y) ∈ {0,1}m × {0,1}m. We can write Φ as follows 􏰕􏰖 􏰹 S ⊆[t],|S |≤d 􏰕 S ⊆[t],|S |≤d Φ(x,y) = = = αS 􏰕􏰖 S ⊆[t],|S |≤d i∈S Here, fS(x) = matrix RS of rank-1, and then, as a matrix, Φ is of rank at most i∈S fi(x) and similarly gS. Returning to our matrix interpretation, fS(x)gS(y) is a {0,1}- 􏰂 􏰃t􏰄 ≤ 2polylog(t). L and Φ agree on all but an ε fraction of the i≤d i entries. Furthermore, by Theorem 5.19, the entries of Φ are all non- negative integers, and, > 1 if L(x,y) ̸= Φ(x,y). Let us now define a
matrix Bn:
􏰕
S ⊆[t],|S |≤d
Bn :=Jn −2Φ=Jn −2 ·
αSRS.
zi
αS fi(x)gi(y)
i∈S
αS fS(x)gS(y).

5.4 Matrix Rigidity and Quantifier Alternation Communication Complexity 111 Clearly,
rank(Bn) ≤ 1 + rank(Φ) ≤ 2polylog(t) ≤ 2polylog(m),
thus proving (1); the value of constant c′1 implies constant c1 since polylog(t) = polylog(m) with suitable exponents. Entries of Bn are bounded in absolute value by 2polylog(m) and hence (2) is true; again c′2 implies c2 . Moreover, Bn differs from An in at most a δ = 2−(log m)c – fraction of entries. Thus (3) follows. (In fact, since Φ is at least 2 on the error points, Bn can only switch the signs of +1’s in An.)
Recently, Linial and Shraibman [55] relate the question of finding an explicit language outside PHcc to the question of soft margin com- plexity or mc-rigidity. This notion measures the Hamming distance of a sign matrix to a matrix of small margin complexity.
Definition 5.7. The mc-rigidity function of a sign-matrix A at Hamming distance d is defined as follows
mc-rigidity(A,d) := min{mc(B) : wt(A − B) ≤ d},
where margin complexity mc(B) of a sign-matrix B is as defined at the
beginning of Section 4.3.
Linial and Shraibman then show that an infinite family of sign matrices with sufficiently high margin complexity would yield a lan- guage outside PHcc. Since the quantitative bounds and the proof tech- niques (mainly Theorem 5.19) are similar to those of Theorem 5.17, we refer the reader to [55] for details.
We conclude this section with a remarkable separation result in two-party communication complexity recently proved by Razborov and Sherstov [88].
Theorem 5.20. Σcc ̸⊆ UPP and Πcc ̸⊆ UPP. 22
Proof. By definition, the function f on t3 bits given by (4.4) is com- puted by a depth-2 linear size circuit with rectangles at the inputs.

112 Rank Robustness and Two-Party Communication Complexity
Hence, f ∈ Πcc. On the other hand, by Theorem 4.22, sign-rank(A ) =
2f exp(Ω(t)). Characterization (5.2) of UPP communication complexity
implies that f requires at least Ω(t) bits of communication in the Paturi–Simon model. Hence, f ̸∈ UPP. Since UPP is closed under com- plement, the claim follows.

6
Graph Complexity and Projective and Affine Dimensions of Graphs
Just as we associate to a Boolean function f : {0, 1}l × {0, 1}l → {0, 1} a 2l × 2l 0–1 matrix Mf in the previous sections, we can also natu- rally associate to f a bipartite graph Gf and try to relate properties of Gf to the complexity of f. One can also think of models of compu- tation on graphs and relate complexity of graphs in those models to the complexity of the corresponding Boolean functions. This approach was initiated by Pudla ́k et al. [79]. These models generalize well-known models of Boolean complexity such as circuits, branching programs, and two-party communication complexity. Measures of graph complex- ity such as affine dimension and projective dimension of graphs have been proposed and studied in [76, 78, 85] as criteria for lower bounds on formula size and branching program size of Boolean functions. Sepa- ration questions about classes of two-party communication complexity [5, 104] can be re-formulated as lower bound questions about bipar- tite graph complexity as noted in [79]. In this section, we introduce graph complexity and motivate lower bounds on affine and projective dimensions of graphs by connecting them to lower bounds on Boolean functions. We present some reductions that connect graph complexity
113

114 Graph Complexity and Projective and Affine Dimensions of Graphs
to linear algebraic parameters and prove lower bounds on depth-3 graph formula size. Some of the techniques in this section are used in the next section on span programs.
6.1 Graph Complexity
The complexity of a graph G measures the difficulty of constructing G using a given collection of primitive graphs, called generators, and a given basis of operations on sets of edges. All the graphs involved are assumed to have the same set of vertices, typically X = {1,…,n}. A set operation on graphs refers to the operation on the corresponding edge sets. For instance, the result of G1 ∪ G2 on graphs G1 = (V,E1) and G2 = (V,E2) is the graph G = (V,E1 ∪ E2). Models of graph com- plexity are defined analogous to the standard models of circuits and formulas where the generator graphs play the role of input variables and the set operations play the role of gates.
As usual, we can imagine a circuit to be a directed acyclic graph (DAG) with the input nodes (of in-degree 0) labeled by the generator graphs and the internal nodes (gates) labeled by set operations from the basis. The target graph appears at the output gate (of out-degree 0) of this DAG. The depth of the circuit is defined to be the length of a longest path in the DAG. Its size is the number of nodes in it.
A graph formula is a graph circuit in which the out-degree of each gate is at most one. Thus, a graph formula can be represented as a tree with the leaves labeled by generator graphs and the internal nodes labeled by operations from the basis. The size of a formula is the number of leaves in its tree. The formula complexity of a graph is the smallest size of a formula that computes the graph (with respect to a fixed set of generators and a basis).
We can also define natural restricted models such as constant depth graph circuits and formulas. In these models, we allow unbounded fan- in and assume that the operations from the basis are naturally extend- able to an unbounded number of operands (e.g., union, intersection, and symmetric difference of sets have this property). We can similarly generalize other models of Boolean complexity such as decision trees and branching programs to graph complexity.

6.1 Graph Complexity 115
We will consider graph complexity with the set operations of ∪ (union) and ∩ (intersection) only. We will naturally want the sets of generators to be complete in the sense that every graph should be constructible from these generators using ∩ and ∪ operators in a circuit or a formula.
We are especially interested in the complexity of bipartite graphs because of their direct relevance to lower bounds on Boolean circuits and communication complexity.
Definition 6.1. Fix the color classes X and Y . Let B denote the fol- lowing set of complete bipartite graphs:
B={A×Y :A⊆X} ∪ {X×B:B⊆Y}.
For a bipartite graph G ⊆ X × Y , the bipartite-formula complexity of G is the smallest size of a graph formula computing G using B as the set of generators and ∪ and ∩ as the basis. Bipartite-formula complexity of G is denoted by LB(G).
Definition 6.2. Let f be a Boolean function on 2l variables, written as f : {0,1}l × {0,1}l → {0,1}. Let n := 2l. The n × n bipartite graph Gf ⊆ {0, 1}l × {0, 1}l is defined by including the edge (x, y) in Gf iff f(x,y) = 1, where x,y ∈ {0,1}l.
Note that the and and or operations on Boolean functions cor- respond to union and intersection operations on the edge sets of their corresponding graphs. In other words, Gf1∧f2 =Gf1 ∩Gf2 and Gf1∨f2 =Gf1 ∪Gf2. This suggests a syntactic transformation of a Boolean formula (assuming all negations are pushed to the leaves) into a graph formula. What about the input literals of the Boolean for- mula? The literals are simply the projection functions and the graphs corresponding to projection functions are complete bipartite graphs isomorphic to Kn/2,n and Kn,n/2. For instance, Gxi is the complete bipartite graph {x ∈ {0,1}l : xi = 1} × {0,1}l. Thus each literal can be translated into a generator in B. With this transformation of a Boolean

116 Graph Complexity and Projective and Affine Dimensions of Graphs
formula for f into a bipartite-formula for Gf , it follows that
LB (Gf ) ≤ L(f ), (6.1)
where L(f) is the minimum size of a formula (with tight negations) computing f.
Given an n × n bipartite graph G, where n = 2l, we can clearly define a function f on 2l variables such that Gf = G. Thus, we get the following criterion for lower bounds on Boolean formula size, proved in [79].
Lemma 6.1. A lower bound of LB(G) ≥ ψ(logn) for an explicit n × n bipartite graph G, where n = 2l, would yield an explicit function f on 2l variables with formula size lower bound L(f) ≥ ψ(l).
Since the proof of this proposition is essentially syntactic, similar relations hold for other models such as circuits, decision trees, and branching programs as well.
Note, however, that graph complexity of Gf could be much smaller than the Boolean complexity of f. This is because in a bipartite-formula we have access to an exponential (in n) number of generators B, whereas the transformation above uses only the 4logn “canonical” generators corresponding to the projection functions. In fact, the generators in B, with X = Y = {0,1}l, can be viewed as defining arbitrary Boolean func- tions of either the first l or the last l variables. This interpretation captures the connection between two-party communication complexity and graph complexity.
The connections below indicate the generality and usefulness of graph complexity. In the following remarks, we assume the generators are from B:
• When the model of computation is a decision tree, we get the model of Yao’s two-party communication complexity [104].
• When the model is a Boolean formula of constant depth (polylog(l) depth) and quasi-poly(l) size, we get the defi- nitions (cf. Definitions 5.6) of “polynomial hierarchy” PHcc (PSPACEcc, respectively) in the two-party communication complexity model as defined in [5].

6.2
6.2 Projective and Affine Dimensions of Graphs 117
• In [76], graph complexity in the model of branching programs was used to derive criteria for the branching program size of the corresponding Boolean function. In particular, they define the notion of projective dimension of graphs and show that lower bounds on projective dimension of graphs imply lower bounds on branching program size of Boolean func- tions. We will see this notion in more detail in the next sub- section.
• Formula complexity of graphs was used by Razborov [85] to derive criteria for lower bounds on formula size of the cor- responding Boolean function. He defines the notion of affine dimension of graphs and shows that lower bounds on affine dimension of graphs imply lower bounds on formula size of Boolean functions. He also shows the relations between affine and projective dimensions of graphs in various cases. We will review these results as well in the next subsection.
• Using the translation of graph complexity to Boolean func- tion complexity as an intermediate step, Lokam [58] shows that sufficiently strong lower bounds on the monotone com- plexity of the very special class of 2-slice functions imply lower bounds on the complexity over a complete basis of cer- tain Boolean functions.
Projective and Affine Dimensions of Graphs
Although the definitions of projective and affine dimensions make sense for any graph, we restrict our attention here to bipartite graphs only because of the motivating applications to Boolean function complexity. It is easy to generalize the following for nonbipartite graphs.
Definition 6.3. Let G ⊆ X × Y be a bipartite graph and V be a vec- tor space of dimension d over a field F.
• A projective representation of G in V associates a subspace Uz witheveryvertexz∈X∪Y suchthat(x,y)∈Gifand only if Ux ∩ Uy ̸= {0}. The projective dimension of G over F is

118
Graph Complexity and Projective and Affine Dimensions of Graphs
defined to be the minimum d such that G has a d-dimensional
representation and is denoted pdimF(G).
• The affine dimension of G adimF(G) is defined similarly
using affine subspaces Ux, where (x,y) ∈ G if and only if Ux ∩Uy ̸=∅.
Remark 6.1. Note that in the above definitions, we do not care about intersections of subspaces among vertices within the same color class X or Y .
R ́onyai et al. [89] prove the following upper bounds on the projective dimension of any (even nonbipartite) graph.
Theorem 6.2. For any graph G = (X,E),
• pdimF(G) ≤ |E| over any field F.
• pdimF(G) ≤ 2∆ if |F| ≥ |E|, where ∆ is the maximum degree
of a vertex in G.
Proof. Let m := |E|. For the first part, assign to each edge e ∈ E a
vectorve inabasis{ve : e∈E}ofFm.Toeachvertexx∈Eassignthe
subspace Ux := span{ve : e contains/incident with x}. For the second
part, pick m vectors {ve : e ∈ E} in general position in the space F2∆,
e.g., v = (1,α ,α2,…,α2∆−1), where α are distinct elements of F. Let eeeee
Ux be as above.
The main motivation for studying the affine dimension of graphs
comes from the following theorem [76].
Theorem 6.3. Let f : {0, 1}l × {0, 1}l → {0, 1} be a Boolean function and Gf be the associated n × n bipartite graph, where n = 2l as in Definition 6.2. Suppose f is computed by a branching program of size β. Then, for any field F, pdimF(Gf ) ≤ β + 2.

6.2 Projective and Affine Dimensions of Graphs 119
The following theorem about pdimF(G) for almost all bipartite graphs implies that most Boolean functions require branching programs of size Ω(2l/2). This leads to the main challenge about pdimF: prove lower bounds, even of the form ω(logn), for explicit bipartite graphs.
Theorem 6.4. For almost all n × n bipartite graphs G 􏰀 Ω(√n), if F is finite,
pdimF(G) = Ω􏰃􏰛n/logn􏰄, if F is infinite.
For finite F, the proof follows from a simple counting argument. For infinite (or sufficiently large) F, the proof [89] (generalizing the proof in [76] for F = R) uses upper bounds on the number of zero patterns of a sequence of polynomials.
The following lemma is proved in [76] (and generalized in [89]) to prove the second part of the above theorem and is often useful.
Lemma 6.5. Suppose, pdimF(G) ≤ d and that F is sufficiently large. Then, there is a projective representation of G in F2d in which, for every vertex x ∈ X, the associated subspace Ux is of dimension exactly d.
The only explicit graph we know of for which a lower bound on the projective dimension is known is the following.
Lemma 6.6. Let G ⊆ X × Y be the complement of a matching, e.g., X=Y ={1,2,…,n} and (x,y)∈G if and only if x̸=y. Then, pdimF(G) = Ω(log n) for any sufficiently large field F.
Proof. Suppose pdimF(G) = d. By Lemma 6.5, we can assume that G has a projective representation in V := F2d such that for each x ∈ X (y ∈ Y ) dimUx = d (dimUy = d). We will consider the dth exterior power W := 􏰓dV of V . Corresponding to a d-dimensional subspace U ⊆ V,wehaveavectoru∈W definedbythewedgeproductu=u1 ∧···∧ ud, where u1,…,ud is some basis of U. Thus, we have vectors ux and uy for each x∈X, y∈Y. Recall that a wedge product w=w1 ∧···∧

120 Graph Complexity and Projective and Affine Dimensions of Graphs
wk = 0 (the zero vector in 􏰓kV ) if and only if the vectors w1,…,wk are linearly dependent in V . It follows that ux ∧ uy = 0 in 􏰓2dV if and only if (x,y) ∈ G. By the proof of Theorem 6.12 [4, pp. 132–133], we conclude that {ux : x ∈ X} is a linearly independent set of vectors in W and hence n ≤ dimW = 􏰃2d􏰄. Hence, d = Ω(logn).
d
Next, we consider the motivation for the affine dimension of graphs. Razborov [85] shows that for any bipartite graph G, LB(G) ≥ adimF(G) (Theorem 6.12 below).
Before, we prove this theorem, we review some notions about rect- angle covers. This machinery is again used in Section 7.
6.2.1 Rectangle Covers
Let P and Q be (for now) arbitrary finite sets. A rectangle in P × Q is asetS×T,whereS⊆P andT⊆Q.AsetofrectanglesCiscalleda cover (or a rectangle cover) of P × Q if ∪R∈CR = P × Q. A rectangle cover is a disjoint cover if the rectangles in it are mutually disjoint. A set of rectangles C′ is embedded in the set of rectangles C if every R′ ∈C′ is contained in some R∈C. For a rectangle cover C of P ×Q, define
α(C) := min{|C′| : C′ is a disjoint cover embedded in C}.
Finding lower bounds on α(C) is reduced next to finding lower bounds on certain rank-like functions. Let A be a matrix over P × Q (i.e., rows are indexed by elements of P and columns by elements of Q). For a rectangle R ⊆ P × Q, let AR denote the matrix with entries in R equal to those of A and zero outside of R.
Lemma 6.7. For a rectangle cover C of P × Q and any matrix A (over any field F) over P × Q,
α(C) ≥ rank(A) . maxR∈C rank(AR)
In particular, if A is such that AR is monochromatic (i.e., all nonzero entries of AR are equal) for each R ∈ C, then α(C) ≥ rank(A).

6.2 Projective and Affine Dimensions of Graphs 121 Proof. Let C′ be a disjoint covering embedded in C such that
α(C)=|C′|. Note that for any R′ ∈C′, there is an R∈C such that ′􏰂
R ⊆ R. Thus, rank(AR′ ) ≤ rank(AR). Now, A = R′∈C′ AR′ and hence 􏰕
rank(A) ≤
rank(AR′ )
≤ |C′|maxrank(AR′)≤α(C)maxrank(AR).
R′∈C′
R′∈C′ R∈C
Here, another rank-like function is useful in proving lower bounds on α(C). In a partial matrix A over a field F, some entries may be left unspecified (denoted by ∗). The rank of a partial matrix is defined to be the minimum rank of a full matrix obtained from A, over all possible ways of filling the unspecified entries from F. Suppose, C = C1 ∪ C2. Let a1 ̸= a2 be two nonzero entries over F. An element (p,q) ∈ P × Q is covered by Ci if it is contained in some rectangle R′ ∈ Ci. Define the partial matrix A by
a1, Apq = a2,
if (p,q) is not covered by C1, if (p,q) is not covered by C2, otherwise.
∗,
Lemma 6.8. For any cover C of P × Q such that C = C1 ∪ C2 and any
partial matrix A defined as above, α(C) ≥ rank(A).
Proof. We will define a matrix B such that rank(B) = rank(A) for the partial matrix A. Let C′ be a disjoint covering embedded in C such that α(C) = |C′|. Partition C′ as C′ = C1′ ∪ C2′ (C1′ ∩ C2′ = ∅) where1 supp(C1′ ) ⊆ supp(C1) and supp(C2′ ) ⊆ supp(C2). Let
􏰕􏰕
B:=a2 JR′ +a1 JR′, R′ ∈C1′ R′ ∈C2′
where JR′ is a matrix with all 1’s inside R′ and 0’s elsewhere. Clearly, B extends A and hence rank(A) ≤ rank(B). Moreover rank(B) ≤ |C1′ | + |C2′ | ≤ |C′| = α(C).
1 Define supp(C) to be the union of all the rectangles in C.

122 Graph Complexity and Projective and Affine Dimensions of Graphs
Let f : {0,1}t → {0,1} be a Boolean function and let, now, P ⊆ f−1(1) and Q ⊆ f−1(0). Then, there is a canonical cover of P × Q consisting of the 2t rectangles
Riε :={(x,y)∈P ×Q:xi =ε,yi =1−ε},
for 1 ≤ i ≤ t and ε ∈ {0,1}. This canonical cover is denoted by Ccan(P,Q). If P = f−1(1) and Q = f−1(0), it is denoted by Ccan(f). If f is a monotone Boolean function, then for any x with f(x) = 1 and y with f(y)=0, we can find an index i such that xi =1 and yi =0. Hence, for monotone f, it suffices to consider the t canonical rectangles with ε = 1. Such monotone canonical covers are denoted Cmon(P,Q) and Cmon(f) analogous to the above notation.
The fundamental application of rectangle covers to lower bounds is given by the following result from [85].
Theorem 6.9. For any Boolean function f : {0, 1}t → {0, 1} and any P ⊆ f−1(1), Q ⊆ f−1(0), the formula size complexity L(f) (in the AND, OR, NOT basis) of f is lower bounded as follows
L(f) ≥ α(Ccan(f)) ≥ α(Ccan(P,Q)).
For monotone f, the monotone formula complexity Lmon(f) in the
AND-OR basis is lower bounded as
Lmon(f) ≥ α(Cmon(f)) ≥ α(Cmon(P,Q)).
We remark that the basic intuition for this connection goes back to Khrapchenko and Rychkov (see, [85]). A similar approach was taken by Karchmer and Wigderson [40] in their characterization of circuit depth in terms of communication complexity (recall that circuit depth is logarithmic in formula size).
The ideas in the proof of Theorem 6.9 generalize to the model of graph complexity. For this, let G ⊆ X × Y be a bipartite graph. Recall that LB(G) denotes the graph-formula complexity of G using operations ∪ and ∩ and complete bipartite graphs (B) as generators. Let us look at the framework of rectangle covers in this context. To begin with, let

6.2 Projective and Affine Dimensions of Graphs 123
P ⊆G be a subset of edges of G and let Q⊆(X ×Y)\G be a subset of nonedges of G. We can then consider the rectangle covers of P × Q (or more generally those of P′ × Q′ for any disjoint P′,Q′ ⊆ X × Y ). For any X0 ⊆ X, let B(X0) denote the complete bipartite graph X0 × Y ; similarly for any Y0 ⊆ Y . The canonical cover of P × Q, denoted Cgr(P,Q), is defined by the sets of rectangles:
Cgr,X := {P ∩B(X0)×Q∩B(X0) : X0 ⊆X}, Cgr,Y := {P ∩B(Y0)×Q∩B(Y0) : Y0 ⊆Y}.
When P =G and Q=(X ×Y)\G, we denote this canonical cover by Cgr (G).
The generalization of Theorem 6.9 to graph complexity is as follows.
Theorem 6.10. For any bipartite graph G ⊆ X × Y and P ⊆ G and Q ⊆ (X × Y )\G, LB(G) ≥ α(Cgr(P,Q)).
Consider the partial matrix A := A(F,a1,a2,G) over F, where a1 ̸= a2 are nonzero elements of F for the cover Cgr(G).
Given an edge p∈G and a non-edge q∈(X ×Y)\G, we have
 a1
Apq =a2 ∗
if p and q share a vertex in X, if p and q share a vertex in Y, otherwise.
(Note that an (edge, nonedge) pair covered by a rectangle in Cgr,X cannot share a common vertex in X and similarly Cgr,Y .)
Using Lemma 6.8 for the cover Cgr(G) with this partial matrix and Theorem 6.10, we get the following.
Lemma 6.11. For any bipartite graph G ⊆ X × Y , LB(G) ≥ rank(A(F, a1 , a2 , G)).
6.2.2 Affine Dimension
We can now return to the main motivation for studying the affine dimension of graphs [85].

124 Graph Complexity and Projective and Affine Dimensions of Graphs Theorem 6.12. For any graph G ⊆ X × Y and any field F, LB(G) ≥
adimF (G).
Proof. We will give an affine representation for G using Lemma 6.11 in dimension d = rank(A(F,a1,a2,G)). Let B be a full matrix of rank d over F such that (usual) rank(B) = rank(A). Let the row space of B be the ambient space (∼= Fd). Recall that the rows of B are indexed by edges of G. With each edge e ∈ G, associate the corresponding row vector ve. Now, represent each vertex z ∈ X ∪ Y by affine subspace Uz given by affine span of edges incident to z:
Uz := affspan⟨ve : e is incident to z⟩.
Clearly, if x and y are adjacent in G, the affine subspaces Ux and Uy have the vector vxy in them. On the other hand, suppose x and y are not adjacent. For any edge e incident to x, e and xy share the vertex x ∈ X. Hence, for every such edge the corresponding row has a1 in the column corresponding to the nonedge xy. Thus, all affine linear combinations of these edges will also have a1 in that coordinate, i.e., every vector in Ux has a1 in the coordinate indexed by the column xy. Similarly, every vector in the affine subspace Uy has a2 in that coordinate since the nonedge xy and all edges incident to y share a vertex in Y . This show that Ux ∩ Uy = ∅.
The following theorem from [85] states the known the relations between adimF and pdimF.
Theorem 6.13. For any graph G,
• over any field F, adimF(G) ≤ (pdimF(G))2.
• over any finite field F, pdimF(G) ≤ (adimF(G))O(adimF(G)).
It is easy to see that, when G is the complement of a matching, adimR(G) = 2. Combined with Lemma 6.6, this shows that no general relation between adimR and pdimR can exist.

6.3 Lower Bounds on Depth-3 Graph Complexity 125 6.3 Lower Bounds on Depth-3 Graph Complexity
In this subsection, we consider a restricted model of graph complexity,
namely, depth-3 formulas. We will derive a nontrivial lower bound of
Ω(log3 n/(log log n)5) on the depth-3 complexity of Paley-type bipartite
graphs [58]. The proof uses the lower bound on l2-rigidity (Lemma 3.5)
and approximating polynomial representations of the OR function.
Improving this to nΩ(1) is a big challenge — such a bound would give
superlinear lower bounds on log-depth Boolean circuits and a language
outside the class Σcc in two-party communication complexity — these 2
are two long-standing open questions.
Suppose a depth-3 bipartite-formula computes a bipartite graph
G⊆U ×V, |U|=|V|=n. Recall that the leaves of the formula are graphs from B={A×V :A⊆U} ∪ {U×B:B⊆V}. Let us first observe that the bottom gates of a bipartite-formula need not have fan- in more than 2. Indeed, an ∩ gate at the bottom computes a complete bipartite graph A × B and a ∪ bottom gate computes the complement ofacompletebipartitegraphA×B,whereA⊆U andB⊆V.These can be written as intersection and union, respectively, of at most two graphs from B.
Without loss of generality, we consider ∪ ∩ ∪ formulas. By the remark above, we can write such a formula as G = ∪i ∩j Gij , where Gij is the complement of a complete bipartite graph, i.e., Aij × Bij for some Aij ⊆U and Bij ⊆V.
One ingredient of our proof is sign-representing polynomials for DNF’s. Nisan and Szegedy [66] give the following construction of ε- approximating polynomials for the or-function. They assume a con- stant ε. The refined analysis to bring out the dependence on ε is due to Hayes and Kutin (private communication). For a proof, see [58].
Lemma 6.14. The or-function of n Boolean variables can be ε- approximated by a real polynomial of degree at most O(√nlog(2/ε)). More precisely, for every 0<ε<1/2, there is a real polynomial p of degree at most O(√nlog(2/ε)) such that for every x∈{0,1}n, |or(x) − p(x)| ≤ ε. 126 Graph Complexity and Projective and Affine Dimensions of Graphs ForabipartitegraphG⊆U ×V,wewillletG(x,y)=1if(x,y)∈G and G(x,y) = 0 if (x,y) ̸∈ G. Lemma 6.15. Suppose an n × n bipartite graph H ⊆ U × V is writ- ten as a union of d complete bipartite graphs: 􏱂d (Ai ×Bi), where Ai ⊆U, Bi ⊆V. Then, for every ε, where 0 < ε < 1/2, there is a real matrix MH such H = i=1 that • For all (x,y)∈U ×V, |MH(x,y)−H(x,y)|≤ε, √ • rank(MH ) ≤ exp(O( d log(2/ε) log d)). Proof. Let R be the incidence matrix of H, and similarly let Ri be the incidence matrices of the complete bipartite graphs Ai × Bi, 1 ≤ i ≤ d, covering H. Note that R is simply the entry-wise or of the Ri. Furthermore, each Ri is of rank one as a real matrix. We obtain MH from R using the approximating polynomials for the or-function given by Lemma 6.14. Suppose p(z1,...,zd) is an ε-approximating polynomial of degree k := c · √ dlog(2/ε) for or(z1,...,zd). Syntactically substitute the matrix Ri for zi in this polynomial, but interpret the product as entry-wise product of matrices, i.e., a monomial zizj is replaced by Ri ◦ Rj, where for matrices A and B, (A ◦ B)(x,y) := A(x,y)B(x,y). Note that if A and B are rank-1 matrices, then A◦B is also a rank-1 matrix. Thus, a monomial zi1 · · · zit is replaced by the rank-1 0–1 matrix Ri1 ◦···◦Rit. The matrix obtained by comput- ing the polynomial p(R1,...,Rd) in this way gives us the desired matrix MH. It is clear that MH(x,y) = p(R1(x,y),...,Rd(x,y)). From the prop- erties of p, it is easy to see that for all x,y, |MH(x,y) − H(x,y)| ≤ ε. Since MH is a linear combination of rank-1 matrices, one for each mono- mial, it follows that rank of MH is at most the number of monomials in p which is bounded by 􏰂k 􏰃d􏰄 ≤ exp(O(k log d)). j=0 j 6.3 Lower Bounds on Depth-3 Graph Complexity 127 Lemma 6.16. Let G ⊆ U × V be a bipartite graph. If G is realized by a depth-3 bipartite-formula: t di 􏱂􏱃 (Aij ×Bij), whereAij ⊆U, Bij ⊆V, then there exists a matrix M such that (1) If G(x,y) = 0, then M(x,y) ≤ −1/6. (2) If G(x,y) = 1, then M(x,y) ≥ +1/6. (3) rank(M) ≤ exp(O(√DlogtlogD)), where D = maxti=1 di. Proof. Let G1,...,Gt be the input graphs to the top gate so that G = ∪ti=1Gi. Since each Gi is an intersection of complements of complete bipartite graphs, its complement, Hi := Gi is computed by a union of complete bipartite graphs. Thus, we can apply Lemma 6.15 to the Hi. Let Mi be the real matrix given by Lemma 6.15 that εi -approximates Hi , where εi := 1/3t. Since, as a matrix, Gi =J −Hi, where J is the n×n all 1’s matrix, the matrix Ni := J − Mi is an εi-approximation of Gi. Furthermore, since rank(Ni) ≤ rank(Mi) + 1, it follows from Lemma 6.15 that rank(Ni) ≤ exp(O(√di log(1/εi)logdi)) ≤ exp(O(√DlogtlogD)). Let M := N1 + ··· + Nt − 1 · J. Let us see the relation between M 2 and G: (1) If G(x,y) = 0, then ∀i Gi(x,y) = 0, and hence ∀i|Ni(x,y)| ≤ εi. It follows that M (x, y) ≤ 􏰂ti=1 εi − 1/2 ≤ −1/6. (2) If G(x,y)=1, then ∃iGi(x,y)=1 and for this i, 1−εi ≤ G= i=1j=1 Ni(x,y)≤1+εi. For j̸=i, −εj ≤Nj(x,y)≤1+εj. Hence, we have 1−εi −􏰂 εj −1/2≤M(x,y)≤􏰂t (1+εj)− j̸=i j=1 1/2. So, in this case, 2/3−1/2≤M(x,y)≤t+1/3−1/2, and hence M(x,y) ≥ 1/6. 􏰂t √ (3) Moreover, rank(M) ≤ rank(Ni) + 1 ≤ t exp(O( D √ i=1 logDlogt)) + 1 ≤ exp(O( DlogDlogt)). 128 Graph Complexity and Projective and Affine Dimensions of Graphs Lemma 6.17. Let G be an n×n bipartite graph G⊆U ×V. If G is realized by a depth-3 bipartite-formula: t di 􏱂􏱃 (Aij ×Bij), whereAij ⊆U, Bij ⊆V, then there exists a matrix M such that (1) If G(x,y) = 0, then M(x,y) ≥ 1. (2) If G(x,y) = 1, then M(x,y) = 0. (3) rank(M) ≤ 􏰹ti=1 di. Proof. It will be convenient to consider the complement graph G of G. Clearly, t di 􏱃􏱂 G= i=1j=1 (Aij ×Bij). G= Gi Let Rij be the incidence matrix of the complete bipartite graph Gij. i=1j=1􏱄 􏱅􏱆 􏱇 Gij Define Mi = 􏰂di j=1 Rij. Finally, let M = M1 ◦ M2 ◦ ··· ◦ Mt. Hence, we t di have M(x,y) = (x,y) ∈ Gi then Rij(x,y). Mi(x,y) = 􏰂di Note that if (x,y) ̸∈ Gi, then Mi(x,y) = 0. Hence, we have j=1 Rij(x,y) ≥ 1 and if 􏱄 􏱅􏱆 􏱇 􏰖􏰕 i=1 j=1 (x,y) ∈ G ⇒ M(x,y) ≥ 1, (x,y) ̸∈ G ⇒ M(x,y) = 0. From this (1) and (2) follow. To see the bound on rank(M), note that rank is sub-multiplicative under ◦ : rank(A ◦ B) ≤ rank(A) · rank(B). Since rank(Rij) = 1, we have rank(Mi) ≤ di. Hence, rank(M) ≤ 􏰹ti=1 rank(Mi) ≤ 􏰹ti=1 di. This gives (3). 6.3 Lower Bounds on Depth-3 Graph Complexity 129 Theorem 6.18. Let G be an n×n bipartite graph G⊆U ×V. If G is realized by a depth-3 bipartite-formula: t di 􏱂􏱃 (1) If G(x,y) = 0, then M(x,y) ≤ −1/12. (2) If G(x,y) = 1, then M(x,y) ≥ +1/12. 􏰂 (3) rank(M) ≤ exp(O(L1/3 log5/3 L)), where L = ti=1 di. Proof. Let D∗ be a parameter to be fixed later. We will separate the formula into two parts depending on whether the middle fan-in di is smaller or larger than D∗. This idea of splitting the formula based on middle fan-in was used earlier in [44]. Specifically, let Gs be the subgraph of G realized by the subformula given by union of middle gates of fan-in at most D∗ and let Gl be the subgraph of G similarly given by middle gates of fan-in larger than D∗: We apply Lemma 6.16 to Gs to get a matrix Ms such that Ms(x,y) ≥ 1/6 if (x,y) ∈ Gs and Ms(x,y) ≤ −1/6 if (x,y) ̸∈ Gs. Furthermore, we have rank(Ms)≤exp(O(√D∗logD∗logt)) since all middle fan-in’s of the formula for Gs are at most D∗ and the top fan-in is at most t. We apply Lemma 6.17 to Gl and get a matrix Ml such that Ml(x,y) ≥ 1 if (x,y) ̸∈ Gl and Ml(x,y) = 0 if (x,y) ∈ Gl. Since all mid- dle fan-in’s of the formula for Gl are at least D∗, the top fan-in tl of this formula is at most tl ≤ 􏰂ti=1 di/D∗ ≤ L/D∗. We bound rank of Ml G= then there exists a matrix M such that i=1j=1 (Aij ×Bij), whereAij ⊆U, Bij ⊆V, Gs = Gl = 􏱂 􏱃i di≤D∗ j=1 (Aij ×Bij), (Aij ×Bij). 􏱂 􏱃i di>D∗ j=1
d
d

130 Graph Complexity and Projective and Affine Dimensions of Graphs as follows:
rank(Ml) ≤
≤􏰕 di
􏰖
di  tl
di>D∗, 1≤i≤t
di>D∗ tl
≤ exp(O(tl logD)),
􏰕 di t
≤ L
where D = 􏰈􏰈L 􏰉􏰉 di>D∗l
≤exp O D∗logD .
Define M = Ms ◦ Ml + (1/12)J. If (x,y) ∈ Gl, then M(x,y) = 1/12, and if (x,y) ∈ Gs\Gl, then M(x,y) ≥ 1/6 + 1/12. If (x,y) ̸∈ Gs ∪ Gl, then M(x,y) ≤ −1/6 + 1/12 ≤ −1/12. Hence, (1) and (2) are satisfied by M.
By sub-multiplicativity of rank under ◦, we get the upper bound on
rank of M:
rank(M) ≤ exp O( D∗ logD∗ logt) + O D∗ logD + 1.
􏰈√ 􏰈L􏰉􏰉
We set D∗ = Θ((L/logL)2/3). Using the trivial upper bound of O(logL) on logD∗, logD, and logt, we get that rank(M)= exp(O(L1/3(logL)5/3)), verifying (3).
We now use Forster’s Theorem 4.2 to show that for some “interest- ing” graphs G any matrix satisfying (1) and (2) of Theorem 6.18 must have a large rank and hence conclude a lower bound on the depth-3 complexity of G using (3).
Theorem 6.19. Let G be an n × n bipartite graph and AG be its ±1 incidence matrix, i.e., AG(x,y) = 1 if (x,y) is an edge of G and AG(x,y) = −1 if (x,y) is not an edge of G. Then any depth-3 bipartite- formula for G must have size at least
􏰈 log3(n/∥AG∥) 􏰉 Ω log log5(n/∥AG∥) .

6.3 Lower Bounds on Depth-3 Graph Complexity 131
Proof. Given a depth-3 formula for G of size L, let M be the matrix given by Theorem 6.18. Note that if (x,y) ̸∈ G, then M(x,y) ≤ −1/12, and if (x,y) ∈ G, then M(x,y) ≥ 1/12. Hence, 12M is a sign-preserving variation of AG and we can apply Theorem 4.2 to conclude that rank(M) = rank(12M) = Ω(n/∥AG∥). On the other hand, from Theo- rem 6.18, (3), we get that rank(M ) ≤ exp(O(L1/3 log5/3 L)). Combining the two estimates on rank(M), we get
exp(O(L1/3 log5/3 L)) = n . ∥AG ∥
Solving for L proves the theorem.
Corollary 6.20. For any graph G such that AG is an Hadamard matrix, the depth-3 bipartite-formula complexity of G is at least Ω((logn)3/(loglogn)5). An example of such a graph is the Paley-type bipartite graph.

7
Span Programs: A Linear Algebraic Model of Computation
The model of span programs was introduced by Karchmer and Wigderson [41]. A span program over a field F computes a Boolean function f : {0, 1}n → {0, 1} as follows. With each of the 2n literals xi (xi, respectively), a span program associates a subspace Ui1 (Ui0) of an ambient space V over F. There is no condition on the dimensions of these subspaces. Fix a nonzero vector z ∈ V . Now, the condition is that
f(x) = 1 if and only if z ∈ span{Ux1,Ux2,…,Uxn}, 12n
i.e., an input assignment “picks” either Ui1 or Ui0 depending on whether
it sets xi to 1 or 0, respectively, and the function evaluates to 1 on that
assignment if and only if the special vector z falls in the linear span
of the picked subspaces. W.l.o.g, we let z to be the all 1’s vector 1 in
what follows. The dimension of the span program is the dimension of
the ambient space V . Define now sdimF(f) to be the smallest dimension
d of an ambient vector space V in which a span program as described
above can be realized for computing the function f. The size of the span
program is the sum of the dimensions of the 2n subspaces associated to
the literals: 􏰂 dimUε. The span program size SPF(f) of a function f i,ε i
is the minimum size of a span program computing the function f. 132

7.1 Characterizing Span Program Size 133
In matrix language, a span program for a Boolean function f is a
matrix S whose rows span the ambient space V . Each row is labeled by
a unique literal (the same vector could repeat as multiple rows). Each
subspace Uiε is then specified by a subset of rows and the number of
rows in S is the size of the span program. For a given input assignment
x ∈ {0,1}n, let Sx denote the submatrix given by the rows “activated”
by x, i.e., for each i, take the set of rows corresponding to Uxi. Then, i
f(x) = 1 if and only if 1 is in the row space of Sx, i.e., there exists a vector cx such that cTx Sx = 1T .
A monotone span program has subspaces associated only to the n positive literals xi. The criterion for when a monotone span program computes a monotone Boolean function is defined similarly as above. Let mSPF(f) denote the size of a monotone span program that com- putes the monotone function f.
Using Warren’s inequality [102], Babai et al. [6] prove that for almost all f, sdimF(f) is at least Ω(2n1/3/(nlogn)1/3). Proving superpolyno- mial lower bounds for explicit f is a major challenge. Span programs provide a “static” model of computation for a Boolean function and this model promises to be a vehicle for realizing the approach of the fusion method [103] for circuit lower bounds. Span programs are well- related to other models of complexity such as Boolean formulas, sym- metric branching programs, algebraic proof systems, and secret sharing schemes.
A major achievement is a sequence of results [6, 9, 32] culminating in Ga ́l’s proof of an nΩ(logn) lower bound on monotone span program size of several explicit Boolean functions. These results are highly non- trivial applications of combinatorial and algebraic techniques. Ga ́l [32] actually provides a characterization of span program size and uses a gap between rectangle cover numbers (cf. Section 6.2.1) and ranks to get the lower bound.
7.1 Characterizing Span Program Size
Recall the notion of a rectangle cover of a cartesian product P × Q from Section 6.2.1. Here, we will use a parameter that generalizes α(C ).

134 Span Programs: A Linear Algebraic Model of Computation
Definition 7.1. Let a rectangle cover C of P × Q and a field F be given. A set K of rank-1 matrices (with entries indexed by P × Q) over F is said to be embedded in C if for every K ∈K, the set of all nonzero entries of K are contained in some rectangle R ∈ C. We then define
􏰀
􏰕
αF(C) := min
Here, J denotes the all 1’s matrix over F indexed by P × Q.
Clearly, α(C) ≥ αF(C) : for a disjoint rectangle cover C′ embedded in C, simply take for each R ∈ C′ a rank-1 matrix whose entries take the value 1 inside R and 0 outside R. An analog of Theorem 6.9 char- acterizes the span program size completely.
Theorem 7.1. Fix a field F. For every Boolean function f, SPF(f) = αF(Ccan(f)). For every monotone Boolean function f, mSPF(f)= αF (Cmon (f )).
To prove this theorem, we make use of the following lemma. Lemma 7.2. A Boolean function f : {0, 1}n → {0, 1} has a span pro-
gram of size s over F if and only if there exist sets of vectors B=􏰷bx∈Fs : x∈f−1(0)􏰸, and C=􏰷cy∈Fs : y∈f−1(1)􏰸,
such that
(1) The set [s] of coordinates can be partitioned into subsets siε
corresponding to the 2n literals, 1 ≤ i ≤ n, ε ∈ {0,1}, such
that the following holds: Every bx ∈ B can partitioned into 2n
“segments” (or subvectors) biε, where biε is the restriction of xx
bx to siε, such that the segments activated by x are identically
zero vectors, i.e., bixi = 0 for all i, 1 ≤ i ≤ n. Similarly, every x
K =J and K is a set of rank-1 matrices 􏱁
|K| :
over F embedded in C .
K∈K

7.1 Characterizing Span Program Size 135 c ∈ C can be partitioned into ciε such that, w.o.l.g., we can
yy
assume that the segments not activated by y are identically zero, i.e, ci,1−yi =0 for all i,1≤i≤n.
y
(2) Foreveryb∈B,c∈C,cTb=1.
Proof. The if-direction is trivial: let S be the matrix with vectors from
B as columns. It is easy to verify that S is a valid span program for
f and its size is clearly s. Consider the submatrix of S given by the
segments biε for all x ∈ f−1(0). The rows of this submatrix generate Uε xi
of dimension |siε| = diε.
For the other direction, let S be any span program computing f
and of size s. Then, [41, Theorem 6] shows that S can be turned into a canonical span program S′ of the same size. In a canonical span program, there is a one-to-one correspondence between the zeros of the function and the columns of the span program and for every zero of the function, the corresponding column has zeros in all rows activated by that input. Let B be the set of column vectors of S′. Since S′ is canonical they satisfy (1). By definition, for every y ∈ f−1(1), there exists a vector cy such that cTy Sy′ = 1. Taking C to be the set of vectors cy, we have (2). Since the only rows of S′ that participate to generate 1 on an input y ∈ f−1(1) are those activated by y, we can assume, w.l.o.g., that the restriction of cy to the remaining coordinates is zero. Thus, cy can also be taken to satisfy (1).
Proof. (of Theorem 7.1) By the lemma, it suffices to show that α := αF(Ccan(f)) is the smallest s such that sets of vectors B and C as in the lemma exist.
First, we show that α≤s: Fix i, 1≤i≤n, and ε∈{0,1}. Let B be the d × |f−1(0)| matrix with biε as columns. Let C be the
iεiε x iε
|f−1(1)| × d matrix with rows ciε. Now, define K := C B . Note iε y iεiεiε
that rank(Kiε) ≤ diε. We can write Kiε as a sum of at most diε rank-1
matrices over F. Define now K to be the set of all such rank-1 matri-
ces for all i and ε. Clearly, |K|≤􏰂 rank(Kiε)≤􏰂 diε =s. We now i,ε iε
argue that K is indeed a set of rank-1 matrices embedded in Ccan(f)
and that 􏰂 K = J. This will show that α ≤ s. We claim that all K∈K

136 Span Programs: A Linear Algebraic Model of Computation
nonzero entries of Kiε are contained in the rectangle Ri,ε ∈ Ccan(f).
By property (1) in Lemma 7.2 for biε, we see that for any x such x
that xi = ε, the corresponding columns in Biε are the all-0 columns.
Hence, all columns in Kiε indexed by x ∈ f−1(0) with xi = ε are all-0
columns. Also by (1) applied to cy for y ∈ f−1(1), we see that the rows
of Ciε indexed by y with yi = 1 − ε are all-0 rows. Thus, any nonzero
entry Kiε(y,x) must have yi = ε and xi = 1 − ε, i.e., such an entry is 􏰂􏰂
in Ri,ε. Finally, the (y,x) entry of K∈K K is given by iε Kiε(x,y) = 􏰂 (ciε)T biε = cT b = 1 by property (2) in the lemma. This shows that
􏰂i,εy x yx K∈K K = J.
Next, we show that s ≤ α. Suppose K is a set of rank-1 matri-
ces embedded in Ccan(f) such that |K| = α. Now, collect together the
rank-1 matrices contained in Riε ∈ Ccan (resolve ambiguities arbitrar-
ily, but uniquely) and add them up to form Kiε. Clearly, rank(Kiε)
is at most the number of these rank-1 matrices. Write Kiε = CiεBiε,
where C (B ) has rank(K ) columns (rows). Define biε (ciε) to be the iε iε iε x y
columns (rows) of Biε (Ciε). Now, define bx (cy) as the concatenation of biε (ciε) for all i and ε. Since nonzero entries of K are contained
xy iε
in R , it is easy to ensure (1) in the lemma for bi,ε and ci,ε. Since iε xy
􏰂 K=Jimpliesthat􏰂 K =Jandhence􏰂 (ciε)Tbiε=1for K∈K iε iε iε y x
all (y,x) ∈ f−1(1) × f−1(0). It follows that cTy bx = 1 as required in (2).
We thus have vectors bx and cy satisfying (1) and (2) in dimension at
most 􏰂 rank(Kiε) ≤ |K| and hence s ≤ α. iε
The claim for monotone Boolean functions is proved similarly by considering Cmon and observing that an analog of Lemma 7.2 holds for a characterization of monotone span programs computing monotone Boolean functions.
7.2 Explicit Lower Bounds on Monotone Span Program Size
Recall that for a matrix A over (indexed by) P × Q and a rectangle R ⊆ P × Q, AR denotes the restriction of A to the entries indexed by R. Analogous to Lemma 6.7, we have the following lower bound for αF in terms of ranks.

7.2 Explicit Lower Bounds on Monotone Span Program Size 137 Lemma 7.3. For a rectangle cover C and any matrix A over P × Q
over the field F,
αF(C) ≥ rank(A) . maxR∈C rank(AR)
In particular, if A is such that AR is monochromatic (i.e., all nonzero entries of AR are equal) for each R ∈ C, then αF(C) ≥ rank(A).
The proof of this lemma is easy and similar to that of Lemma 6.7.
Thus, if we have a matrix A indexed by f−1(0) × f−1(1) (or more generally by P × Q, where P ⊆ f−1(0) and Q ⊆ f−1(1)) such that rankF(A) is superpolynomial in the number of variables and every sub- matrix AR for each rectangle R ∈ Ccan(f) is monochromatic, we will have a superpolynomial lower bound on SPF(f). Analogous statement holds for mSPF(f) for a monotone f by considering Cmon(f). We will exploit an interesting connection in the opposite direction. That is, if there is an explicit matrix A over F such that rank(A) is superpoly- nomially large compared to the minimum number of monochromatic submatrices of A needed to cover all entries of A, then we can get an explicit function f with mSPF(f) superpolynomial. This reverse con- nection was already observed by Razborov [85] in the context of lower bounds on formula size. The connection also exists for general, i.e., not necessarily monotone, lower bounds by considering the so-called “self-complementary” covers. But, so far, only the monotone case was successfully exploited for explicit lower bounds. Razborov used this connection to obtain a superpolynomial lower bound on the monotone formula size of an explicit Boolean function from a Boolean function whose nondeterministic and co-nondeterministic communication com- plexity is almost quadratically smaller compared to its deterministic communication complexity. Ga ́l used the same approach to obtain a monotone Boolean function with a superpolynomial lower bound on its monotone span program size.
Let us make the above connection more precise. Suppose A is a matrix with entries indexed by P × Q and with a monochromatic rect- angle cover of size t, i.e., A1,…,At are monochromatic submatrices of

138 Span Programs: A Linear Algebraic Model of Computation
A such that every entry (p,q) ∈ P × Q is in some Ai. Let R1,…,Rt be the rectangles in P × Q defining the nonzero entries of A1,…,At. We define a monotone Boolean function f : {0, 1}t → {0, 1} such that αF(Cmon(f)) ≥ rank(A). Associate a variable xi with each rectangle Ri for 1≤i≤t. Each column q (an element of Q⊆f−1(1)) will be a minterm and each row p (an element of P ⊆ f−1(0)) will be a max- term of f. Recall that a minterm of a monotone Boolean function f is a minimal subset of variables such that fixing each of them to 1 will ensure that the function value will be 1 no matter what we assign to the other variables. Similarly, a maxterm of a monotone Boolean function f is a minimal subset of variables such that fixing each of them to 0 will ensure that the function value will be 0 no matter what we assign to the other variables. (Geometrically, a minterm (maxterm) is a maximal subcube of the cube {0,1}t on which the function is constantly 1 (0).) Note that a monotone Boolean function is completely specified by its set of minterms or maxterms. Now, a minterm q is the set of all vari- ables corresponding to the rectangles that intersect column q. Similarly, a maxterm p is the set of all variables corresponding to the rectangles that intersect row p. Next, extend each minterm q to an assignment, also denoted q, in {0,1}t by assigning 1 to all variables in the minterm and 0 to all the others. Similarly, extend each maxterm p to an assign- ment p ∈ {0, 1}t by assigning 0 to all variables in the maxterm and 1 to others. This way, we get P ⊆ f−1(0) and Q ∈ f−1(1). It is now easy to verify that R1,…,Rt form the monotone canonical cover Cmon(P,Q) (cf. Theorem 6.9 and the discussion preceding it). Furthermore, matrix A over P × Q is monochromatic restricted to each Ri. Hence, by The- orem 7.1 and Lemma 7.3, we have mSPF(f) ≥ αF(P,Q) ≥ rank(A).
We now construct the explicit matrix A that accomplishes the above
goal. The matrix A will be the 0–1 disjointness matrix over sets of size
at most s = Θ(log m) over a universe of size m: the rows and columns
of A are indexed by P = Q = subsets of [m] of cardinality ≤ s and
A(p,q)=1 if and only p∩q=∅. It is well-known [85] that A is full-
rank, i.e., rank(A) = 􏰂s 􏰃m􏰄, over any field F. We next need to show i=0 i
that there is set of O(m) monochromatic rectangles that cover all the entries of A. For this, we use an important pseudo-random property of Paley graphs.

7.2 Explicit Lower Bounds on Monotone Span Program Size 139
A Paley graph is defined over a field K such that |K| ≡ 1 (mod 4) where the vertex set is K and i∼j if and only if i−j is a quadratic residue in K. A graph G is said to have Property P􏰲k if for any two disjoint sets S and T of vertices such that |S| + |T | ≤ k, there is a vertex v in G such that v is adjacent to every vertex in S and not adjacent to any vertex in T. Theorem 13.11 from [18] shows that the Paley graph has property P􏰲k for k ≤ εlog|K| for some constant ε.
Let m := |K| and s = (ε/2)logm. We identify the vertices of the
Paley graph G on K with the universe [m]. For the matrix A, we
obtain a rectangle cover by associating one rectangle with each vertex
of G. By Property P􏰲2s of G, for any two disjoint subsets p and q of
vertices of size at most s, there exists a vertex v of G such that every
vertex in p is adjacent to v and no vertex in q is adjacent to x. It
follows that the matrix A has a rectangle cover of size m while it has
a rank of 􏰂s 􏰃m􏰄=mΘ(logm). We can use A to define an explicit i=0 i
Boolean function on m variables with monotone span program size mΘ(log m) .
Definition of f : Given a Paley graph G on m vertices, we let f : {0, 1}m → {0.1} as follows. The m variables of f are identified with the vertices of G. For each subset p of at most s vertices f has a minterm and is given by all vertices of G (variables of f) that are adjacent to all vertices of p. Similarly, for each subset q of at most s vertices f has a maxterm and is given by all vertices of G (variables of f) that are not adjacent to any vertex of q.
The following theorem follows from the foregoing arguments.
Theorem 7.4. For the function f defined above and any field F, mSPF(f) = mΩ(logm).
It is known [10] that span programs become exponentially weak if we restrict them to be monotone. On the other hand, there exist functions with linear size monotone span programs but requiring exponential size monotone Boolean circuits [6]. Another interesting result in [10] is the exponential separation of monotone span programs over fields of different characteristics.

140 Span Programs: A Linear Algebraic Model of Computation
The characterization of Ga ́l of span program size in terms of gaps between combinatorial cover numbers and ranks seems to be a very promising criterion for lower bounds. It has already been extensively used in [10]. These results also bring together the theme that com- munication complexity techniques provide very powerful approaches to problems in lower bounds.

8
Conclusions and Open Problems
In this last section, we make some concluding remarks and mention several open problems.
Rigid matrices over small number fields: Proving strong superlinear
lower bounds on the rigidity of explicit matrices (Open Question 2.1)
remains an intriguing open problem. On the one hand, intuitive criteria,
such as having all submatrices nonsingular, are insufficient as shown in
Theorem 2.12. On the other hand, using algebraic independence of
very high-degree among entries of matrices such as (√pij), we were
able to prove quadratic lower bounds on their rigidity (Corollary 2.19
and Theorem 2.21). These lower bounds are not satisfactory since the
entries of those matrices live in number fields of exponentially large
dimensions. It would be interesting to prove RA(εn) ≥ n1+δ for A ∈
Cn×n, where entries aij of A span a number field of poly(n) dimension,
i.e., dim(Q(aij) : Q) ≤ poly(n). Note that the Fourier transform matrix
F = (ωij )n−1 defines a number field of dimension φ(n) ≤ n − 1 [50, i,j =0
Ch. VI, Theorem 3.1].
Rigidity of Vandermonde matrices: Note also that the Fourier trans- form matrix is a Vandermonde matrix. However, we do not know strong
141

142 Conclusions and Open Problems
lower bounds on the rigidity of even a generic Vandermonde matrix, i.e.,
V = (xj−1)1≤i,j≤n, where the xi are algebraically independent over Q. i
Can we prove (or disprove) that RV (εn) ≥ n1+δ ? Theorem 2.17 gives
an Ω(n2) lower bound, but only when r = O(√n). For an arbitrary
Vandermonde matrix, we only know a lower bound of Ω(n2/r) (The-
orem 2.10) and only a slightly better bound of Ω(n2 logn) for specific rr
Vandermonde matrices, e.g., when the xi are distinct positive integers and when they are powers of a primitive root of unity of a prime order (Theorem 2.8 and the discussion following it).
Rigidity of character tables: The Fourier transform matrix and the Sylvester-type Hadamard matrix (defined by (2.3)) are conjectured to have high-rigidity. They are the character tables of the groups Zn and Zt2, respectively. We pose the following more general question: Given a finite abelian group G of order n, let AG ∈ Cn×n be the character table of G. Prove that RAG (εn) ≥ n1+δ for some group G. For the character table of any finite abelian group, a lower bound of Ω(n2/r) follows from Theorem 3.8 using spectral techniques.
Paturi–Pudla ́k dimensions over infinite fields: The techniques from Section 2.6 currently apply to finite fields. Proving nontrivial bounds on the inner and outer dimensions over infinite fields may be an interesting extension to [70].
Spectral techniques: In Section 3, we looked at several rank robust- ness functions expressed in terms of l2-norm. Spectral techniques have been helpful in proving lower bounds on such normed variants of rigid- ity. We have also seen that such variants, and spectral techniques in general, have been very successful in proving lower bounds on bounded coefficient complexity of linear and bilinear circuits. Notable among these are the Ω(nlogn) lower bound for the Fourier transform and the Ω(n2 log n) lower bound for matrix multiplication.
Sign-rank versus margin complexity: Spectral techniques are also used in Section 4 to prove lower bounds on sign-rank and margin complexity; note that both lower bounds (Theorems 4.2 and 4.7) are √mn/∥A∥. While the lower bound on margin complexity of an Hadamard matrix is tight, we do not know if the lower bound on its sign-rank is tight.

143
From Lemma 4.17, sign-rank can be at most quadratically (ignoring log-factors) larger than margin complexity. Recall that [2] show that for almost all sign matrices, sign-rank is Θ(n). Hence, a random matrix shows that this gap is tight apart from log-factors. It would thus be interesting to find an explicit sign matrix with a sign-rank of Ω(n).
Connections between complexity measures of sign matrices and com- munication complexity: Consider the sequence of inequalities1
􏰛sign-rank(A) 􏱊 mc(A) ≤ γ1/2ε(A) ≤ 􏰛rank(A), (8.1) 2
valid for any matrix A ∈ {−1, +1}n×n and compare it to the hierarchy of communication complexity classes:
UPP ⊇ PPcc ⊇ BPPcc ⊇ Pcc. (8.2)
Recall that sign-rank(A) and mc(A) (and hence disc(A)) character-
ize the communication complexity classes UPP and PPcc, respec-
tively. More specifically, if these functions are bounded above by
exp(polylog log(n)) for an infinite family of matrices {An}, then the
language defined by {An} is in the corresponding communication com-
plexity classes. The next two parameters γ1/2ε(A) and rank(A) are 2
only known to give lower bounds on bounded error probabilistic com-
munication complexity (Theorem 5.8) and deterministic communica-
tion complexity, respectively. It would be very interesting to prove
(or disprove) that the parameters γ1/2ε and rank can also be used to 2
characterize the corresponding communication complexity classes. It would also be interesting to investigate if γ2-norm can be used to char- acterize quantum communication complexity. Note that proving that polylog(rank(A)) is an upper bound on deterministic communication complexity is precisely the log-rank conjecture.
Gaps between log-rank and communication complexity: The log-rank conjecture is one of the basic open questions in complexity theory. It would be interesting to improve the gap between log-rank and communication complexity given by Theorem 5.7, for instance, to quadratic by constructing an n-variable Boolean function of degree
1 We use 􏱊 instead of ≤ to ignore factors up to log n.

144 Conclusions and Open Problems
O(√n) and sensitivity Ω(n). More generally, is it possible to construct Boolean matrices of small rank r (say, r = exp((log n)O(1))) in which every monochromatic rectangle is small (say, containing at most an exp(−(logr)ω(1))-fraction of entries)? Note that if, for some constant c > 0, every rank-r Boolean matrix contains a monochromatic subma- trix with at least an exp(−(logr)c)-fraction of entries, then the log-rank conjecture holds [68].
Lower bounds for higher classes in communication complexity :
Regarding the “higher” complexity classes in the two-party commu-
nication model, separating PHcc from PSPACEcc is a long-standing
open question [5]. The approaches via matrix rigidity or margin com-
plexity rigidity from Section 5.4 seem to demand extremely strong
lower bounds on those parameters. However, it is also a challeng-
ing open question to find an explicit language outside Σcc; we know 2
explicit languages outside several classes smaller than Σcc such as 2
Pcc,NPcc,Co–NPcc, and BPPcc. We also know explicit languages, e.g.,
inner product mod-2, outside PPcc and UPP, thanks to their charac-
terizations in terms of margin complexity and sign-rank. Recall also
the recent breakthrough result of [88] that Σcc ̸⊆ UPP (Theorem 5.20). 2
Lower bounds on dimensions of graphs: Proving strong lower bounds,
i.e., at least 2log log nω(1) , on depth-3 graph complexity for explicit n × n
bipartite graphs is one approach to finding explicit languages outside
Σcc. An nΩ(1) lower bound on depth-3 graph complexity will have an 2
even more amazing consequence; it would imply a superlinear lower bound on log-depth Boolean circuits. Unfortunately, we currently know onlyΩ ̃(log3n)lowerbounds(Theorem6.19).
Proving superlogarithmic lower bounds on the affine and projec- tive dimensions of explicit graphs is a challenging open problem. For instance, what are the projective and affine dimensions of graphs such as the Paley graph or explicit Ramsey-like graphs? It is interesting that depth-3 graph complexity can also be related to certain geometric representations of graphs. Indeed, Theorem 6.19 shows that a depth-3 lower bound can be derived from a lower bound on the dimension d for the following kind of geometric representation: associate vectors ux,vy ∈Rd such that uTxvy ≥δ if (x,y) is an edge and uTxvy ≤−δ if

145
(x,y) is not an edge, where δ > 0 is a constant. As seen in the proof of Theorem 6.19, a lower bound of d = Ω(n) for the Paley graph follows from a lower bound on the sign-rank of an Hadamard matrix.
Span program complexity: The parameters α(C) (Section 6.2.1) and αF(C) (Definition 7.1) have lower bounds in terms of certain rank-like functions of (partial) matrices (Lemmas 6.7 and 7.3). When applying this connection to prove lower bounds on monotone span program size, we used an explicit matrix A such that rank(A) is quasi-polynomially large compared to the size of a monochromatic rectangle cover of A (see, Section 7.2). In contrast, recall that, to prove the log-rank conjecture in two-party communication complexity, we need to show that every Boolean matrix can be covered by quasi-polynomially many monochro- matic rectangles compared to its rank.
Computational complexity of rank robustness functions: We observe here a correlation between the complexity of computing various rank robustness functions and the difficulty of proving lower bounds on those functions for explicit matrices. This is reminiscent of the theory of nat- ural proofs [87] in the context of lower bounds for explicit Boolean func- tions. For concreteness, let us consider the three functions: (i) Valiant’s general rigidity RA(r) in Definition 2.1, (ii) Raz’s geometric rigidity in Definition 3.4, and (iii) l2-rigidity in Definition 3.2. As we will explain below, functions (i)–(iii) are apparently of decreasing computational complexity and, interestingly, proving explicit lower bounds on these functions also appears to be of decreasing difficulty. Moreover, the cor- responding circuit lower bounds that can be derived also appear to be of decreasing strength and generality as we go from (i) to (iii).
(1) Given a matrix A (say, over a finite field) and an integer r, computing RA(r) appears to be an intractable question. In fact, Deshpande (private communication) showed the follow- ing problem to be NP-complete: Given a matrix A ∈ Fqm×n, and integers 0≤w≤n2 and 0≤r≤n, does there exist a matrix C ∈ Fqm×n with no more than w nonzero entries, such that rank(A + C) ≤ r? The reduction is from the Nearest Codeword Problem. Alekhnovich [1] shows an even stronger

146
Conclusions and Open Problems
hardness result based on a conjecture about the average hard- ness of maximum satisfying linear subsystem in a system of linear equations over GF(2): it is infeasible to distinguish between matrices M with RM (εn) < n1+δ and random M (hence of quadratic rigidity). It follows that having high rigidity is unlikely to be an easy property of a matrix. This may partly explain why coming up with explicit matrices of high rigidity remains a challenging open problem. (2) A notion essentially equivalent to Raz’s geometric rigid- ity (Definition 3.4) is studied in computational geometry under the name outer radius of a set of points (see, e.g., [100]). In [100] and other related works, the complexity of computing outer radius has been investigated. From these results, we obtain the following: (a) For any r ≤ n, Rigr(A) can be approximated in polynomial time within a factor of O( log m). When r is a constant, Rigr (A) can be approx- imated within a factor of (1 + ε) for any constant ε > 0. (b) For any r≤nε, where 0<ε<1, Rigr(A) cannot be approximated within a factor of (log m)δ , where δ > 0 is a fixed constant, unless NP ⊆ P ̃. Furthermore, it is NP-hard to approximate Rign−1(A) within any constant factor. It follows that Rigr(A) is in general hard to compute but seems to be of varying difficulty to approximate. We also saw an explicit matrix, namely, a generalized Hadamard matrix H, that has Rigr(H) ≥ √n − r (Corollary 3.12). However, for Raz’s lower bound on matrix multiplication, we only need that a random matrix Y has Rigεn(Y ) = Ω(√n) (Lemma 3.20(2)).
(3) Measuring the distance of a matrix from the set of low rank matrices in l2-distance as opposed to Hamming dis- tance seems to make the problem considerably easier. This is illustrated by the complete characterization of l2-Rigr(A) in terms of singular values of A (Lemma 3.5). As a result of this characterization, we have both a polynomial time algorithm to compute l2-Rigr(A) and explicit matrices with strong lower bounds, e.g., Hadamard matrix, on this rank robustness function.

147
Understanding the computational complexity of other rank robust- ness functions is an interesting and emerging area of research. In partic- ular, it would be interesting to prove hardness results (or find efficient algorithms) for Paturi-Pudla ́k dimensions.
Linial et al. [53], formulate margin and γ2-norm in terms of semi- definite programs. Thus, they are computable in polynomial time. How- ever, the computational complexity of the sign-rank of a matrix is still unknown. Very recently, Kushilevitz and Weinreb [49] studied the complexity of computing the communication complexity of a Boolean matrix and showed that, under certain cryptographic assumptions, it is intractable to compute, or even to approximate well, the communi- cation complexity.

Acknowledgments
I am grateful to the following people for their careful reading of the early versions of this paper and valuable feedback: Amit Deshpande, Seny Kamara, Neeraj Kayal, Meena Mahajan, Jayalal Sharma, Madhu Sudan, and Ramarathnam Venkatesan.
148

References
[1] M. Alekhnovich, “More on average case vs approximation complexity,” in FOCS, pp. 298–307, IEEE Computer Society, 2003.
[2] N. Alon, P. Frankl, and V. R ̈odl, “Geometric realizations of set systems and probabilistic communication complexity,” Proceedings of the 26th IEEE Symposium on Foundations of Computer Science, pp. 277–280, 1985.
[3] R.I.ArriagaandS.Vempala,“Analgorithmictheoryoflearning:Robustcon- cepts and random projection,” IEEE Symposium on Foundations of Computer Science, pp. 616–623, 1999.
[4] L. Babai and P. Frankl, Linear Algebra Methods in Combinatorics, Prelim- inary version 2. Department of Computer Science, University of Chicago, 1992.
[5] L. Babai, P. Frankl, and J. Simon, “Complexity classes in communication complexity theory,” Proceedings of the 27th IEEE Symposium on Foundations of Computer Science, pp. 337–347, 1986.
[6] L. Babai, A. G ́al, and A. Wigderson, “Superpolynomial lower bounds for monotone span programs,” Combinatorica, vol. 19, no. 3, pp. 301–319, 1999.
[7] W. Baur, “Simplified lower bounds for polynomials with algebraic co-
efficients,” Journal of Complexity, vol. 13, pp. 38–41, 1997.
[8] W. Baur and V. Strassen, “The complexity of partial derivatives,” Theoretical
Computer Science, vol. 22, pp. 317–330, 1983.
[9] A. Beimel, A. G ́al, and M. Paterson, “Lower bounds for monotone span
programs,” Computational Complexity, vol. 6, no. 1, pp. 29–45, 1996/97.
[10] A. Beimel and E. Weinreb, “Separting the power of monotone span programs over different fields,” Proceedings of the 44th IEEE Symposium on Foundations
of Computer Science, pp. 428–437, 2003. 149

150 References
[11] S. Ben-David, N. Eiron, and H. U. Simon, “Limitations of learning via embed- dings in Euclidean half spaces,” Journal of Machine Learning Research, vol. 3, pp. 441–461, 2003.
[12] A. S. Besicovitch, “On the linear independence of fractional powers of integers,” Journal of the London Mathematical Society, vol. 15, pp. 3–6, 1940.
[13] R. Bhatia, Matrix Analysis. Vol. 169 of Graduate Texts in Mathematics,
New York, NY: Springer-Verlag, 1997.
[14] R. Blei, “An elementary proof of the grothendieck inequality,” Proceedings of
the American Mathematical Society, vol. 100, no. 1, pp. 58–60, 1987.
[15] B. Bollig, M. Sauerhoff, D. Sieling, and I. Wegener, “On the power of different types of restricted branching programs,” Electronic Colloquium on Computa-
tional Complexity (ECCC), vol. 1, no. 26, 1994.
[16] B. Bollob ́as, Modern Graph Theory. Vol. 184 of Graduate Texts in Mathemat-
ics, New York, NY: Springer-Verlag, 1998.
[17] B. Bollob ́as, Linear Analysis. Cambrdige Mathematical Textbooks.
Cambridge Universty Press, Second ed., 1999.
[18] B.Bollob ́as,RandomGraphs.Vol.73ofCambrdigeStudiesinAdvancedMath-
ematics, Cambridge, United Kingdom: Cambridge University Press, Second
ed., 2001.
[19] J. Bruck and R. Smolensky, “Polynomial threshold functions, AC0 functions
and spectral norms,” Proceedings of the 31st Symposium on Foundations of
Computer Science, pp. 632–641, 1990.
[20] P. Bu ̈rgisser, M. Clausen, and M. A. Shokhrollahi, Algebraic Complexity
Theory. Springer-Verlag, 1997.
[21] P. Bu ̈rgisser and M. Lotz, “Lower bounds on the bounded coefficient
complexity of bilinear maps,” Proceedings of the 43rd IEEE Symposium on
Foundations of Computer Science, pp. 659–668, 2002.
[22] P. Bu ̈rgisser and M. Lotz, “Lower bounds on the bounded coefficient com-
plexity of bilinear maps,” Journal of the ACM, vol. 51, no. 3, pp. 464–482,
2004.
[23] J.-Y.CaiandE.Bach,“Ontestingforzeropolynomialsbyasetofpointswith
bounded precision,” Theoretical Computer Science, vol. 296, no. 1, pp. 15–25,
2003.
[24] B. Chazelle, “A spectral approach to lower bounds,” Proceedings of the 35th
IEEE Symposium on Foundations of Computer Science, pp. 674–682, 1994.
[25] Z. Chen and M. Kao, “Reducing randomness via irrational num- bers,” Proceedings of the 29th Symposium Theory of Computing (STOC),
pp. 200–209, 1997.
[26] R. de Wolf, “Lower bounds on matrix rigidity via a quantum argument,”
in ICALP (1), (M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, eds.),
pp. 62–71, Vol. 4051 of Lecture Notes in Computer Science, Springer, 2006.
[27] P. Erd ̋os, R. Graham, and E. Szemer ́edi, “On sparse graphs with dense long
paths,” Computers and Mathematics with Applications, pp. 365–369, 1976.
[28] J. Forster, “A linear lower bound on the unbounded error probabilistic com- munication complexity,” Journal of Computer and System Sciences, vol. 65,
no. 4, pp. 612–625, Special issue on Complexity, (Chicago, IL, 2001), 2002.

References 151
[29] J.Forster,M.Krause,S.V.Lokam,R.Mubarakzjanov,N.Schmitt,andH.U. Simon, “Relations between communication complexity, linear arrangements, and computational complexity,” in FST TCS 2001: Foundations of Soft- ware Technology and Theoretical Computer Science (Bangalore), pp. 171–182, Vol. 2245 of Lecture Notes in Computer Science, Berlin: Springer, 2001.
[30] J. Forster and H.-U. Simon, “On the smallest possible dimension and the largest possible margin of linear arrangements representing given concept classes,” Theoretical Computer Science, vol. 350, no. 1, pp. 40–48, 2006.
[31] J. Friedman, “A note on matrix rigidity,” Combinatorica, vol. 13, no. 2, pp. 235–239, 1993.
[32] A. G ́al, “A characterization of span program size and improved lower bounds for monotone span programs,” Computational Complexity, vol. 10, no. 4, pp. 277–296, 2001.
[33] G. H. Golub and C. F. Van Loan, Matrix Computations. The Johns Hopkins University Press, Third ed., 1996.
[34] D. Grigoriev, “Using the notions of separability and indepndence for proving lower bounds on the circuit complexity,” Notes of the Leningrad Branch of the Steklov Mathematical Institute, 1976.
[35] A. Hajnal, W. Maass, P. Pudl ́ak, M. Szegedy, and G. Turan, “Threshold func- tions of bounded depth,” Journal of Computer and System Sciences, vol. 46, pp. 129–154, 1993.
[36] B.HalstenbergandR.Reischuk,“Relationsbetweencommunicationcomplex- ity classes,” Journal of Computer and System Sciences, vol. 41, pp. 402–429, 1990.
[37] J. H ̊astad and A. Wigderson, “The randomized communication complexity of set disjointness,” Theory of Computing, vol. 3, no. 1, pp. 211–219, 2007.
[38] A. J. Hoffman and H. W. Wielandt, “The variation of the spectrum of a
normal matrix,” Duke Mathematical Journal, vol. 20, pp. 37–39, 1953.
[39] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press,
February 1990.
[40] M. Karchmer and A. Wigderson, “Monotone circuits for connetivity require
super-logarithmic depth,” SIAM Journal on Discrete Mathematics, vol. 3,
no. 2, pp. 255–265, 1990.
[41] M. Karchmer and A. Wigderson, “On span programs,” in Proceedings of
the 8th Annual Structure in Complexity Theory Conference (San Diego, CA,
1993), pp. 102–111, Los Alamitos, CA, 1993. IEEE Computer Society Press.
[42] B. S. Kashin and A. A. Razborov, “Improved lower bounds on the rigidity of hadamard matrices,” Matematicheskie Zametki, vol. 63, no. 4, pp. 535–540,
English translation at http://www.mi.ras.ru/ razborov/hadamard.ps, 1998.
[43] H. Klauck, “Lower bounds for quantum communication complexity,” SIAM
Journal on Computing, vol. 37, no. 1, pp. 20–46, 2007.
[44] A. R. Klivans and R. A. Servedio, “Learning DNF in time 2O ̃(n1/3),” Journal
of Computer and System Sciences, vol. 68, no. 2, pp. 303–318, 2004.
[45] A. Kotlov, L. Lov ́asz, and S. Vempala, “The Colin de Verdi`ere number and sphere representations of a graph,” Combinatorica, vol. 17, pp. 483–521,
1997.

152 References
[46] M. Krause, “Geometric arguments yield better bounds for threshold circuits and distributed computing,” Theoretical Computer Science, vol. 156, nos. 1&2, pp. 99–117, 1996.
[47] M. Krause and P. Pudl ́ak, “On the computational power of depth 2 circuits with threshold and modulo gates,” STOC, pp. 48–57, 1994.
[48] E. Kushilevitz and N. Nisan, Communication Complexity. Cambridge Univer- sity Press, 1997.
[49] E. Kushilevitz and E. Weinreb, “On the complexity of communication complexity,” STOC 2009, 2009.
[50] S. Lang, Algebra. Addison-Wesley Publishing Company, Third ed., 1993.
[51] D. Lewin and S. Vadhan, “Checking polynomial identities over any field: Towards a derandomization?,” Proceedings of the 30th Symposium on The-
ory of Computing (STOC), pp. 438–447, 1998.
[52] T. Lickteig, Ein elementarer Beweis fu ̈r eine geometrische Gradschranke fu ̈r
die Zahl der Operationen bei der Berechnung von Polynomen. Diplomarbeit,
Universit ̈at Konstanz, 1980.
[53] N. Linial, S. Mendelson, G. Schechtman, and A. Shraibman, “Complexity
measures of sign matrices,” Combinatorica, 2006.
[54] N. Linial and A. Shraibman, “Lower bounds in communication complexity
based on factorization norms,” in STOC ’07: Proceedings of the 39th Annual ACM Symposium on Theory of Computing, pp. 699–708, New York, NY, USA: ACM, 2007.
[55] N. Linial and A. Shraibman, “Learning complexity vs communication com- plexity,” CCC ’08: Conference on Computational Complexity, 2008.
[56] S. V. Lokam, “On the rigidity of Vandermonde matrices,” Theoretical Com- puter Science, vol. 237, nos. 1–2, pp. 477–483, Presented at the DIMACS- DIMATIA workshop on Arithmetic Circuits and Algebraic Methods, June, 1999, 2000.
[57] S. V. Lokam, “Spectral methods for matrix rigidity with applications to size- depth trade-offs and communication complexity,” Journal of Computer and System Sciences, vol. 63, no. 3, pp. 449–473, 2001.
[58] S. V. Lokam, “Graph complexity and slice functions,” Theory of Computing Systems, vol. 36, no. 1, pp. 71–88, 2003.
[59] S. V. Lokam, “Quadratic lower bounds on matrix rigidity,” Proceedings of Theory and Applications of Models of Computation (TAMC), vol. 3959, pp. 295–307, Lecture Notes in Computer Science, 2006.
[60] L. Lov ́asz, “On the ratio of optimal integral and fractional covers,” Discrete Mathematics, vol. 13, no. 4, pp. 383–390, 1975.
[61] L. Lov ́asz, “On the shannon capacity of a graph,” IEEE Transactions on Information Theory, vol. 25, pp. 1–7, 1979.
[62] L.Lov ́aszandM.Saks,“Communicationcomplexityandcombinatoriallattice theory,” Journal of Computer and System Sciences, vol. 47, no. 2, pp. 322–349, 29th Annual IEEE Symposium on Foundations of Computer Science. White Plains, NY, 1988, 1993.
[63] K. Mehlhorn and E. M. Schmidt, “Las Vegas is better than determin- ism in VLSI and distributed computing (Extended Abstract),” in STOC, pp. 330–337, ACM, 1982.

References 153
[64] G. Midrijanis, “Three lines proof of the lower bound for the matrix rigidity,” CoRR, vol. abs/cs/0506081, 2005.
[65] J. Morgenstern, “Note on a lower bound of the linear complexity of the fast Fourier transform,” Journal of the ACM, vol. 20, no. 2, pp. 305–306, 1973.
[66] N. Nisan and M. Szegedy, “On the degree of boolean functions as real poly- nomials,” Computational Complexity, vol. 4, pp. 301–313, 1994.
[67] N. Nisan and A. Wigderson, “Lower bounds on arithmetic circuits via par- tial derivatives,” Proceedings of the 36th IEEE Symposium on Foundations of Computer Science, pp. 16–25, 1995.
[68] N. Nisan and A. Wigderson, “On rank vs communication complexity,” Com- binatorica, vol. 15, no. 4, pp. 557–565, 1995.
[69] N. Nisan and A. Wigderson, “On the complexity of bilinear forms,” Proceed- ings of the 27th ACM Symposium on the Theory of Computing, pp. 723–732, 1995.
[70] R. Paturi and P. Pudl ́ak, “Circuit lower bounds and linear codes,” in Teoria Slozhnosti Vychislenij IX, (E. A. Hirsch, ed.), Vol. 316, pp. 188–204, Notes of Mathematical Seminars of St. Petersburg Department of Steklov Institute of Mathematics, 2004.
[71] R.PaturiandJ.Simon,“Probabilisticcommunicationcomplexity,”Journalof Computer and System Sciences, vol. 33, no. 1, pp. 106–123, 25th Annual Sym- posium on Foundations of Computer Science (Singer Island, Florida, 1984), 1986.
[72] N. Pippenger, “Superconcentrators,” SIAM Journal on Computing, vol. 6, no. 2, pp. 298–304, 1977.
[73] G. P ́olya and G. Szeg ̈o, Problems and Theorems in Analysis, Volume II. New York, NY: Springer-Verlag, 1976.
[74] P. Pudl ́ak, “Large communication in constant depth circuits,” Combinatorica, vol. 14, no. 2, pp. 203–216, 1994.
[75] P. Pudl ́ak, “A note on the use of determinant for proving lower bounds on the size of linear circuits,” Electronic Colloquium on Computational Complexity (ECCC), 1998.
[76] P. Pudl ́ak and V. R ̈odl, “A combinatorial approach to complexity,” Combina- torica, vol. 12, no. 2, pp. 221–226, 1992.
[77] P. Pudl ́ak and V. R ̈odl, “Modified ranks of tensors and the size of cir- cuits,” Proceedings of the 25th ACM Symposium on the Theory of Computing, pp. 523–531, 1993.
[78] P. Pudl ́ak and V. R ̈odl, “Some combinatorial-algebraic problems from com- plexity theory,” Discrete Mathematics, vol. 136, nos. 1–3, pp. 253–279, Trends in discrete mathematics, 1994.
[79] P. Pudl ́ak, V. R ̈odl, and P. Savicky ́, “Graph complexity,” Acta Informatica, vol. 25, no. 5, pp. 515–535, 1988.
[80] P. Pudl ́ak, V. R ̈odl, and J. Sgall, “Boolean circuits, tensor ranks and commu- nication complexity,” Manuscript, March 1994.
[81] P. Pudl ́ak and Z. Vav ̆r ́ın, “Computation of rigidity of order n2/r for one sim- ple matrix,” Commentationes Mathematicae Universitatis Carolinae, vol. 32, no. 2, pp. 213–218, 1991.

154 References
[82] J. Radhakrishnan and A. Ta-Shma, “Bounds for dispersers, extractors, and depth-two superconcentrators,” SIAM Journal on Discrete Mathematics, vol. 13, no. 1, pp. 2–24 (electronic), 2000.
[83] R. Raz, “On the complexity of matrix product,” in Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pp. 144–151, ACM Press, 2002.
[84] A. A. Razborov, “On rigid matrices,” Manuscript, (Russian), 1989.
[85] A. A. Razborov, “Applications of matrix methods to the theory of lower bounds in computational complexity,” Combinatorica, vol. 10, no. 1,
pp. 81–93, 1990.
[86] A. A. Razborov, “A note on the use of determinant for proving lower bounds
on the size of linear circuits,” Electronic Colloquium on Computational Com-
plexity (ECCC), 1998.
[87] A. A. Razborov and S. Rudich, “Natural proofs,” Journal of Computer and
System Sciences, vol. 55, no. 1, part 1, pp. 24–35, 26th Annual ACM Sympo-
sium on the Theory of Computing (STOC ’94) (Montreal, PQ, 1994), 1997.
[88] A. A. Razborov and A. A. Sherstov, “The sign-rank of AC0,” FOCS 2008,
Proceedings of Symposium on Foundations of Computer Science, 2008.
[89] L. R ́onyai, L. Babai, and M. K. Ganapathy, “On the number of zero-patterns of a sequence of polynomials,” Journal of the American Mathematical Society,
vol. 14, no. 3, pp. 717–735 (electronic), 2001.
[90] A. A. Sherstov, “The pattern matrix method for lower bounds on quantum
communication,” in STOC, (R. E. Ladner and C. Dwork, eds.), pp. 85–94,
ACM, 2008.
[91] V. Shoup and R. Smolensky, “Lower bounds for polynomial evaluation and
interpolation,” Proceedings of the 32nd IEEE Symposium on Foundations of
Computer Science, pp. 378–383, 1991.
[92] J. W. Silverstein, “The smallest eigenvalue of a large dimensional wishart
matrix,” The Annals of Probability, vol. 13, no. 4, pp. 1364–1368, 1985.
[93] D. A. Spielman, V. Stemann, and M. A. Shokhrollahi, “A remark on matrix
rigidity,” Information Processing Letters, vol. 64, no. 6, pp. 283–285, 1997.
[94] P. Stevenhagen and H. W. Lenstr Jr., “Chebotar ̈ev and his density theorem,”
Mathematical Intelligencer, vol. 18, no. 2, pp. 26–37, 1996.
[95] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory. Academic Press,
1990.
[96] T. Tao, “An uncertainty principle for cyclic groups of prime order,”
Mathematical Research Letters, vol. 12, pp. 121–127, 2005.
[97] J. Tarui, “Randomized polynomials, threshold circuits and polynomial hier-
archy,” Theoretical Computer Science, vol. 113, pp. 167–183, 1993.
[98] L. Valiant, “Graph–theoretic arguments in low-level complexity,” in Proceed- ings of the 6th Symposium on Mathematical Foundations of Computer Science,
pp. 121–127, 1977. Lecture Notes in Computer Science.
[99] V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1999.
ISBN 0-387-98780-0.
[100] K. R. Varadarajan, S. Venkatesh, Y. Ye, and J. Zhang, “Approximating the
radii of point sets,” SIAM Journal on Computing, vol. 36, no. 6, pp. 1764–1776, 2007.

References 155
[101] J. von zur Gathen and J. Gerhard, Modern Computer Algebra. Cambridge University Press, 1999.
[102] H. E. Warren, “Lower bounds for approximation by nonlinear manifolds,” Transactions of the American Mathematical Society, vol. 133, pp. 167–178, 1968.
[103] A. Wigderson, “The fusion method for lower bounds in circuit complexity,” in Combinatorics, Paul Erdo ̋s is Eighty, Vol. 1, pp. 453–468, Budapest: J ́anos Bolyai Mathematical Society, Bolyai Society Mathematical Studies, 1993.
[104] A. C.-C. Yao, “Some complexity questions related to distributive comput- ing,” Proceedings of the 11th ACM Symposium on the Theory of Computing, pp. 209–213, 1979.