Prof. Mark Bun
CAS CS 591 B: Communication Complexity
Lecture Notes 5: Discrepancy
Fall 2019
Reading.
• Rao-Yehudayoff Chapter 5
Last time we saw how Yao’s Principle could be used (in principle) to lower bound the randomized communication complexity of a function by exhibiting a distribution under which it has high distributional complexity. Namely, for any function f, we have

BPP^pub_ε(f) = max_μ D^μ_ε(f).
This lecture will be about a central technique for lower bounding distributional complexity called the discrepancy method. The discrepancy of a collection of sets S1,…,Sm ⊆ X measures how unbalanced any of the sets Si has to be under a red-blue coloring of the elements of X. For reasonable set systems, a random coloring has low discrepancy with high probability, so discrepancy can be thought of as a measure of how random the coloring looks.
To apply this idea to communication complexity, the collection of sets we will be interested in will (intuitively) be the large combinatorial rectangles in X × Y . If this collection has low discrepancy under the coloring induced by the 0’s and 1’s of a function f, that means every large rectangle must be close to balanced with respect to that function. Thus, discrepancy upper bounds can be used to establish communication lower bounds.
Definition 1. Let f : X × Y → {0, 1}, let μ be a distribution over X × Y , and let R ⊆ X × Y be a rectangle. Define
discμ(f, R) = |μ(R ∩ f^{-1}(0)) − μ(R ∩ f^{-1}(1))|,

and let the discrepancy of f under μ be

discμ(f) = max_{R a rectangle} discμ(f, R).
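To make the definition concrete, here is a minimal brute-force sketch in Python (the helper names disc_uniform and subsets are ad hoc, not from the notes) that computes discμ(f) under the uniform distribution for a tiny function by enumerating all rectangles A × B. It is only feasible for very small X and Y.

from itertools import chain, combinations

def subsets(S):
    """All subsets of S (exponentially many; only for very small S)."""
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

def disc_uniform(f, X, Y):
    """Brute-force disc_mu(f): max over rectangles A x B of
    |mu(R ∩ f^{-1}(0)) - mu(R ∩ f^{-1}(1))| under the uniform mu on X x Y."""
    N = len(X) * len(Y)
    best = 0.0
    for A in subsets(X):
        for B in subsets(Y):
            zeros = sum(1 for x in A for y in B if f(x, y) == 0)
            ones = sum(1 for x in A for y in B if f(x, y) == 1)
            best = max(best, abs(zeros - ones) / N)
    return best

# Tiny example: inner product mod 2 on 2-bit strings.
X = Y = [(a, b) for a in (0, 1) for b in (0, 1)]
ip2 = lambda x, y: (x[0] * y[0] + x[1] * y[1]) % 2
print(disc_uniform(ip2, X, Y))  # at most 2^{-n/2} = 1/2 by Lemma 5 below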
Suppose discμ(f, R) is small. Then (intuitively) one of two things must be true:

1. R is small with respect to μ, or

2. R is close to balanced, i.e., far from monochromatic.
Either of these things is good for the lower-bound-prover. If a function has small discrepancy, then that means the input cannot be partitioned into large almost-monochromatic rectangles. We make this idea formal as follows.
Theorem 2. Let f : X × Y → {0, 1} and let μ be a distribution over X × Y. Then

D^μ_ε(f) ≥ log((1 − 2ε) / discμ(f)).
Proof. Let Π be a deterministic protocol computing f with error ε under μ and with cost c. We will show that

discμ(f) ≥ (1 − 2ε) 2^{−c},

which implies the theorem.

The leaves of Π partition X × Y into rectangles R_1, . . . , R_t for some t ≤ 2^c. For each rectangle R_i, let a_i be the value that Π outputs on rectangle R_i. Since the protocol has error ε, we have

1 − ε ≤ Pr_{(x,y)∼μ}[Π(x, y) = f(x, y)] = Σ_{i=1}^t Pr_{(x,y)∼μ}[(x, y) ∈ R_i ∧ f(x, y) = a_i] = Σ_{i=1}^t μ(R_i ∩ f^{-1}(a_i)).

Similarly,

ε ≥ Pr_{(x,y)∼μ}[Π(x, y) ≠ f(x, y)] = Σ_{i=1}^t μ(R_i ∩ f^{-1}(1 − a_i)).

Combining the two inequalities,

1 − 2ε ≤ Σ_{i=1}^t (μ(R_i ∩ f^{-1}(a_i)) − μ(R_i ∩ f^{-1}(1 − a_i)))
       ≤ Σ_{i=1}^t |μ(R_i ∩ f^{-1}(0)) − μ(R_i ∩ f^{-1}(1))|
       ≤ Σ_{i=1}^t max_R discμ(f, R)
       ≤ 2^c discμ(f).
Now we turn to developing techniques for upper bounding the discrepancy of particular functions.
1 The Spectral Method
Our first technique for upper bounding discrepancy will be in terms of the analytic properties of the communication matrix of f. It will actually be more convenient to work with the “sign matrix” of f defined by
Sf[x, y] = (−1)^{f(x,y)}, i.e., Sf[x, y] = −1 if f(x, y) = 1 and Sf[x, y] = 1 if f(x, y) = 0.
If f is a symmetric function, i.e., f(x, y) = f(y, x) for every x, y, then the matrix Sf is symmetric. Let ∥Sf∥ denote the spectral norm of Sf. When Sf is symmetric, it has a few equivalent characterizations:
• ∥Sf∥ is the largest absolute value of an eigenvalue of Sf.

• ∥Sf∥ is the largest singular value of Sf. In particular, ∥Sf x∥_2 ≤ ∥Sf∥ ∥x∥_2 for every vector x.

Theorem 3. Let f : {0, 1}^n × {0, 1}^n → {0, 1} be a symmetric Boolean function, i.e., f(x, y) = f(y, x) for every x, y, and let μ be the uniform distribution over {0, 1}^n × {0, 1}^n. Then for every rectangle R,

discμ(f, R) ≤ 2^{−2n} ∥Sf∥ √|R|.
Proof. Let N = 2^n. The fact that f is symmetric implies that the real matrix Sf is symmetric, and hence by the spectral theorem, its eigenvectors v_1, . . . , v_N form an orthonormal basis for R^N. Let λ_i be the associated eigenvalues, i.e., Sf v_i = λ_i v_i. Let R = A × B and expand the characteristic vectors of A and B in this basis as

1_A = Σ_{i=1}^N α_i v_i,    1_B = Σ_{i=1}^N β_i v_i.

Observe that this expansion satisfies

|A| = 1_A^T 1_A = (Σ_{i=1}^N α_i v_i)^T (Σ_{j=1}^N α_j v_j) = Σ_{i,j=1}^N α_i α_j v_i^T v_j = ∥α∥_2^2,

and similarly |B| = ∥β∥_2^2. (This is just Parseval's identity: the sum of squared coefficients of a vector is the same regardless of what orthonormal basis it is expanded in.) Then

discμ(f, R) = 2^{−2n} | |R ∩ f^{-1}(1)| − |R ∩ f^{-1}(0)| |
            = 2^{−2n} | Σ_{(x,y)∈R} Sf[x, y] |                            (which is why we work with sign matrices)
            = 2^{−2n} | 1_A^T Sf 1_B |
            = 2^{−2n} | (Σ_{i=1}^N α_i v_i)^T (Σ_{i=1}^N β_i λ_i v_i) |    (since the v_i's are eigenvectors)
            = 2^{−2n} | Σ_{i,j} α_i β_j λ_j v_i^T v_j |
            = 2^{−2n} | Σ_{j=1}^N α_j β_j λ_j |                            (by orthonormality)
            ≤ 2^{−2n} ∥Sf∥ ∥α∥_2 ∥β∥_2                                    (by Cauchy-Schwarz)
            = 2^{−2n} ∥Sf∥ √(|A||B|).
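As a sanity check on the bound we just proved, the following sketch (Python with NumPy; the variable names are mine, not from the notes) builds the sign matrix of IP_3, computes its spectral norm, and verifies that a random rectangle satisfies discμ(f, R) ≤ 2^{−2n} ∥Sf∥ √|R| under the uniform distribution.

import numpy as np
from itertools import product

n = 3
inputs = list(product((0, 1), repeat=n))
N = len(inputs)  # 2^n

# Sign matrix S_f[x, y] = (-1)^{f(x,y)} for f = IP_n, which is symmetric.
S = np.array([[(-1) ** (sum(a * b for a, b in zip(x, y)) % 2)
               for y in inputs] for x in inputs])

spectral_norm = np.linalg.norm(S, 2)   # equals 2^{n/2} for IP_n
A = np.random.rand(N) < 0.5            # indicator vector of a random set A
B = np.random.rand(N) < 0.5            # indicator vector of a random set B
R_size = A.sum() * B.sum()             # |R| = |A| * |B|

# disc_mu(f, A x B) under uniform mu equals 2^{-2n} |1_A^T S 1_B|.
disc_R = abs(A.astype(float) @ S @ B.astype(float)) / N ** 2
bound = spectral_norm * np.sqrt(R_size) / N ** 2

assert disc_R <= bound + 1e-9
print(disc_R, bound)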
1.1 Inner Product
Recall that the Inner Product Mod 2 function is defined by

IPn(x, y) = ⟨x, y⟩ mod 2 = Σ_{i=1}^n x_i y_i mod 2.
The sign matrix of IPn, which we’ll denote by Hn, is an example of a Hadamard matrix. A Hadamard matrix is a square {±1}-valued matrix whose rows are mutually orthogonal. As an example, H3 takes the form
H3 =
   1   1   1   1   1   1   1   1
   1  −1   1  −1   1  −1   1  −1
   1   1  −1  −1   1   1  −1  −1
   1  −1  −1   1   1  −1  −1   1
   1   1   1   1  −1  −1  −1  −1
   1  −1   1  −1  −1   1  −1   1
   1   1  −1  −1  −1  −1   1   1
   1  −1  −1   1  −1   1   1  −1

To get a bound on the spectral norm, we establish the following fact:
Claim 4. Hn^2 = 2^n I_{2^n}, where I_{2^n} is the 2^n × 2^n identity matrix.

Proof. Writing Hn,x for row x of Hn, we calculate each entry Hn^2[x, x′] as

⟨Hn,x, Hn,x′⟩ = Σ_{y∈{0,1}^n} (−1)^{⟨x,y⟩} (−1)^{⟨x′,y⟩} = Σ_{y∈{0,1}^n} (−1)^{⟨x⊕x′,y⟩} = 2^n if x = x′, and 0 otherwise.
Hence, all of the eigenvalues of Hn have absolute value 2^{n/2}, so ∥Hn∥ = 2^{n/2}. Combining this with Theorem 3 gives us the following bound on the discrepancy:
Lemma 5 (Lindsey’s Lemma). For every rectangle R, we have discμ(IPn, R) ≤ 2^{−3n/2} √|R|.

Since every rectangle has size at most 2^{2n}, this implies that discμ(IPn) ≤ 2^{−n/2}. Combining this with Theorem 2 establishes that D^μ_{1/3}(IPn) ≥ Ω(n), and hence by Yao’s principle, BPP^pub_{1/3}(IPn) ≥ Ω(n).
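The following small sketch (Python/NumPy; the helper name hadamard is mine) builds Hn as an n-fold Kronecker power of the 2 × 2 Hadamard matrix, which matches the sign matrix of IPn under the natural binary ordering of inputs, and checks Claim 4 and the norm computation for a small n.

import numpy as np

def hadamard(n):
    """H_n, the sign matrix of IP_n, as an n-fold Kronecker power of [[1,1],[1,-1]]."""
    H1 = np.array([[1, 1], [1, -1]])
    H = np.array([[1]])
    for _ in range(n):
        H = np.kron(H, H1)
    return H

n = 4
H = hadamard(n)
assert np.array_equal(H @ H, (2 ** n) * np.eye(2 ** n, dtype=int))  # Claim 4
print(np.linalg.norm(H, 2), 2 ** (n / 2))                           # both equal 2^{n/2}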
2 The BNS / Cauchy-Schwarz Method
This method of upper bounding discrepancy was introduced by Babai, Nisan, and Szegedy. It replaces discrepancy with an upper bound which is often much easier to compute. (No maximum over rectangles and the dependences on x and y are decoupled, at least for product distributions.)
Theorem 6. Let μ be the uniform distribution over X × Y. Then

discμ(f)^2 ≤ E_{y,y′} |E_x Sf[x, y] Sf[x, y′]|,

where x ∼ X and y, y′ ∼ Y are chosen independently and uniformly.

Proof. From the definition of discrepancy,

disc(f, A × B) = 2^{−2n} |Σ_{x∈A, y∈B} Sf[x, y]|.

This can be rewritten as

disc(f, A × B) = |E_{x,y} 1_A(x) 1_B(y) Sf[x, y]|.

Squaring,

disc(f, A × B)^2 = (E_x 1_A(x) E_y 1_B(y) Sf[x, y])^2
                 ≤ E_x (1_A(x) E_y 1_B(y) Sf[x, y])^2           (by Jensen: E[Z]^2 ≤ E[Z^2])
                 ≤ E_x (E_y 1_B(y) Sf[x, y])^2
                 = E_x E_{y,y′} 1_B(y) 1_B(y′) Sf[x, y] Sf[x, y′]
                 ≤ E_{y,y′} 1_B(y) 1_B(y′) |E_x Sf[x, y] Sf[x, y′]|
                 ≤ E_{y,y′} |E_x Sf[x, y] Sf[x, y′]|.
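One appeal of Theorem 6 is that the right-hand side requires no maximization over rectangles: it can be read off directly from the sign matrix. A minimal sketch of that computation (Python/NumPy; the function name bns_bound is mine):

import numpy as np

def bns_bound(S):
    """E_{y,y'} |E_x S[x,y] S[x,y']| for a {+1,-1} sign matrix S, an upper
    bound on disc_mu(f)^2 under the uniform distribution (Theorem 6)."""
    num_x = S.shape[0]
    G = (S.T @ S) / num_x       # G[y, y'] = E_x S[x, y] S[x, y']
    return np.abs(G).mean()     # expectation over independent uniform y, y'

# For the sign matrix of IP_n this evaluates to exactly 2^{-n},
# recovering disc(IP_n) <= 2^{-n/2} as shown next.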
The BNS approach gives us another way of proving Lindsey’s Lemma. Note that for all y, y′ we have

E_x Hn[x, y] Hn[x, y′] = 1 if y = y′, and 0 otherwise.

Hence,

disc(IPn, A × B)^2 ≤ E_{y,y′} |E_x Hn[x, y] Hn[x, y′]| = Pr[y = y′] = 2^{−n},

so again we conclude disc(IPn) ≤ 2^{−n/2}.

3 A Non-Example: Disjointness
Can we use a discrepancy upper bound to prove a lower bound for the set disjointness function? It turns out that disjointness has rather large discrepancy. Here, the discrepancy of a function is the minimum discrepancy over all distributions on the inputs:
Definition 7. The discrepancy of a function f : X × Y → {0, 1} is

disc(f) = min_μ discμ(f).
Note that by combining Yao’s principle with Theorem 2, we have the following formulation of the discrepancy lower bound:
Theorem 8. For every 0 < δ < 1/2, we have

BPP^pub_{1/2−δ}(f) ≥ log(2δ / disc(f)) = log(1/disc(f)) − log(1/(2δ)).
This formulation reveals that discrepancy not only gives good lower bounds on the bounded-error randomized communication complexity of f, but also for protocols where the advantage δ over random guessing is as small as, say, disc(f)/2. In particular, BPP^pub_{1/2−δ}(IPn) ≥ Ω(n) even for δ = 2^{−Ω(n)}. But this strength of the discrepancy method can also be its weakness: if a function does admit a good large-error protocol, then the discrepancy method cannot prove strong lower bounds for it even in the bounded-error regime.
We can use this connection to prove a lower bound on the discrepancy of Disjointness by exhibiting a good large-error communication protocol.
Theorem 9. disc(DISJn) ≥ Ω(1/n).
Proof. We exhibit a public coin randomized protocol for disjointness DISJn with advantage δ = 1/(4n) and communication cost c = 2. By Theorem 8 this will imply

disc(DISJn) ≥ 2δ · 2^{−c} ≥ 1/(8n).

The protocol is as follows. Using shared randomness, Alice and Bob select a random index i ∈ [n]. Alice sends the bit x_i to Bob. If x_i = y_i = 1, he reports DISJn(x, y) = 0. Otherwise, he reports 0 with probability 1/2 − 1/(4n) and 1 with probability 1/2 + 1/(4n).

To see that this protocol succeeds, first suppose DISJn(x, y) = 1. Then Alice and Bob always find themselves in the second case, and their advantage over random guessing is 1/(4n). Otherwise, suppose DISJn(x, y) = 0. Then with probability at least 1/n, they select an index i for which x_i = y_i = 1, and hence their success probability is at least

1/n + (1 − 1/n)(1/2 − 1/(4n)) ≥ 1/2 + 1/(4n).
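For concreteness, here is a small sketch (Python; the function name and the example inputs are illustrative) that computes the exact success probability of this two-bit protocol on a fixed input pair, averaging over the shared random index and Bob’s coin.

def success_prob(x, y):
    """Exact success probability of the two-bit protocol from the proof of
    Theorem 9 on input (x, y), over the random index i and Bob's coin."""
    n = len(x)
    disj = 0 if any(a == 1 and b == 1 for a, b in zip(x, y)) else 1
    p = 0.0
    for i in range(n):
        if x[i] == 1 and y[i] == 1:
            p += 1.0 / n                     # Bob outputs 0, which is correct here
        else:
            p_one = 0.5 + 1.0 / (4 * n)      # Bob outputs 1 with this probability
            p += (p_one if disj == 1 else 1.0 - p_one) / n
    return p

# Every input pair yields success probability at least 1/2 + 1/(4n):
print(success_prob((1, 0, 1), (0, 0, 1)))  # intersecting inputs, n = 3
print(success_prob((1, 0, 1), (0, 1, 0)))  # disjoint inputs, n = 3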
Hence, even for bounded error protocols, the discrepancy method can, at best, only prove an Ω(log n) lower bound for Disjointness. (Exercise: Show that disc(DISJn) ≤ O(1/√n), so this weak lower bound is actually attainable.) Nevertheless, there are techniques related to the discrepancy method, e.g., the one-sided discrepancy or corruption bound, which are capable of proving a tight lower bound of Ω(n) for Disjointness. This method is explored in one of your homework problems.
4 Discrepancy and PPcc
Let’s take a closer look at the relationship between discrepancy and large-error communication. It turns out that discrepancy actually gives a tight characterization of the cost of large-error protocols when one measures cost in the right way.
Definition 10. Let f : X × Y → {0, 1}. A public coin randomized protocol Π computes f with advantage δ > 0 if for every (x, y) ∈ X × Y,

Pr[Π(x, y) = f(x, y)] ≥ 1/2 + δ.
The PPcc cost of a protocol Π computing f with length c and advantage δ is defined to be PPcc(Π) = c + log(1/δ).
Moreover, PPcc(f) is defined to be the least PPcc cost of a protocol computing f, and PPcc is the class of sequences of functions with polylogarithmic PPcc cost.
Theorem 11. For every function f : X × Y → {0, 1}, we have PPcc(f) = Θ(log(1/disc(f))).
Proof. For the ≥ direction, we want to show that any protocol Π computing f has PPcc(Π) ≥ Ω(log(1/disc(f))). Let Π be a protocol computing f with length c and advantage δ. If δ ≤ √(disc(f))/2, then log(1/δ) ≥ (1/2) log(1/disc(f)) and we’re done. Otherwise, by Theorem 8, we have

c ≥ log(1/disc(f)) − log(1/(2δ)) ≥ (1/2) log(1/disc(f)),

and hence PPcc(Π) ≥ Ω(log(1/disc(f))).
For the ≤ direction, suppose f has large discrepancy disc(f) ≥ 2^{−c}. We will exhibit a good distributional protocol for f with respect to any given μ which, by Yao’s principle, implies the existence of a good public coin protocol. For any distribution μ we have discμ(f) ≥ 2^{−c}, and hence there exists a rectangle R = A × B such that discμ(f, R) ≥ 2^{−c}. Let a ∈ {0, 1} be such that μ(R ∩ f^{-1}(a)) ≥ 2^{−c} + μ(R ∩ f^{-1}(1 − a)). This implies that μ(R ∩ f^{-1}(a)) ≥ (1/2)μ(R) + 2^{−c−1}. The protocol is as follows. Using 2 bits of communication, Alice and Bob check whether (x, y) ∈ R. If so, they output a, and otherwise they output the outcome of a coin flip. The success probability of this protocol is

Pr[Π(x, y) = f(x, y)] = μ(R ∩ f^{-1}(a)) + (1/2)(1 − μ(R))
                      ≥ (1/2)μ(R) + 2^{−c−1} + 1/2 − (1/2)μ(R)
                      = 1/2 + 2^{−c−1}.
Hence, PPcc(f) ≤ 2 + (c + 1) = O(log(1/disc(f))).
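The success probability computed at the end of the proof is easy to evaluate for any concrete μ, rectangle R = A × B, and label a. A brief sketch (Python; the function name and the dictionary representation of μ are my own conventions):

def rect_protocol_success(f, mu, A, B, a):
    """Success probability of the two-bit protocol 'output a if (x, y) in A x B,
    else answer with a fair coin flip', where mu is a dict mapping (x, y) to its
    probability. Equals mu(R ∩ f^{-1}(a)) + (1 - mu(R)) / 2."""
    mu_R = sum(p for (x, y), p in mu.items() if x in A and y in B)
    mu_R_correct = sum(p for (x, y), p in mu.items()
                       if x in A and y in B and f(x, y) == a)
    return mu_R_correct + 0.5 * (1 - mu_R)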