Prof. Mark Bun
CAS CS 591 B: Communication Complexity
Lecture Notes 15: Introduction to Lifting
Fall 2019
Reading.
• Rao-Yehudayoff, Chapter 8
Today we’ll begin our discussion of lifting theorems. Sometimes you will see these referred to as simulation theorems or hardness escalation theorems. Lifting is a quite general technique which takes lower bounds against a relatively weak, restricted computational model and generically lifts them to apply to a stronger computational model. In the context of communication complexity, a lifting theorem has three parts:
• A query model Cdt used to capture the query complexity of evaluating a boolean function f : {0, 1}n → {0, 1}. Examples include (deterministic, nondeterministic, randomized) decision tree complexity, decision list complexity, and (nonnegative) polynomial degree.
• A two-party gadget g : X × Y → {0, 1}. The gadget should balance two key properties. On one hand, it should be easy to compute using a low communication protocol. On the other hand, it should be hard for the parties to learn the value of g(x,y) without communicating. Some examples of such gadgets include:
– The Indexing gadget INDm : {0, 1}m × [m] → {0, 1} defined by INDm(x, y) = xy, the y-th bit of x. Here, think of m = poly(n).
– The Inner Product mod 2 gadget IPm : {0, 1}m × {0, 1}m → {0, 1} defined by IPm(x, y) = ⟨x, y⟩ mod 2. Here, think of m = O(log n).
– The Pattern Matrix gadget PMm : {0, 1}m × ([m] × {0, 1}) → {0, 1} defined by PMm(x, (i, b)) = xi ⊕ b. Here, think of m = O(1).
– The XOR gadget XOR : {0, 1} × {0, 1} → {0, 1}.
• A target communication model Ccc in which to prove a lower bound for a composed function
of the form f ◦ gn defined by (f ◦ gn)(x1, …, xn, y1, …, yn) = f(g(x1, y1), …, g(xn, yn)).

Here is the template that a (good) lifting theorem follows.
Theorem 1 (Lifting Template). Fix a query complexity measure Cdt, a communication complexity measure Ccc, and a gadget g : X × Y → {0,1}. Then for every function f : {0,1}n → {0,1}, we have
Ccc(f ◦ gn) = Cdt(f) · Θ(Ccc(g)).
The upper bound Ccc(f ◦ gn) = Cdt(f) · O(Ccc(g)) is almost always easy. The parties simulate a query algorithm for computing f(z) and every time they need to query a bit zi, they evaluate the gadget g(xi,yi). The interesting direction is the lower bound. Often the lower bound is proved by a simulation argument. Given a communication protocol Π for f ◦ G (where G = gn), we’d like to construct a query algorithm for the function f. On input z ∈ {0,1}n, the query algorithm will attempt to simulate the transcript of Π(x, y) for a random choice of (x, y) ∼ G−1(z) using minimal queries to z. If the simulation is accurate, then accuracy of the protocol Π implies that this query algorithm accurately computes f.
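To make this concrete, here is a minimal Python sketch (the function names are ours, not from the notes) of a few of the gadgets above and of the composed function f ◦ gn, which is exactly what the easy-direction simulation evaluates one gadget at a time.

```python
# Illustrative sketch (names are ours): three of the gadgets above and the
# composed function f o g^n.

def IND(x, y):
    """Indexing gadget: Alice holds x in {0,1}^m, Bob holds an index y in [m]."""
    return x[y]

def IP(x, y):
    """Inner product mod 2 gadget on {0,1}^m x {0,1}^m."""
    return sum(a & b for a, b in zip(x, y)) % 2

def XOR(x, y):
    """XOR gadget: a single bit per player."""
    return x ^ y

def compose(f, g, xs, ys):
    """(f o g^n)(x_1,...,x_n, y_1,...,y_n) = f(g(x_1,y_1), ..., g(x_n,y_n)).
    The easy direction of the template evaluates this one gadget at a time:
    whenever a query algorithm for f asks for z_i, the players spend Ccc(g)
    bits of communication computing z_i = g(x_i, y_i)."""
    return f([g(xi, yi) for xi, yi in zip(xs, ys)])
```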
The first concrete lifting theorem we’ll see is the one from deterministic decision tree complexity to deterministic communication complexity.
1 Decision Trees
A decision tree T is a simple, concrete model for computing a boolean function f : {0, 1}n → {0, 1}. It is represented by a rooted binary tree. Each non-leaf vertex is labeled by a variable xi for i ∈ [n], and its outgoing edges are labeled by 0 and 1. Each leaf is labeled by either 0 or 1. A decision tree induces a deterministic query algorithm for computing a boolean function: starting at the root, if the value of the variable xi labeling the current vertex is 0, the algorithm recurses on the 0-child, and if the value is 1, it recurses on the 1-child. The value of the function is the label of the leaf that the algorithm reaches.
A decision tree T computes a boolean function f if T(x) = f(x) for every x ∈ {0,1}n. The cost of a decision tree is its depth, or equivalently, the number of input bits that need to be read on a worst-case input.
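As a concrete (if entirely standard) representation, the following sketch, using a representation of our own choosing, treats a decision tree as either a leaf label or a triple (i, child0, child1), and measures its cost as its depth.

```python
# Minimal sketch (our own representation): a decision tree as nested tuples.

def evaluate(tree, x):
    """Walk from the root to a leaf: a non-leaf (i, child0, child1) queries x[i]
    and recurses on the child selected by the answer; a leaf is its own label."""
    while isinstance(tree, tuple):
        i, child0, child1 = tree
        tree = child1 if x[i] else child0
    return tree

def depth(tree):
    """The cost of the tree: queries made on a worst-case input."""
    if not isinstance(tree, tuple):
        return 0
    _, child0, child1 = tree
    return 1 + max(depth(child0), depth(child1))

# Example: a depth-2 tree computing AND(x[0], x[1]).
and_tree = (0, 0, (1, 0, 1))
assert evaluate(and_tree, [1, 1]) == 1 and evaluate(and_tree, [1, 0]) == 0
assert depth(and_tree) == 2
```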
Definition 2. The deterministic decision tree complexity (or deterministic query complexity) of f : {0, 1}n → {0, 1} is the minimum depth of a decision tree T which computes f. We denote this by Pdt(f).
Example 3. The decision tree complexity of the ORn function is n. Every boolean function has Pdt(f) ≤ n (why?). To see that this bound is tight for the OR function, let T be a decision tree of depth ≤ n − 1. Evaluate T on an input x for which xi = 0 for every index i queried. Then when T reaches a leaf, there is still some index i∗ which has not been queried. Both 0n and ei∗ are consistent with the answers given, so they reach the same leaf, but OR(0n) ̸= OR(ei∗), so the decision tree disagrees with OR on at least one of these inputs.
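The adversary in Example 3 can be phrased as a short procedure. Here is a sketch (our own code, reusing the tuple representation above) which, given any tree of depth less than n, produces two inputs that reach the same leaf but have different OR values.

```python
# Sketch of the adversary for OR_n: answer 0 to every query; the tree then
# cannot distinguish the all-zeros input from e_{i*} for an unqueried i*.

def fool_or_tree(tree, n):
    queried = set()
    node = tree
    while isinstance(node, tuple):       # walk the tree, answering 0 everywhere
        i, child0, _ = node
        queried.add(i)
        node = child0
    i_star = next(i for i in range(n) if i not in queried)  # exists if depth < n
    all_zeros = [0] * n
    e_star = [0] * n
    e_star[i_star] = 1                   # OR differs on these two inputs, yet both
    return all_zeros, e_star             # follow the same root-to-leaf path above
```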
A randomized decision tree 𝒯 is a distribution over deterministic decision trees T. We say that 𝒯 computes a function f if PrT∼𝒯[T(x) = f(x)] ≥ 2/3 for every x ∈ {0,1}n.
There are a few equivalent ways to define nondeterministic decision tree complexity NPdt. Perhaps the cleanest way is as the minimum k for which f is computed by a DNF of width k. Here, a DNF is an OR of ANDs of variables and their negations, and the width of the DNF is the maximum number of literals appearing in any term.
To see what DNF width has to do with decision trees, let’s start with a natural definition of NPdt in terms of nondeterministic decision trees. A nondeterministic decision tree is a collection T = {T1, . . . , Tm} of deterministic decision trees, and we say that it computes f if, for every x, f(x) = 1 if and only if there exists an i such that Ti(x) = 1. The cost of such a nondeterministic tree is the maximum cost of any of the constituent trees.
Note that any decision tree of depth k is in particular a width-k DNF. This is because we can express it as an OR, over all paths from the root of the tree to a 1-leaf, of the AND of the variables (or their negations) read along that path. So a nondeterministic decision tree of cost k is an OR of width-k DNFs, and hence is itself a width-k DNF.
Conversely, a width-k DNF can be expressed as a nondeterministic decision tree of cost k where each constituent tree just computes a single term of the DNF.
(Note that the definition of nondeterministic decision tree cost above does not charge for the witness size log m. Charging for this as well wouldn’t affect this equivalence by more than a log factor, however, since a width-k DNF has at most 2^k · (n choose k) terms.)
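The tree-to-DNF direction is easy to write down explicitly. Here is a sketch (our own code, same tuple representation as before) that produces one term per root-to-1-leaf path, so a depth-k tree yields a width-k DNF.

```python
# Sketch: convert a decision tree into a DNF with one term per accepting path.

def tree_to_dnf(tree, path=()):
    """Each term is a list of (variable index, required value) literals recording
    the queries made along a path that ends at a 1-leaf."""
    if not isinstance(tree, tuple):
        return [list(path)] if tree == 1 else []
    i, child0, child1 = tree
    return (tree_to_dnf(child0, path + ((i, 0),)) +
            tree_to_dnf(child1, path + ((i, 1),)))

def eval_dnf(terms, x):
    """An OR of ANDs: accept iff some term is satisfied."""
    return int(any(all(x[i] == v for i, v in term) for term in terms))

# Example: a depth-2 tree for AND(x[0], x[1]) becomes the single term
# [(0, 1), (1, 1)], i.e. the conjunction x_0 AND x_1.
assert tree_to_dnf((0, 0, (1, 0, 1))) == [[(0, 1), (1, 1)]]
```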
2 Deterministic Lifting and an Application
Theorem 4 (Deterministic Lifting). There exists a gadget g : X × Y → {0, 1} such that for every f : {0, 1}n → {0, 1},
Ccc(f ◦ gn) = Cdt(f) · Θ(Ccc(g)),
where Cdt = Pdt denotes deterministic query complexity and Ccc = Pcc denotes deterministic communication complexity. Here, the gadget g can be taken to be
1. g = INDm for m = poly(n), or
2. g = IPm for m = O(log n).
The lifting theorem actually holds for quite general f. It could be a partial function, a search problem (a relation), or a function with non-boolean codomain. We’ll begin the proof of the lifting theorem in the next lecture, but now let’s see an example of what it can be used for.
The 1-partition number of a function F : X × Y → {0, 1} is the minimum number of (non-overlapping) monochromatic rectangles needed to partition F−1(1). Denote this by χ1(F). Every deterministic protocol of cost c partitions its input space into at most 2c monochromatic rectangles, so Pcc(F) ≥ log χ1(F). But we’ve seen that not every partition of X × Y corresponds to an actual deterministic protocol. So could there be examples where this lower bound method is not tight? It turns out that there is a (tight) quadratic separation between deterministic communication complexity and partition number.
Theorem 5. There exists a function F : X × Y → {0,1} such that Pcc(F) ≥ Ω̃(log2 χ1(F)).
The known constructions of such F are obtained by lifting. As a warmup, let’s show the following relaxation where we replace the 1-partition number with the 1-cover number, which we recall characterizes NPcc communication complexity.
Theorem 6. There exists a function F : X × Y → {0,1} such that Pcc(F) ≥ Ω̃((NPcc(F))2).
Proof. We define a function f : {0, 1}n → {0, 1} which exhibits a quadratic separation between its deterministic query complexity and its nondeterministic query complexity and lift this separation to the communication world.
The function f interprets its input as a √n × √n matrix and evaluates to 1 if and only if there exists a column which consists of all 1s. There is an O(√n)-cost nondeterministic query algorithm for this problem: nondeterministically guess the all-1s column and then verify it with O(√n) queries. On the other hand, every deterministic query algorithm requires Ω(n) queries. To see this, feed a deterministic query algorithm 1s up until the last query in any given column, and then feed it a 0. Then the algorithm has to query all n bits of the matrix to determine the answer.
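Here is a sketch (our own code) of this function and of the adversary from the lower bound argument, with the input viewed as a list of √n columns of √n bits each.

```python
# f = 1 iff some column of the sqrt(n) x sqrt(n) matrix is all 1s.

def f_all_ones_column(columns):
    return int(any(all(col) for col in columns))

class Adversary:
    """Answer 1 to every query except the last remaining entry of a column,
    which gets a 0. Until every entry of every column has been queried, the
    answers so far are consistent both with an input containing an all-1s
    column and with one containing none, so a deterministic algorithm must
    read all n entries."""
    def __init__(self, num_cols, num_rows):
        self.remaining = {j: set(range(num_rows)) for j in range(num_cols)}

    def answer(self, col, row):
        self.remaining[col].discard(row)
        return 0 if not self.remaining[col] else 1
```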
Now consider the function F = f ◦ gn for either gadget g in Theorem 4. Then by the easy direction of nondeterministic lifting, we have NPcc(F) = O(√n log n). On the other hand, by the hard direction of deterministic lifting, we have Pcc(F) = Ω(n log n). This gives the asserted separation.
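To spell out the arithmetic: Pcc(F) = Ω(n log n) while (NPcc(F))2 = O((√n log n)2) = O(n log2 n), so Pcc(F) ≥ Ω̃((NPcc(F))2), with the Ω̃ absorbing the leftover log factor.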
To prove Theorem 5, we modify the function f so that the lifted function F has a small 1-partition number.
Modified definition of f. Again, f interprets its input as a √n × √n matrix. However, instead of each entry being a bit, each entry takes the form (mij, pij), where mij ∈ {0, 1} is a bit and pij ∈ [√n]2 ∪ {⊥} is a (possibly null) pointer to another entry of the matrix. The function f evaluates to 1 if and only if there exists a column j for which the following holds:
• All bits mij in column j are 1s.
• Consider the lexicographically first non-null pointer in column j, and follow it for √n − 1 steps. All of the pointers encountered are non-null, the entries visited have their bits set to 0, and they visit every column of the matrix (except for j).
The natural nondeterministic query algorithm computing this function is unambiguous, in the sense that for every 1-input there is exactly one accepting computation path. The unambiguous nondeterministic query algorithm for this problem lifts to a 1-partition of F with log χ1(F) = Õ(√n). Meanwhile, the deterministic query complexity of f is still n. To see this, call a query critical if it is the last in its column. On every non-critical query, answer (1, ⊥). On the first critical query, answer (0, ⊥). On every subsequent critical query, answer (0, p), where p points to the previously answered critical entry.
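Here is a sketch (our own code) of this adversary: it tracks, per column, which entries remain unqueried, and remembers the location of the most recent critical query so that later critical queries can point back to it.

```python
# Adversary for the modified f: entries are (bit, pointer) pairs; a query is
# critical if it is the last unqueried entry of its column.

class PointerAdversary:
    def __init__(self, num_cols, num_rows):
        self.remaining = {j: set(range(num_rows)) for j in range(num_cols)}
        self.last_critical = None        # position of the most recent critical query

    def answer(self, col, row):
        self.remaining[col].discard(row)
        if self.remaining[col]:          # non-critical query
            return (1, None)             # bit 1, null pointer
        reply = (0, self.last_critical)  # bit 0, pointer to previous critical entry
        self.last_critical = (col, row)  # (None plays the role of the null pointer)
        return reply
```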