Quantum Programming Implementation: Compilers: Register Allocation
Mar 1, 2022
Quantum Programming, by Outline Implementation: Compilers: Register Allocation; 100 minutes; Mar 1, 2022
Copyright By PowCoder代写 加微信 powcoder
Hook: Quantum compilers enable us to run programs on quantum computers. Optimizing quan- tum compilers go further and help us maximize reliability and fight decoherence. Those aspects are crucial because we want our results to be meaningful and we want them before quantumness falls apart.
Purpose: Persuade you that optimizing quantum compilers are nontrivial.
1. We will define a few optimization problems that quantum compilers face. 2. We can go for an optimal solution for programs with few qubits.
3. We will see what optimizing quantum compilers do in practice.
Transition to Body: Let us begin with a look at some problems that are hard to solve.
Main Point 1: We will define a few optimization problems that quantum compilers face. [Quantum register allocation is NP-complete]
[Qubit swapping is NP-complete]
[Depth minimization is NP-complete]
Transition to MP2: Those hardness results guarantee full employment for compiler writers.
Main Point 2: We can go for an optimal solution for programs with few qubits. [We can formulate an optimization problem as a constraint problem]
[We can use a constraint solver to solve the constraints]
[Going for an optimal solution won’t scale]
Transition to MP3: Compilers must scale to large quantum computers.
Main Point 3: We will see what optimizing quantum compilers do in practice. [Compilers use approximate solutions]
[We can construct hard cases]
[Current compilers do poorly on hard cases]
Transition to Close: So there you have it.
Review: Quantum compilers must optimize to maximize reliability and fight decoherence, and key optimization problems are NP-complete. In practice, compilers use polynomial-time approxi- mation algorithms.
Strong finish: When we scale to 100+ qubits, we also need to scale our optimizing quantum compilers to avoid that compile time becomes a bottleneck.
Call to action: Now is a great time to do research on optimizing quantum compilers.
Detailed presentation
Hook: Quantum compilers enable us to run programs on quantum computers. Optimizing quan- tum compilers go further and help us maximize reliability and fight decoherence. Those aspects are crucial because we want our results to be meaningful and we want them before quantumness falls apart.
Purpose: Persuade you that optimizing quantum compilers are nontrivial.
1. We will define a few optimization problems that quantum compilers face. 2. We can go for an optimal solution for programs with few qubits.
3. We will see what optimizing quantum compilers do in practice.
Transition to Body: Let us begin with a look at some problems that are hard to solve. Main Point 1: We will define a few optimization problems that quantum compilers face.
[Quantum register allocation is NP-complete]
Let us review the tasks of a classical compiler and a quantum compiler.
Phase Front end
Middle end Back end
Classical compiler
Build program representation (Type check)
Instruction selection
Register allocation & spilling Peephole optimization
Quantum compiler
Build program representation
⟨No type check?!⟩
⟨Too early to optimize?!?!⟩ Instruction selection
Register allocation & qubit swapping Peephole optimization
The quantum register allocation problem is to create a register allocation, which is a map from source-level qubit variables to target-level qubit registers. What do we need this map to satisfy? Suppose that after gate selection, we need to implement CZ(i,j). Ideally we want a register allo- cation φ such that qubit variable i gets placed in qubit register φ(i), qubit variable j gets placed in qubit register φ(j), and the quantum computer supports the operation CZ(φ(i),φ(j)). Is this possible for all source programs? The answer is No, which we can tell from examining a natural implementation of Grover’s algorithm for three qubits. An example of such an implementation uses all of CZ(0,1), CZ(1,2), CZ(0,2). So, if we compile to the Rigetti Agave (see below), any φ will result in that at least one CZ operation has no support.
Notice that sometimes the answer is Yes, as we can tell from examining a natural implementation of Grover’s algorithm for two qubits. An example of such an implementation uses a single form of two-qubit operation, namely CZ(0,1). So, we can easily find a register allocation φ such that CZ(φ(0), φ(1)) is supported. For example, we can let φ(0) = 0 and φ(1) = 1 because in the Rigetti Agave, qubit registers 0,1 are connected.
The question we are asking here can be formalized as the following decision problem about undirected graphs.
Problem: Quantum Register Allocation.
Instance: Two undirected graphs G1 = (V1, E1) and G2 = (V2, E2). Question: ∃φ:V1 →V2 : ∀i,j∈V1 : [(i,j)∈E1]⇒[(φ(i),φ(j))∈E2]?
The idea is that G1 represents the two-qubit operations in the source program, while G2 repre- sents the two-qubit operations that the target level supports. For example, if the source program contains CZ(0, 1), CZ(1, 2), CZ(0, 2), and the target level is the Rigetti Agave, then
G1 = ({0, 1, 2}, {(0, 1), (0, 2), (1, 2)})
G2 = ({0,1,2,3,4,5,6,7},{(0,1),(1,2),(2,3),(3,4),(5,6),(6,7),(0,7)})
Here the quantum register allocation problem is to decide whether we can find φ : {0,1,2} → {0,1,2,3,4,5,6,7} such that φ maps (0,1) to an edge in G2, and φ maps (1,2) to an edge in G2, and φ maps (0,2) to an edge in G2.
The paper Qubit Allocation, by Siraichi, dos Santos, Collange, and Pereira, in CGO 2018, noted that Quantum Register Allocation is NP-complete. The reason is that Quantum Register Allocation is exactly the same as Subgraph Isomorphism, which is NP-complete.
We might argue that the subgraph isomorphism problem is super general and that in practice we don’t need that much generality for quantum computing. Let us look at two restricted cases.
First, let us consider a restricted case of source programs. Let us claim that our programs connect lots of qubits to lots of other qubits so in practice we could as well think of G1 as a complete subgraph. Perhaps the subgraph isomorphism problem is easier in this case? No! When G1 is a complete subgraph, the subgraph isomorphism problem becomes the clique problem, which is NP-complete.
Second, let us consider a restricted case of quantum computers. Most quantum computers today have planar coupling graphs so let us assume that G2 is a planar graphs. Perhaps the subgraph isomorphism problem is easier in this case? No! We will need G1 to be planar, which we can check in linear time. However, even when we know that G1 and G2 are planar graphs, the subgraph isomorphism problem is NP-complete.
Quantum Register Allocation has a sister problem for directed graphs that is useful for modeling CNOT.
[Qubit swapping is NP-complete]
In practice we always need a register allocation so rather than insisting on getting a perfect one, we will settle for a good one. This situation is akin to a classical compiler that always must produce a register allocation, even if some variables are placed in memory.
One way to evaluate the quality of a register allocation is to count the number of swaps that we have to do. A good register allocation minimizes the number of qubit swaps, which we can formalize as follows.
Problem: Instance:
Swap Minimization.
Two undirected graphs G1 = (V1, E1) and G2 = (V2, E2),
a multiplicity mapping M : (V1 × V1) → Nat, and an integer K. ∃φ : V1 → V2 : Σ(i,j)∈E1
M(i, j) × (2 × (length(shortestPathG2 (φ(i), φ(j))) − 1)) ≤ K ?
This formalization uses M to model that some instructions may be used more than once.
The quantity (2 × (length(shortestPathG2 (φ(i), φ(j))) − 1)) models the number of swaps that we have to do for the case (i, j) ∈ E1. In particular, we need to multiply by 2 because we have to swap both to bring the qubits next to other and to bring them back. For example, for G2 above,
suppose φ(i) = 0 and φ(j) = 2, which means that
2 × (length(shortestPathG2 (φ(i), φ(j))) − 1) = 2×(2−1)
We can easily reduce Quantum Register Allocation to Swap Minimization, so since Quantum Register Allocation is NP-complete, also Swap Minimization is NP-complete. The reduction maps an instance (G1, G2) of Quantum Register Allocation to (G1, G2, 0). Thus, we are asking whether there exists φ such that, for every edge (i,j) in G1, the shortest path in G2 between φ(i) and φ(j) is a single edge.
[Depth minimization is NP-complete]
Even if we need qubit swaps, we may be able to execute some of the swaps in parallel with each other and with other operations. This shifts our attention from minimizing the number of swaps to minimizing the depth of the circuit after register allocation and qubit swapping. The depth of a circuit is the number of time steps, also known as moments.
Problem: Instance:
Depth Minimization.
Two undirected graphs G1 = (V1, E1) and G2 = (V2, E2),
a list of edges e1,…en ∈ E1, and an integer K.
Can we generate swaps and schedule some operations to run in parallel such that the total number of time steps is at most K ?
Above we stated the question of Depth Minimization informally. The idea is that the list of edges e1,…,en represents the source program, and that K is an upper bound on the execution time. The question is whether we can get everything done in K time steps.
The paper Optimality Study of Existing Quantum Computing Layout Synthesis Tools, by Bochen Tan and (both UCLA), published in 2020, states that Depth Minimization is NP- complete. The proof is by reduction from Hamiltonian Cycle, which is NP-complete.
Transition to MP2: Those hardness results guarantee full employment for compiler writers. Main Point 2: We can go for an optimal solution for programs with few qubits.
[We can formulate an optimization problem as a constraint problem]
Let us consider how to use an Integer Linear Programming (ILP) solver to solve Swap Minimization. From an instance of Swap Minimization, that is, two graphs G1 = (V1,E1) and G2 = (V2,E2), a multiplicity mapping M : (V1 × V1) → Nat, and an integer K, we generate integer variables
xik foreachi∈V1 andk∈V2
The idea is that when we assign xik the value 1, we represent φ(i) = k. Then we generate the
constraints
xik ≥0 xik =1
foreachi∈V1 andforeachk∈V2 foreachi∈V1
We can use any minimal solution ψ to define φ : V1 → V2 as follows:
φ(i)=k ⇐⇒ ψ(xik)=1
The objective is to minimize:
The Total Number of Swaps
foreachk∈V2
M (i, j) × (2 × (length(shortestPathG2 (k, m)) − 1)) × xik × xjm
Given ψ, we can check whether we produced a sum that is less than or equal to K.
i,j∈V1 ∧ k,m∈V2
For example, consider:
We have these variables:
G1 = ({0, 1, 2}, {(0, 1), (0, 2), (1, 2)})
G2 = ({0, 1, 2}, {(0, 1), (1, 2)})
M = (0,1)→1, (0,2)→2, (1,2)→3 K=6
x00, x01, x02, x10, x11, x12, x20, x21, x22 We generate these constraints:
x00 ≥ 0 x10 ≥ 0 x20 ≥ 0
x00+x01+x02 = 1 x10+x11+x12 = 1 x20+x21+x22 = 1
x00+x10+x20 ≤ 1 x01+x11+x21 ≤ 1 x02+x12+x22 ≤ 1
The objective is to minimize:
∧ x01 ≥ 0 ∧ x02 ≥ 0 ∧ ∧ x11 ≥ 0 ∧ x12 ≥ 0 ∧ ∧ x21 ≥ 0 ∧ x22 ≥ 0
M (0, 2) × M (1, 2) ×
k,m∈V2 ∑
M (i, j) × length(shortestPathG2 (k, m)) × xik × xjm
i,j∈V1 ∧ k,m∈V2 = M(0,1)×
(2 × (length(shortestPathG2 (k, m)) − 1)) × x0k × x1m +
(2 × (length(shortestPathG2 (k, m)) − 1)) × x0k × x2m +
k,m∈V2 ∑
(2 × (length(shortestPathG2 (k, m)) − 1)) × x1k × x2m
Notice that each of the three sums above has 3 × 2 items, and that in each sum, exactly one of those items will be nonzero.
For example, let us consider
φ = [0→0, 1→1, 2→2]
Now the above sum becomes:
The Total Number of Swaps
= M (0, 1) × (2 × (length(shortestPathG2 (0, 1)) − 1)) × 1 × 1 +
M(0,2)×(2×(length(shortestPathG2(0,2))−1))×1×1 +
M (1, 2) × (2 × (length(shortestPathG2 (1, 2)) − 1)) × 1 × 1
= 1 × (2 × (length(shortestPathG2 (0, 1)) − 1)) × 1 × 1 + 2×(2×(length(shortestPathG2(0,2))−1))×1×1 +
3 × (2 × (length(shortestPathG2 (1, 2)) − 1)) × 1 × 1
= 1×(2×(1−1))×1×1 +
2×(2×(2−1))×1×1 +
3 × (2 × (1 − 1)) × 1 × 1
= 1×(2×0)×1×1 + 2×(2×1)×1×1 +
Thus, we have a register allocation that leads to 4 swaps. We were going for K = 6, so the mapping
φ = [0→0, 1→1, 2→2]
[We can use a constraint solver to solve the constraints]
Notice that both the number of variables and the number of constraints are polynomial in the size of the program and in the size of the target. If we give the constraints to an ILP-solver, then it will use worst-case exponential time to solve the constraints. This is as expected because Swap Minimization is NP-complete.
Some ILP-solvers work iteratively and produce better and better solutions as they go along.
[Going for an optimal solution won’t scale]
In our ILP formulation, we have on the order of n2 integer variables where n is the number of qubits. Thus, 300 qubits means 90,000 integer variables.
One of the best ILP solvers is CPLEX, which has a limit of 30 million variables. For some kinds of problems, CPLEX does well on instances with many variables. But for the problems faced by quantum compilers, so far nobody has noticed any kind of scalability.
Transition to MP3: Let us now turn to the last stage of the compiler.
Main Point 3: We will see what optimizing quantum compilers do in practice.
[Compilers use approximate solutions]
When we scale to many qubits, the use of ILP-solvers becomes infeasible and instead compilers use algorithms that produce approximate solutions.
How can we make one of those compiler problems tractable while tolerating approximate solu- tions? A standard approach is to divide the circuit into parts that each can be solved optimally. This comes at the cost of inserting swaps at the boundary of the parts.
[We can construct hard cases]
The paper Optimality Study of Existing Quantum Computing Layout Synthesis Tools, by Bochen Tan and (both UCLA), published in 2020, showed how to construct quantum circuits for which Depth Minimization is hard but for which optimal solutions are known by construction. They have a benchmark constructor with the following functionality.
Algorithm: Input:
Benchmark constructor.
A graph G that represents the qubit connectivity in the target,
an integer T that represents the target depth,
an integer d1 that represents the target density of one-qubit gates, and
an integer d2 that represents the target density of two-qubit gates.
A circuit in QASM format for which solving Depth Minimization optimally will produce a circuit with depth T and densities d1,d2.
The benchmark constructor works in three steps, as follows.
1. Backbone: construct a dependency chain of T gates.
2. Sprinkling: add gates randomly.
3. Scrambling: generate a random mapping from physical qubits to logical qubits.
This produces a benchmark with optimal depth T and for which a register allocator must
unscramble Step 3 to produce an optimal solution.
Tan and Cong produced used their tool to construct benchmarks that are publicly available at https://github.com/UCLA-VAST/QUEKO-benchmark. Those benchmarks fall into three groups:
• BNTF: circuits of depth 5, 10, …, 45; some constructed for Rigetti Aspen-4 and densities (0.27, 0.36), and some constructed for Google Sycamore and densities (0.51, 0.40).
• BSS: circuits of depth 100, 200, …, 900, constructed for four different quantum computers and both pairs of densities listed above.
• BIGD: circuits of depth 45, constructed for IBM Tokyo and varying densities. [Current compilers do poorly on hard cases]
Tan and Cong tested several compilers on their benchmarks. In some cases, the operating system ran out of memory and killed the compiler.
Benchmark group BNTF
Optimality gap
Computer Qiskit Cirq
As we can see, Depth Minimization needs more work.
Transition to Close: So there you have it.
Rigetti Aspen-4 (16 qubits) Google Sycamore (54 qubits) Google Sycamore (54 qubits) IBM Tokyo (20 qubits)
5x 10x 11–14x 35–50x 10–13x ? 2–6x ?
Review: Quantum compilers must optimize to maximize reliability and fight decoherence, and key optimization problems are NP-complete. In practice, compilers use polynomial-time approxi- mation algorithms.
Strong finish: When we scale to 100+ qubits, we also need to scale our optimizing quantum compilers to avoid that compile time becomes a bottleneck.
Call to action: Now is a great time to do research on optimizing quantum compilers.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com