
Quantum Programming Foundations: Classical Error Correction
Feb 15, 2022

Quantum Programming. Outline. Foundations: classical error correction; 100 minutes; Feb 15, 2022


Hook: We compute and we communicate but our devices and our communication channels are imperfect and they corrupt our data. How can we detect and correct the errors?
Purpose: Persuade you that a lot of classical error correction is applied linear algebra.
1. We will introduce the idea and the history of error correction.
2. We will show how encoding and detection can be linear functions.
3. We will detail the reasons why linear codes work well.
Transition to Body: What is an error and why can we hope to correct it?
Main Point 1: We will introduce the idea and the history of error correction. [An error is a bit flip]
[We need to add redundancy]
[People have studied error correction since 1948]
Transition to MP2: How do we define encoding and detection?
Main Point 2: We will show how encoding and detection can be linear functions. [Only some errors can be corrected]
[We can use a linear function to encode, that is, to add redundancy]
[We can use a different linear function to detect errors]
Transition to MP3: But does it really work well?
Main Point 3: We will detail the reasons why linear codes work well.
[Let us define a distance between bitstrings]
[We need the codewords to have some distance between them]
[The bigger the distance between codewords, the more errors we can correct]
Transition to Close: So there you have it.
Review: We can use a linear function to map data to codewords and then use a different linear
function to detect errors. This works well when we have some distance between the codewords.
Strong finish: Linear codes have been used in classical computers, storage, and communication, and now the stage is set for using them in quantum computing, too. But that comes with the challenge known as the no-cloning theorem, which says that we cannot copy a quantum state, so care must be taken.
Call to action: Master classical error correction in the form of linear codes and thereby get ready for quantum error correction.

Detailed presentation
Hook: We compute and we communicate but our devices and our communication channels are noisy and they corrupt our data. How can we detect and correct the errors?
Purpose: Persuade you that a lot of classical error correction is applied linear algebra.
1. We will introduce the idea and the history of error correction.
2. We will show how encoding and detection can be linear functions.
3. We will detail the reasons why linear codes work well.
Transition to Body: What is an error and why can we hope to correct it?
Main Point 1: We will introduce the idea and the history of error correction.
[An error is a bit flip]
A unit of data is a bit and an error is a bit flip. So, a unit of data is an element of B = {0, 1}, and an error flips 0 to 1, or flips 1 to 0.
Let’s compare two computers: one classical and one quantum.
The classical computer in our comparison is the Intel Xeon Phi processor, also known as "Knights Corner". It is a commonly used processor in supercomputing clusters. This processor has a clock frequency of 1 GHz. In 2017, researchers measured its "soft error rate", the rate of radiation-induced errors that can flip stored bits. They found a soft error rate of about 1 error per 1,000 years.
In contrast, the IBM Q16 Rueschlikon is a quantum computer. It clocks at 5.26 GHz. The following statistics are from the IBM calibration data page in October 2018. Its single-qubit gate error is 2 errors for every 1,000 operations, its multi-qubit gate error is 44 errors for every 1,000 operations, and its readout error is 60 errors for every 1,000 measurements. All of those error rates are orders of magnitude higher than the error rate of Knights Corner.
[We need to add redundancy]
We need to add redundancy such that even in the presence of errors, we can recover the data. In the spirit of the pandemic, perhaps one can say that we want to immunize the data. In this case, we want to immunize the data against noise. Our goal is that adding redundancy is better than sending raw data.
Suppose you are going to email somebody that you are willing to lend them $100. The email service is noisy and sometimes changes $100 to $1,000,000, and sometimes changes $1,000,000 to $100. You don't want to be on the hook for lending out $1,000,000. What do you do? Do you send the email and hope for the best? Or do you send the email twice and tell the person that if the two emails disagree, they should ask you to send two more emails? Or do you send the email three times and tell the person that if the emails are not all the same, they should go with the value that appears twice?
Idea: spread the information out.
Straightforward idea to get this done: replicate the data. Let us try two copies of each bit:
0 → 00
1 → 11

If an error occurs to one of the two bits, we will get 01 or 10. We know that such a pair should never occur, so if we see it, we know that an error has happened, but not to which bit. Notice that we can think of this in terms of parity: if we see two bits that sum to zero (even parity), no error has happened, but if we see two bits that sum to one (odd parity), an error has happened. So the encoding scheme above is good at detecting single-bit errors, but it cannot correct them. This is like talking over a bad telephone line: please, could you repeat that? We can detect that the information wasn't understandable and we need to hear it again before we can understand it.
If an error occurs to both bits, we have that 00 changes to 11 and that 11 changes to 00, so we won’t know that errors have happened.
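As a tiny illustration, here is a plain Python sketch (the function names are not from the lecture) showing that two copies give detection via a single parity check, but no correction:

    # With two copies we can detect a single bit flip (odd parity) but we cannot
    # tell which of the two bits flipped, so we cannot correct it.

    def encode(bit):
        return [bit, bit]

    def error_detected(pair):
        return (pair[0] ^ pair[1]) == 1   # odd parity: exactly one bit flipped

    print(error_detected([0, 0]))  # False: a valid codeword
    print(error_detected([0, 1]))  # True: one of the two bits flipped
    print(error_detected([1, 1]))  # False: either no error or both bits flipped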
In the spirit of the pandemic, the idea is to give the data a single-shot vaccine. After the shot, the data will be fine and ready to travel all over the world.
Let us try three copies of each bit:
0 → 000
1 → 111
Both this scheme and the previous scheme are examples of repetition codes. In the case of three copies, we can do majority voting to recover from a single bitflip:
000, 100, 010, 001 → 0
110, 011, 101, 111 → 1
We can understand the above in terms of parities: the parity of the first and second bit and the parity of the second and third bit. Now we have the following so-called error syndromes (also known as parity checks):
codeword      error syndrome   conclusion
000 or 111    00               no error
100 or 011    10               bit 1 flipped
010 or 101    11               bit 2 flipped
001 or 110    01               bit 3 flipped
So, if we know the error syndrome, we know which bit flipped (if any) and we can flip it back (if needed).
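Here is a minimal Python sketch of this scheme (the function names are just for illustration): encode by copying, compute the two parity checks, and correct by majority vote:

    def encode(bit):
        # three copies of the data bit
        return [bit, bit, bit]

    def syndrome(word):
        # parity of bits 1,2 and parity of bits 2,3
        return (word[0] ^ word[1], word[1] ^ word[2])

    def decode(word):
        # majority vote recovers the data bit if at most one bit flipped
        return 1 if sum(word) >= 2 else 0

    word = encode(0)       # [0, 0, 0]
    word[1] ^= 1           # an error flips bit 2: [0, 1, 0]
    print(syndrome(word))  # (1, 1): bit 2 flipped
    print(decode(word))    # 0: the original data bit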

[People have studied error correction since 1948]
Some highlights of the early history of error correction follow.
1948: Shannon wrote about the mathematics behind communication.
1949: Golay code [Voyager 1 and 2 used a Golay code when they visited Jupiter and Saturn in 1979–1981].
1950: Hamming code.
1954: Reed-Muller codes.
1955: Convolutional codes.
1957: Cyclic codes.
1959: BCH codes, named after Bose, Chaudhuri, and Hocquenghem.
1960: Reed-Solomon codes.
Many uses of classical error correction, including: 1) parity-check code for memory chips, 2) cyclic redundancy code for packets or cells in networks, 3) noise margins in Volts for 0,1 in digital circuits, and 4) Reed-Solomon codes for satellites.
Transition to MP2: How do we define encoding and detection?
Main Point 2: We will show how encoding and detection can be linear functions.
[Only some errors can be corrected]
The idea of using linear functions for encoding and detection was one of the first ideas that people had, and it turned out to be versatile. As a result, we use linear codes as a term for any code that uses linear functions for encoding and detection. All the codes mentioned above are linear codes. While there are codes other than linear codes, people are excited about linear codes for designing quantum error correction, so today we will focus on linear codes. No surprise there: quantum computing is applied linear algebra, so linear codes are a great fit.
No code can correct every possible error. For example, for the repetition code that repeats three times, if two bits flip, the correction step will work incorrectly.
0 −→ 000 (encoding) −→ 011 (two bit flips) −→ 111 (error correction) −→ 1 (decoding)
0 −→ 000 (encoding) −→ 101 (two bit flips) −→ 111 (error correction) −→ 1 (decoding)
0 −→ 000 (encoding) −→ 110 (two bit flips) −→ 111 (error correction) −→ 1 (decoding)

Even worse, if three bits flip, the error is not detectable:

000 −→ 111
111 −→ 000
The goal is to correct the most probable errors. The idea of focusing on single bit flips is that if the probability of a single bit flip is low, then the probability of two bit flips “close to each other” is super low.

[We can use a linear function to encode, that is, to add redundancy]
We will think of B = {0, 1} as a field, also known as GF(2), where addition + is taken mod 2, also known as XOR. Additionally, we will think of B^k as a vector space.
We can represent 0 and 1 as 1 × 1 vectors (0) and (1), and we can represent 000 and 111 as the column vectors (0, 0, 0)^T and (1, 1, 1)^T.
Suppose we have k bits and we want to add some redundancy that brings the total to n bits,
where k < n. Thus, the encoding embeds B^k into B^n:

encoding : B^k → B^n

People say that such an encoding function creates an [n, k] code. Suppose the encoding function is an injective, linear function that maps a vector of size k to a vector of size n. We can represent such an encoding function as an n × k matrix G. People call G a generator matrix. Notice that the n-bit all-zero vector 0 is always a codeword. In our example, we have

G = (1, 1, 1)^T

Now we get

G(0) = (0, 0, 0)^T
G(1) = (1, 1, 1)^T

Thus, here we have a [3, 1] code.

[We can use a different linear function to detect errors]
How do we detect an error? Let G(B^k) be the k-dimensional subspace of B^n that is the image of B^k under G. People call G(B^k) the code space. Notice that G(B^k) is the span of the columns of G. Thus, each encoded data vector is in G(B^k) and can be expressed as a linear combination of the columns of G.

Let P be a matrix of maximal rank whose rows are orthogonal (over GF(2)) to every vector in G(B^k); these vectors form an (n − k)-dimensional subspace of B^n. Thus, P is an (n − k) × n matrix with rank n − k. People call P a parity-check matrix. A key property of P is that PG = 0. For our example, we can pick:

P =
  1 1 0
  0 1 1

Notice that if v ∈ B^k, we have PGv = 0, where 0 is a vector of length n − k in which every entry is 0. Indeed, this is an if-and-only-if, as expressed by the following theorem.

Theorem: ∀s ∈ B^n: Ps = 0 iff s ∈ G(B^k).

We can check this for each element of G(B^1) = { 000, 111 }:

P (0, 0, 0)^T = (1×0 + 1×0 + 0×0, 0×0 + 1×0 + 1×0)^T = (0, 0)^T
P (1, 1, 1)^T = (1×1 + 1×1 + 0×1, 0×1 + 1×1 + 1×1)^T = (0, 0)^T

For a codeword s ∈ G(B^k) we can model doing some bit flips in s by adding an error vector e to s:

s′ = s + e

For example, the three single-bit errors can be represented by these error vectors:

e1 = (1, 0, 0)^T
e2 = (0, 1, 0)^T
e3 = (0, 0, 1)^T

We can represent the different bit patterns as the corners of a cube.

Now let us apply the parity-check matrix to s′:

Ps′ = P(s + e) = Ps + Pe = 0 + Pe = Pe

The value Pe is called the syndrome of error e. So, P annihilates the codeword and returns an error syndrome. Let us calculate Pe where e represents one of the single-bit errors:

Pe1 = (1, 0)^T
Pe2 = (1, 1)^T
Pe3 = (0, 1)^T

Those are the error syndromes that we saw earlier! Each row of P represents a parity check, and together the rows lead to the error syndrome.

If Pe is unique for every possible error e, we can determine and fix every error. Specifically, notice that P is injective on the errors, so there exists a function Q such that for every error e, we have Q(Pe) = e. The idea of Q is to map each error syndrome to the corresponding error. So, for our example, we have:

Q((1, 0)^T) = (1, 0, 0)^T
Q((1, 1)^T) = (0, 1, 0)^T
Q((0, 1)^T) = (0, 0, 1)^T

Now observe that:

received + fix = s′ + Q(Ps′) = s′ + Q(Pe) = s′ + e = (s + e) + e = s

Here (s + e) + e = s because e + e = 0 over GF(2). Indeed, this is an if-and-only-if, as expressed by the second of the following two theorems.

Theorem: Error detection is possible iff every error has a nonzero syndrome.
Theorem: Error correction is possible iff every error has a unique syndrome.

The first theorem can be restated as saying that error detection is possible iff no error changes one codeword into a different codeword. Let us look at how we can formalize the two theorems. Consider a code with parity-check matrix P and a set of errors e1, ..., em. The first theorem says that we can do error detection if every Pej ≠ 0. The second theorem says that we can do error correction if all the Pej are distinct.
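To make the matrix view concrete, here is a small Python sketch (using numpy; the function names encode, syndrome, and correct are illustrative, not part of the lecture) of the [3, 1] code with its generator matrix G, parity-check matrix P, and syndrome-to-error map Q:

    import numpy as np

    # Generator matrix G and parity-check matrix P of the [3, 1] repetition code.
    # All arithmetic is mod 2, i.e., over GF(2).
    G = np.array([[1],
                  [1],
                  [1]])
    P = np.array([[1, 1, 0],
                  [0, 1, 1]])

    def encode(data):
        # map a k-bit data vector to an n-bit codeword: G @ data (mod 2)
        return (G @ data) % 2

    def syndrome(received):
        # apply the parity checks: P @ received (mod 2)
        return (P @ received) % 2

    # Q maps each error syndrome to the single-bit error it identifies.
    Q = {(0, 0): np.array([0, 0, 0]),   # no error
         (1, 0): np.array([1, 0, 0]),   # bit 1 flipped
         (1, 1): np.array([0, 1, 0]),   # bit 2 flipped
         (0, 1): np.array([0, 0, 1])}   # bit 3 flipped

    def correct(received):
        # add the inferred error vector back in: received + Q(P received) (mod 2)
        return (received + Q[tuple(syndrome(received))]) % 2

    s = encode(np.array([1]))                 # [1 1 1]
    s_prime = (s + np.array([0, 1, 0])) % 2   # bit 2 flips: [1 0 1]
    print(syndrome(s_prime))                  # [1 1]
    print(correct(s_prime))                   # [1 1 1]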
Transition to MP3: But does it really work well?
Main Point 3: We will detail the reasons why linear codes work well.
[Let us define a distance between bitstrings]
We will define a metric on B^n, called the Hamming distance after Richard Hamming. Let w be a function from bitstrings to natural numbers:

w : B^n → N

The idea is that w assigns a weight w(s) to every s ∈ B^n. Specifically, the weight w(s) is the number of nonzero entries in s. For s, t ∈ B^n, we define the Hamming distance d(s, t) as follows:

d(s, t) = w(s − t) = w(s + t)

Notice that w(s − t) = w(s + t) because we work with bits. So, d(s, t) counts the number of positions at which s and t differ. Or, the Hamming distance between two bitstrings is the minimum number of bits that must be flipped to convert one to the other. Notice that d is a metric because it satisfies:

d(s, t) = 0 ⇐⇒ s = t
d(s, t) = d(t, s)                   (symmetry)
d(s, t) ≤ d(s, x) + d(x, t)         (triangle inequality)

[We need the codewords to have some distance between them]
The minimum Hamming distance of a code is the minimum distance between any two codewords:

d(G) = min { d(s, t) | s, t ∈ G(B^k) ∧ s ≠ t }

Notice that G(B^k) is closed under vector addition, so

d(G) = min { d(s, t) | s, t ∈ G(B^k) ∧ s ≠ t }
     = min { w(s + t) | s, t ∈ G(B^k) ∧ s ≠ t }
     = min { w(z) | z ∈ G(B^k) ∧ z ≠ 0 }

Let us again consider our example: replication such that we have three copies of each bit.

0 → 000
1 → 111

Here k = 1 and n = 3 and, as we have seen before,

G = (1, 1, 1)^T
G(0) = (0, 0, 0)^T
G(1) = (1, 1, 1)^T

We see that we represent 0 and 1 as 1 × 1 vectors, and we get 000 and 111 in the form of column vectors. We see that

min { w(z) | z ∈ G(B^k) ∧ z ≠ 0 } = min { w(z) | z ∈ {000, 111} ∧ z ≠ 0 }
                                  = min { w(z) | z = 111 }
                                  = min { w(111) }
                                  = w(111)
                                  = 3

People say that G creates an [n, k, d(G)] code. Thus, our example is a [3, 1, 3] code.

[The bigger the distance between codewords, the more errors we can correct]
Recall our definition, which enables us to compute

d(s, s′) = w(s + s′) = w(s + (s + e)) = w(e)

Thus, errors move a codeword in some direction, and the goal of error correction is to move it back, which can be done by choosing the closest codeword. Or, rather, this can be done if the error doesn't move the codeword too far. What is too far? Let us look at the so-called Hamming spheres around each codeword. In the lecture diagram we see that we can move the blue words back to the closest codeword but we cannot do that for the red ones. The key quantity here is d(G), from which we can calculate what radius r each sphere should have so that the spheres are nonoverlapping:

r = (d(G) − 1) / 2

Thus, when we use G to encode, we can detect up to d(G) − 1 bit flips, and we can correct up to r bit flips.

Theorem: For a radius r, and for a generator matrix G such that d(G) > 2r, and for a codeword s ∈ G(B^k), and for a corrupted codeword s′ = s + e, where w(e) ≤ r, we can recover s.
For our example of a [3, 1, 3] code, we have d(G) = 3, so

r = (d(G) − 1)/2 = (3 − 1)/2 = 1

So our example G can correct a single bit flip.
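As a quick check of these definitions, here is a brute-force Python sketch (illustrative only) that computes d(G) by enumerating the nonzero codewords and taking the minimum weight:

    from itertools import product

    def codewords(g_columns, k):
        # all codewords G @ v (mod 2), with G given by its columns
        n = len(g_columns[0])
        for v in product([0, 1], repeat=k):
            yield tuple(sum(v[j] * g_columns[j][i] for j in range(k)) % 2
                        for i in range(n))

    def min_distance(g_columns, k):
        # for a linear code, d(G) is the smallest weight of a nonzero codeword
        return min(sum(c) for c in codewords(g_columns, k) if any(c))

    g_columns = [(1, 1, 1)]           # the single column of G for our [3, 1] code
    d = min_distance(g_columns, k=1)
    print(d, (d - 1) // 2)            # 3 1: distance 3, so we can correct 1 flip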

Is use of G better than sending raw data? Let us compare 1) the probability that we cannot correct an error after use of G and 2) the probability that we cannot correct an error after sending raw data.
Suppose a bit flips with probability p. Let us assume that 1) each bit flips with the same probability p and that 2) each bit flips independently of the others.
For use of our [3, 1, 3] code, what is the probability that at least two bits flip?

P(at least two of three bits flip) = 3p^2(1 − p) + p^3 = 3p^2 − 2p^3
Now we can compare this with the probability that a single bit flips, which is what we worry about when we send raw data. We want the probability that at least two bits flip to be significantly smaller than the probability that a single bit flips. So we consider the following inequality and assume 0 < p < 1:

3p^2 − 2p^3 < p
⇐⇒ 3p^2 < 2p^3 + p
⇐⇒ 3p < 2p^2 + 1          (dividing by p, which is positive)
⇐⇒ 0 < 2p^2 − 3p + 1
⇐⇒ p < (3 − √(3^2 − 4×2×1)) / (2×2)  ∨  p > (3 + √(3^2 − 4×2×1)) / (2×2)
⇐⇒ p < (3 − 1)/4  ∨  p > (3 + 1)/4
⇐⇒ p < 1/2  ∨  p > 1
⇐⇒ p < 1/2                (since we assume p < 1)

The calculation shows that if p < 1/2, then we should encode rather than send raw data.

What if p is fairly small, like 0.01? Then the probability of an uncorrectable error for the encoded word is 3p^2 − 2p^3 = 0.000298, while the probability that a single raw bit flips is p = 0.01. So when p gets to the level of error rates that we see in quantum computing currently, the repetition code that repeats three times does well. Intuitively, for small p, the quantity 3p^2 is a lot bigger than 2p^3, so if we focus on the exponents in big-O style, the comparison is between p^2 and p.
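As a sanity check, here is a short Python sketch that evaluates 3p^2 − 2p^3 against p for a few values of p (the specific values are chosen only for illustration):

    # Compare the probability of an uncorrectable error for the three-copy code
    # (at least two of three bits flip) with the probability of a raw bit flip.

    def p_uncorrectable(p):
        return 3 * p**2 * (1 - p) + p**3   # = 3p^2 - 2p^3

    for p in [0.1, 0.01, 0.001]:
        print(f"p = {p}: encoded {p_uncorrectable(p):.2e} vs raw {p:.2e}")

    # prints roughly:
    # p = 0.1: encoded 2.80e-02 vs raw 1.00e-01
    # p = 0.01: encoded 2.98e-04 vs raw 1.00e-02
    # p = 0.001: encoded 3.00e-06 vs raw 1.00e-03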
One can argue that the repetition code is wasteful. For example, an encoding of 4 bits needs 3 × 4 = 12 bits. Can we do better? Yes we can; this is what the entire field of classical error correction is all about. We can think of the repetition code that repeats three times as a [3, 1, 3] code, or if we scale it to 4 bits, as a [12, 4, 3] code.

Now let us look at a [7, 4, 3] code, which means that we will encode 4 bits as 7 bits, rather than 12 bits. We begin with the generator matrix G, which is a 7 × 4 matrix:

G =
  1 1 1 1
  1 1 0 1
  1 0 1 1
  1 0 0 1
  0 1 1 1
  0 1 0 1
  0 0 1 1

For example, applying G to the data vector (1, 0, 1, 0)^T gives the codeword (0, 1, 0, 1, 1, 0, 1)^T. Notice that the codeword with the fewest 1s that we can produce with G, other than 0, has three 1s. So d(G) = 3. Next let us look at a parity-check matrix P associated with G:

P =
  1 1 1 1 0 0 0
  1 1 0 0 1 1 0
  1 0 1 0 1 0 1

Notice that PG = 0.
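Here is a Python sketch of this [7, 4, 3] code (using numpy; the lookup-table decoder and the names are illustrative), including correction of a single bit flip by syndrome lookup:

    import numpy as np

    # The 7 x 4 generator matrix G and 3 x 7 parity-check matrix P from above.
    G = np.array([[1, 1, 1, 1],
                  [1, 1, 0, 1],
                  [1, 0, 1, 1],
                  [1, 0, 0, 1],
                  [0, 1, 1, 1],
                  [0, 1, 0, 1],
                  [0, 0, 1, 1]])
    P = np.array([[1, 1, 1, 1, 0, 0, 0],
                  [1, 1, 0, 0, 1, 1, 0],
                  [1, 0, 1, 0, 1, 0, 1]])

    assert np.all((P @ G) % 2 == 0)          # the key property PG = 0

    # Each single-bit error has a distinct nonzero syndrome (a column of P),
    # so we can build a lookup table from syndrome to flipped bit position.
    syndrome_to_bit = {tuple((P @ e) % 2): i
                       for i, e in enumerate(np.eye(7, dtype=int))}

    def correct(received):
        s = tuple((P @ received) % 2)
        if any(s):                           # nonzero syndrome: undo the bit flip
            received = received.copy()
            received[syndrome_to_bit[s]] ^= 1
        return received

    data = np.array([1, 0, 1, 0])
    codeword = (G @ data) % 2                # [0 1 0 1 1 0 1]
    corrupted = codeword.copy()
    corrupted[4] ^= 1                        # a single bit flip
    print(np.array_equal(correct(corrupted), codeword))   # True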
Voyager 1 and 2 used a Golay code when visiting Jupiter and Saturn, specifically a [24, 12, 8] code. This scheme encodes 12 bits of data in a 24-bit word in such a way that any 7-bit error can be detected, and any 3-bit error can be corrected.

Transition to Close: So there you have it.
Review: We can use a linear function G to map data to codewords and then use a different linear function P to detect errors. This works well when we have some distance between the codewords. Here G adds redundancy while P does a parity check. The idea is that a codeword s is uncorrupted iff Ps = 0. An error is represented by a bit vector that can be added to uncorrupted codewords. When Ps ≠ 0, we call Ps an error syndrome. Under the right circumstances, an error syndrome tells us which error occurred, which enables us to correct the error.
Strong finish: Linear codes have been used in classical computers, storage, and communication, and now the stage is set for using them in quantum computing, too. But that comes with three new challenges. The first challenge is known as the no-cloning theorem, which says that we cannot copy an unknown quantum state, so care must be taken. We cannot simply replicate an arbitrary qubit a few times before we send it. The second challenge is that qubits are described by complex amplitudes rather than bits. So, we have a continuum of errors rather than just bit flips. The third challenge is that measurement destroys superposition in a quantum state. So, we cannot examine every part of a codeword and then determine how to correct an error. The field of quantum error correction overcomes all three challenges.
Call to action: Master classical error correction in the form of linear codes and thereby get ready for quantum error correction.