CS 70 Discrete Mathematics and Probability Theory
Fall 2020 Course Notes Note 9
Error Correcting Codes
In this note, we will discuss the problem of transmitting messages across an unreliable communication channel. The channel may cause some parts of the message (“packets”) to be lost, or dropped; or, more seriously, it may cause some packets to be corrupted. We will learn how to “encode” the message by introducing redundancy into it in order to protect against both of these types of errors. Such an encoding scheme is known as an “error correcting code.” Error correcting codes are a major object of study in mathematics, computer science and electrical engineering; they belong to a field known as “Information Theory,” one of the core computer sciences,^1 along with the theory of computation, control theory, communication theory, and estimation/learning/signal-processing theory. In addition to the beautiful theory underlying them (which we will glimpse in this note), they are of great practical importance: every time you use your cellphone, satellite TV, DSL, cable modem, disk drive, CD-ROM, DVD player, etc., or send and receive data over the internet, you are using error correcting codes to ensure that information is transmitted reliably. Error-correcting codes are also used in data centers and to speed up parallel computations.
There are, very roughly speaking, (at least) two distinct flavors of error correcting codes: algebraic^2 codes, which are based on polynomials over finite fields, and combinatorial codes, which are based on graph theory. In this note we will focus on algebraic codes, and in particular on so-called Reed-Solomon codes (named after two of their inventors). In doing so, we will be making essential use of the properties we learned about polynomials in the last lecture. These error-correcting codes also have a fundamentally linear-algebraic flavor, as will become clear throughout this note. Being based in finite fields allows the “information as degrees of freedom” perspective from linear-algebraic modeling to be made literal in terms of discrete symbols, bit-rates, etc.
Erasure Errors
We will consider two situations in which we wish to transmit information over an unreliable channel. The first is exemplified by the Internet, where the information (say a file) is broken up into packets, and the unreliability is manifest in the fact that some of the packets are lost during transmission, as shown below:
We refer to such errors as erasure errors. Suppose that the message consists of n packets and suppose that at most k packets are lost during transmission. We will show how to encode the initial message consisting of n packets into a redundant encoding consisting of n + k packets such that the recipient can reconstruct the
^1 This is also reflected in an interesting history. The IEEE was formed by the merger of two societies — the Institute of Radio Engineers and the American Institute of Electrical Engineers. The Radio Engineers were intimately involved with communication, cryptography, radar, etc. The IRE and AIEE merged to form the IEEE in 1963, but the IRE was already bigger than the AIEE by 1957. This merger was natural because electronics was becoming an important implementation substrate for both sets of engineers and the modern theory of electrical systems used the same mathematics as radio and signal processing (as well as control). This is why the word “EE” is often used to describe these computer sciences. At Berkeley, of course, this is all just a historical footnote since you simply experience EECS as a single entity, but the history remains in some of the course numbers.
^2 The ones we study are representative of what are called algebraic-geometry codes, because of their connections to algebraic geometry — the branch of mathematics that studies the roots of polynomial equations.
CS 70, Fall 2020, Note 9 1
message from any n received packets. Note that in this setting the packets are labeled with headers, and thus the recipient knows exactly which packets were dropped during transmission.
We can assume without loss of generality that the content of each packet is a number modulo q, where q is a prime. For example, the content of the packet might be a 32-bit string and can therefore be regarded as a number between 0 and 2^32 − 1; then we could choose q to be any prime larger than 2^32. The properties of polynomials over GF(q) (i.e., with coefficients and values reduced modulo q) are perfectly suited^3 to solve this problem and are the backbone of this error-correcting scheme.
To see this, let us denote the message to be sent by m1,…,mn, where each mi is a number in GF(q), and make the following crucial observations:
1. There is a unique polynomial P(x) of degree n − 1 such that P(i) = mi for 1 ≤ i ≤ n (i.e., P(x) contains all of the information about the message, and evaluating P(i) gives the intended contents of the i-th packet).
2. The message to be sent is now m1 = P(1), . . . , mn = P(n). We can generate additional packets by evaluating P(x) at the points n + j for 1 ≤ j ≤ k. (Recall that our transmitted codeword should be redundant, i.e., it should contain more packets than the original message to account for the lost packets. This is the distinction between a codeword and a message. A codeword is what is transmitted and has redundancy by construction, while the message is something that the user gives us and does not necessarily contain any redundancy.) Thus the transmitted codeword is c1 = P(1), c2 = P(2), . . . , cn+k = P(n + k). Since we are working modulo q, we must make sure that n + k ≤ q, but this condition does not impose a serious constraint since q is assumed to be very large.
3. We can uniquely reconstruct P(x) from its values at any n distinct points, since it has degree n − 1. This means that P(x) can be reconstructed from any n of the transmitted packets (not just the original n packets). Once we have reconstructed the polynomial P, we can evaluate P(x) at x = 1, . . . , n to recover the original message m1,…,mn.
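These three observations translate directly into a small encoder/decoder. The following Python sketch (the function names are ours, not from the notes) encodes by Lagrange interpolation over GF(q) and reconstructs the message from any n surviving packets:

```python
# A sketch of the erasure-coding scheme above, over GF(q) for prime q.
# Function names are our own, not from the notes.

def lagrange_interpolate(points, q):
    """Return a function evaluating the unique polynomial through `points`, mod q."""
    def P(x):
        total = 0
        for xi, yi in points:
            num = den = 1
            for xj, _ in points:
                if xj != xi:
                    num = num * (x - xj) % q
                    den = den * (xi - xj) % q
            # pow(den, -1, q) is the modular inverse (Python 3.8+).
            total = (total + yi * num * pow(den, -1, q)) % q
        return total
    return P

def encode(message, k, q):
    """Encode packets m1..mn as the codeword P(1), ..., P(n+k)."""
    n = len(message)
    P = lagrange_interpolate(list(zip(range(1, n + 1), message)), q)
    return [P(i) for i in range(1, n + k + 1)]

def decode(received, n, q):
    """Reconstruct m1..mn from any n surviving (position, value) pairs."""
    P = lagrange_interpolate(received[:n], q)
    return [P(i) for i in range(1, n + 1)]
```

Any n of the n + k transmitted packets suffice for `decode`, exactly as observation 3 promises.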
Example
Suppose Alice wants to send Bob a message of n = 4 packets and she wants to guard against k = 2 lost packets. Then, assuming the packets can be coded up as integers between 0 and 6, Alice can work over GF(7) (since 7 ≥ n + k = 6; of course, in real applications, we would be working over a much larger field!). Suppose the message that Alice wants to send to Bob is m1 = 3, m2 = 1, m3 = 5, and m4 = 0. The unique polynomial of degree n − 1 = 3 described by these 4 points is P(x) = x^3 + 4x^2 + 5.
Exercise. We derived this polynomial using Lagrange interpolation mod 7. Check this derivation, and verify also that indeed P(i) = mi for 1 ≤ i ≤ 4.
Since k = 2, Alice must evaluate P(x) at 2 extra points: P(5) = 6 and P(6) = 1. Now, Alice can transmit the encoded codeword which consists of n + k = 6 packets, where cj = P(j) for 1 ≤ j ≤ 6. So Alice will send c1 = P(1) = 3, c2 = P(2) = 1, c3 = P(3) = 5, c4 = P(4) = 0, c5 = P(5) = 6, and c6 = P(6) = 1.
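Alice's six packet values can be checked directly by evaluating P(x) = x^3 + 4x^2 + 5 mod 7:

```python
# Evaluate Alice's polynomial P(x) = x^3 + 4x^2 + 5 over GF(7)
# at x = 1..6 to reproduce the codeword (3, 1, 5, 0, 6, 1).
q = 7
codeword = [(x**3 + 4 * x**2 + 5) % q for x in range(1, 7)]
print(codeword)  # [3, 1, 5, 0, 6, 1]
```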
Now suppose packets 2 and 6 are dropped, in which case we have the following situation:
From the values that Bob received (3, 5, 0, and 6), he uses Lagrange interpolation and computes the following basis polynomials (where everything should be interpreted mod 7):

Δ1(x) = (x−3)(x−4)(x−5)/(−24) ≡ 2(x−3)(x−4)(x−5) (mod 7)
Δ3(x) = (x−1)(x−4)(x−5)/4 ≡ 2(x−1)(x−4)(x−5) (mod 7)
Δ4(x) = (x−1)(x−3)(x−5)/(−3) ≡ 2(x−1)(x−3)(x−5) (mod 7)
Δ5(x) = (x−1)(x−3)(x−4)/8 ≡ (x−1)(x−3)(x−4) (mod 7)

^3 In real-world implementations, we do not do this. Instead, we work directly in finite fields that have size 2^32 because that is a prime power and working with fields that are a power of two in size is convenient for computer operations. However, the efficient construction of such fields is beyond the scope of 70. A taste can be had in Math 114 and in EE229B.
(Note that we have used the fact here that the inverses of −24, 4, −3 and 8 (mod 7) are 2, 2, 2 and 1 respectively.) He then reconstructs the polynomial P(x) = 3 · Δ1(x) + 5 · Δ3(x) + 0 · Δ4(x) + 6 · Δ5(x) = x^3 + 4x^2 + 5 (mod 7). Bob then evaluates m2 = P(2) = 1, which is the packet that was lost from the original codeword. More generally, no matter which two packets were dropped, following exactly the same method Bob can always reconstruct P(x) and thus the original underlying message.
Exercise. Check Bob’s calculation above, and verify that he really does reconstruct the correct polynomial, as claimed. Remember that all arithmetic must be done mod 7.
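Bob's reconstruction can be checked mechanically. This sketch (our own helpers) expands the combination 3·Δ1 + 5·Δ3 + 0·Δ4 + 6·Δ5 into coefficients mod 7 and confirms it equals x^3 + 4x^2 + 5:

```python
# Expand Bob's combination 3·Δ1(x) + 5·Δ3(x) + 0·Δ4(x) + 6·Δ5(x) mod 7
# into coefficients (constant term first) and confirm P(x) = x^3 + 4x^2 + 5.
q = 7

def poly_mul(a, b):
    """Multiply two coefficient lists (low-order first) mod q."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out

def delta(roots, scale):
    """scale * product of (x - r) for r in roots, low-order coefficients first."""
    p = [scale % q]
    for root in roots:
        p = poly_mul(p, [(-root) % q, 1])
    return p

# Δ1, Δ3, Δ4, Δ5 with the inverse constants 2, 2, 2, 1 from the text.
terms = [(3, delta([3, 4, 5], 2)),
         (5, delta([1, 4, 5], 2)),
         (0, delta([1, 3, 5], 2)),
         (6, delta([1, 3, 4], 1))]
P = [0, 0, 0, 0]
for coeff, d in terms:
    P = [(pi + coeff * di) % q for pi, di in zip(P, d)]
print(P)  # [5, 0, 4, 1] -> P(x) = x^3 + 4x^2 + 5
```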
Let us consider what would happen if Alice sent one fewer packet. If Alice only sent cj for 1 ≤ j ≤ n + k − 1, then with k erasures, Bob would only receive cj for n − 1 distinct values of j. Thus, Bob would not be able to reconstruct P(x) (since there are q polynomials of degree at most n − 1 that agree with the n − 1 packets which Bob received)! This error-correcting scheme is therefore optimal: it can recover the n characters of the transmitted message from any n received characters, but recovery from any smaller number of characters is impossible.
Polynomial Interpolation Revisited
Let us take a brief digression to discuss another method of polynomial interpolation (different from Lagrange interpolation discussed in the previous note) which will be useful in handling general errors. We need to make the underlying linear algebra here more explicit so that we can better lean on it going forward.
Again, the goal of the algorithm will be to take as input d + 1 pairs (x1, y1), ··· , (xd+1, yd+1), and output the polynomial p(x) = a_d x^d + ··· + a1 x + a0 such that p(xi) = yi for i = 1 to d + 1.
The first step of the algorithm is to write a system of d + 1 linear equations in d + 1 variables: the variables are the coefficients of the polynomial, a0, . . . , ad. Each equation is obtained by fixing x to be one of the d + 1 values x1, ··· , xd+1. Note that in p(x), x is a variable and a0, . . . , ad are fixed constants. In the equations below, these roles are swapped: xi is a fixed constant and a0, . . . , ad are variables. For example, the i-th equation is the result of fixing x to be xi, and saying that the value of the polynomial is yi:

a_d x_i^d + a_{d−1} x_i^{d−1} + ··· + a0 = yi.
Now solving these equations gives the coefficients of the polynomial p(x). For example, suppose we are given the three pairs (−1, 2), (0, 1), and (2, 5); our goal is to construct the degree-2 polynomial p(x) which goes through these points. The first equation says a2(−1)^2 + a1(−1) + a0 = 2. Simplifying, we get a2 − a1 + a0 = 2. Similarly, the second equation says a2(0)^2 + a1(0) + a0 = 1, or a0 = 1. And the third equation
says a2(2)^2 + a1(2) + a0 = 5. So we get the following system of equations:

a2 − a1 + a0 = 2
a0 = 1
4a2 + 2a1 + a0 = 5

Substituting for a0 and multiplying the first equation by 2, we get:

2a2 − 2a1 = 2
4a2 + 2a1 = 4
Then, adding the two equations we find that 6a2 = 6, so a2 = 1, and plugging back in we find that a1 = 0. Thus, we have determined the polynomial p(x) = x2 + 1. To justify this method more carefully, we must show that such systems of equations always have a solution and that it is unique. Fortunately, the work that we did in the previous note via Lagrange Interpolation has already established this. (Can you see why?) This is one of the advantages of being able to look at the same problem from different angles — this theme will become more prominent in later parts of the course.
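The same elimination can be done mechanically. This sketch (our own, not from the notes) sets up the Vandermonde system for the three points and solves it with exact rational Gaussian elimination:

```python
from fractions import Fraction

# Build the system a2*x^2 + a1*x + a0 = y for the points (-1,2), (0,1), (2,5)
# and solve it by Gauss-Jordan elimination over the rationals.
points = [(-1, 2), (0, 1), (2, 5)]
d = len(points) - 1
# Each row: [x^d, ..., x, 1 | y], i.e. the augmented Vandermonde matrix.
A = [[Fraction(x) ** p for p in range(d, -1, -1)] + [Fraction(y)]
     for x, y in points]

for col in range(d + 1):
    # Find a row with a nonzero pivot, swap it up, normalize, eliminate.
    pivot = next(r for r in range(col, d + 1) if A[r][col] != 0)
    A[col], A[pivot] = A[pivot], A[col]
    A[col] = [v / A[col][col] for v in A[col]]
    for r in range(d + 1):
        if r != col and A[r][col] != 0:
            A[r] = [rv - A[r][col] * cv for rv, cv in zip(A[r], A[col])]

coeffs = [A[r][-1] for r in range(d + 1)]  # [a2, a1, a0]
print(coeffs)  # [1, 0, 1] -> p(x) = x^2 + 1
```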
General Errors
Let us now return to our main topic of error correction, and consider a much more challenging scenario. Suppose that Alice wishes to communicate with Bob over a noisy channel (say, via a modem). Her message is m1, . . . , mn, where we may think of the mi’s as characters (either bytes or characters in the English alphabet). The problem now is that some of the characters are corrupted during transmission due to channel noise. Thus Bob receives exactly as many characters as Alice transmits, but k of them are corrupted, and Bob has no idea which k these are! If you’d like, you can think of these as being corrupted by a malicious adversary that is only limited by being able to corrupt no more than k characters. Recovering from such general errors is much more challenging than recovering from erasure errors, though once again polynomials hold the key. As we shall see, Alice can still guard against k general errors, at the expense of transmitting only 2k additional characters (twice as many as in the erasure case we saw above).
We will again think of each character as a number modulo q for some prime q. (For the English alphabet, q is some prime larger than 26, say q = 29.) As before, we can describe the message by a polynomial P(x) of degree n − 1 over GF(q), such that P(1) = m1, . . . , P(n) = mn. As before, to cope with the errors Alice will transmit additional characters obtained by evaluating P(x) at additional points. As mentioned above, in order to guard against k general errors, Alice must transmit 2k additional characters: thus the encoded codeword is c1, . . . , cn+2k where cj = P(j) for 1 ≤ j ≤ n + 2k, and n + k of the characters that Bob receives are uncorrupted. As before, we must put the mild constraint on q that it be large enough so that q > n + 2k.
For example, if Alice wishes to send n = 4 characters to Bob via a modem in which k = 1 of the characters is corrupted, she must redundantly send an encoded message consisting of 6 characters. Suppose she wants to transmit the same message (3150) as in our erasure example above, and that c1 is corrupted from 3 to 2. This scenario can be visualized in the following figure:
From Bob’s viewpoint, the problem of reconstructing Alice’s message is the same as reconstructing the polynomial P(x) from the n + 2k received characters r1, r2, . . . , rn+2k. In other words, Bob is given n + 2k values, r1, r2, . . . , rn+2k modulo q, with the promise that there is a polynomial P(x) of degree n − 1 over GF(q) such that P(i) = ri for n + k distinct values of i between 1 and n + 2k. Bob must reconstruct P(x) from this data (in the above example, n + k = 5, and r2 = P(2) = 1, r3 = P(3) = 5, r4 = P(4) = 0, r5 = P(5) = 6, and r6 = P(6) = 1). Note, however, that Bob does not know which of the n + k values are correct!
Does Bob even have sufficient information to reconstruct P(x)? Our first observation shows that this is at least plausible: for any given subset of n + k values of i between 1 and n + 2k, there is a unique polynomial P(x) such that P(i) = ri at these values of i. To see this, suppose that P′(x) is any other polynomial of degree n − 1 that goes through these n + k points. Then among these n + k points there are at most k errors, and therefore on at least n of the points we must have P′(i) = P(i). But, as we saw in the previous note (Property 2), a polynomial of degree n−1 is uniquely defined by its values at n points, and therefore P′(x) and P(x) must be the same polynomial.
To summarize, then, Bob’s task is to find a polynomial P(x) of degree n − 1 such that P(i) = ri for at least n + k values of i. If he can do this, he can be sure that the polynomial he has found is in fact the original polynomial P(x).
But how can Bob efficiently find such a polynomial? The issue at hand is the locations of the k errors. Let e1,…,ek be the k locations at which errors occurred (so that P(i) = ri for all i ∈/ {e1,…,ek}). Bob doesn’t know where these errors are.
Now Bob could try to guess where the k errors lie, but this would take too long (it would take exponential time, in fact). Instead, Bob will employ a clever trick based on the following definition. Consider the so-called error-locator polynomial
E(x) = (x−e1)(x−e2)···(x−ek).
Note that E(x) is a polynomial of degree k (since x appears k times). Note also that Bob does not know this polynomial explicitly, because he does not know the positions ei of the errors. However, he will use the polynomial symbolically, and will eventually compute the values ei that appear in it!
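To see what E(x) looks like concretely, here is a small sketch (our own helper, with hypothetical error positions) that expands the product (x − e1)···(x − ek) into coefficients mod q:

```python
# Expand the error-locator polynomial E(x) = (x - e1)···(x - ek) mod q,
# coefficients listed from the constant term up.
q = 7

def error_locator(errors, q):
    coeffs = [1]  # the constant polynomial 1
    for e in errors:
        # Multiply the current polynomial by (x - e).
        shifted = [0] + coeffs                              # times x
        scaled = [(-e) % q * c % q for c in coeffs] + [0]   # times -e
        coeffs = [(a + b) % q for a, b in zip(shifted, scaled)]
    return coeffs

# E.g. errors at (hypothetical) positions 2 and 5:
print(error_locator([2, 5], q))  # [3, 0, 1] -> E(x) = x^2 + 3, since (x-2)(x-5) = x^2 - 7x + 10
```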
Let us make a simple but crucial observation about this polynomial:
P(i)E(i) = riE(i) for 1 ≤ i ≤ n+2k. (1)
To see this, note that it holds at points i at which no error occurred since at those points P(i) = ri; and it is trivially true at points i at which an error occurred since then E(i) = 0.
This observation forms the basis of a very clever algorithm invented by Berlekamp and Welch. Looking more closely at the equalities in (1), we will show that they can be made to correspond to n + 2k linear equations in n + 2k unknowns, from which the locations of the errors and the coefficients of P(x) can be easily deduced (in analogous fashion to the interpolation scheme we saw in the previous section).
Define the polynomial Q(x) := P(x)E(x), which has degree n + k − 1 and is therefore described by n + k coefficients a0, a1, . . . , an+k−1. The error-locator polynomial E(x) = (x − e1)···(x − ek) has degree k and is described by k + 1 coefficients b0, b1, . . . , bk, but the leading coefficient (the coefficient bk of x^k) is always 1. So we can write:
Q(x) = a_{n+k−1} x^{n+k−1} + a_{n+k−2} x^{n+k−2} + ··· + a1 x + a0;
E(x) = x^k + b_{k−1} x^{k−1} + ··· + b1 x + b0.
(Remember that we don’t yet know any of the coefficients ai or bi.)
Now notice that, substituting Q(x) = P(x)E(x) in (1), we get the n + 2k equations

Q(i) = riE(i) for 1 ≤ i ≤ n + 2k.

Writing out the i-th equation using the coefficients of Q(x) and E(x), we get

a_{n+k−1} i^{n+k−1} + a_{n+k−2} i^{n+k−2} + ··· + a1 i + a0 ≡ ri (i^k + b_{k−1} i^{k−1} + ··· + b1 i + b0) (mod q).
This is a set of n + 2k linear equations, one for each value of i, in the n + 2k unknowns a0, a1, . . . , an+k−1, b0, b1, . . . , bk−1. We can solve this system of linear equations to obtain E(x) and Q(x). We can then compute the ratio Q(x)/E(x) to obtain P(x). This is best illustrated by an example.
Example. Suppose we are working over GF(7) and Alice wants to send Bob the n = 3 characters “3,” “0,” and “6” over a modem. Turning to the analogy of the English alphabet, this is equivalent to using only the first 7 letters of the alphabet, where a = 0, . . . , g = 6. So the message which Alice wishes for Bob to receive is “dag”. Then Alice interpolates (e.g., using Lagrange: exercise!) to find the polynomial
P(x) = x^2 + x + 1,
which is the unique polynomial of degree 2 such that P(1) = 3, P(2) = 0, and P(3) = 6.
Suppose Alice knows that up to k = 1 character could be corrupted; then she needs to transmit the n + 2k = 5 characters P(1) = 3, P(2) = 0, P(3) = 6, P(4) = 0, and P(5) = 3 to Bob. Suppose P(1) is actually corrupted, and Bob receives the character 2 instead of 3 (i.e., Alice sends the encoded codeword “dagad” but Bob instead receives “cagad”). Summarizing, we have the following situation:
Let E(x) = (x − e1) = x + b0 be the error-locator polynomial—remember, Bob doesn’t know what e1 = −b0 is yet since he doesn’t know where the error occurred—and let Q(x) = a3 x^3 + a2 x^2 + a1 x + a0 (where again the coefficients ai are unknown). Now Bob just substitutes i = 1, i = 2, . . . , i = 5 into the equations Q(i) = riE(i) from (1) above and simplifies to get five linear equations in five unknowns (recall that we are working modulo 7 and that ri is the value Bob received for the i-th character):
a3 + a2 + a1 + a0 + 5b0 = 2
a3 + 4a2 + 2a1 + a0 = 0
6a3 + 2a2 + 3a1 + a0 + b0 = 4
a3 + 2a2 + 4a1 + a0 = 0
6a3 + 4a2 + 5a1 + a0 + 4b0 = 1
Bob then solves this linear system and finds that a3 = 1, a2 = 0, a1 = 0, a0 = 6 and b0 = 6 (all mod 7). (As a check, this implies that E(x) = x + 6 = x − 1, so the location of the error is position e1 = 1, which is correct since the first character was corrupted from a “d” to a “c”.) This gives him the polynomials Q(x) = x^3 + 6 and E(x) = x − 1. He can then find P(x) by computing the quotient P(x) = Q(x)/E(x) = (x^3 + 6)/(x − 1) = x^2 + x + 1 (mod 7).
Bob notices that the first character was corrupted (since e1 = 1), so now that he has P(x), he just computes P(1) = 3 = “d” and obtains the original, uncorrupted message “dag”.
Exercise. Verify the derivation of the above system of equations from the equalities Q(i) = riE(i) for i = 1,2,3,4,5. Also, solve the equations (this is a little tedious, but not hard!) and check that your solution agrees with the one above. Remember that all the arithmetic should be done mod 7.
Exercise. Redo the example above for the case where the second character is corrupted from a “0” to a “5”, and all other characters are uncorrupted.
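The whole Berlekamp–Welch computation for this example can be scripted. The sketch below (helper code is ours, not from the notes) builds the 5×5 system mod 7, solves it by Gaussian elimination, and divides Q(x) by E(x):

```python
# Berlekamp-Welch decoding of the worked example: n = 3, k = 1, q = 7,
# received word r = (2, 0, 6, 0, 3) with one corrupted character.
q, n, k = 7, 3, 1
r = [2, 0, 6, 0, 3]  # r[i-1] is the character received at position i

# Unknowns, in order: a0..a_{n+k-1} (coefficients of Q), then b0..b_{k-1} (of E).
# Equation i:  sum_j a_j i^j  -  r_i * sum_{j<k} b_j i^j  =  r_i * i^k   (mod q)
rows = []
for i in range(1, n + 2 * k + 1):
    ri = r[i - 1]
    row = [pow(i, j, q) for j in range(n + k)]            # Q(i) terms
    row += [(-ri) * pow(i, j, q) % q for j in range(k)]   # unknown E(i) terms
    row.append(ri * pow(i, k, q) % q)                     # right-hand side
    rows.append(row)

# Gauss-Jordan elimination mod q (q prime, so every nonzero pivot is invertible).
m = n + 2 * k
for col in range(m):
    piv = next(x for x in range(col, m) if rows[x][col] != 0)
    rows[col], rows[piv] = rows[piv], rows[col]
    inv = pow(rows[col][col], -1, q)
    rows[col] = [v * inv % q for v in rows[col]]
    for x in range(m):
        if x != col and rows[x][col] != 0:
            f = rows[x][col]
            rows[x] = [(v - f * c) % q for v, c in zip(rows[x], rows[col])]

sol = [rows[x][-1] for x in range(m)]
Q = sol[:n + k]        # coefficients of Q(x), constant term first
E = sol[n + k:] + [1]  # coefficients of E(x), monic of degree k

# Long division Q(x) / E(x) to recover P(x) (all low-order first).
P = [0] * n
rem = Q[:]
for deg in range(n - 1, -1, -1):
    c = rem[deg + k]
    P[deg] = c
    for j, e in enumerate(E):
        rem[deg + j] = (rem[deg + j] - c * e) % q

message = [sum(c * pow(i, j, q) for j, c in enumerate(P)) % q for i in range(1, n + 1)]
print(E)        # [6, 1]  ->  E(x) = x + 6 = x - 1, error at position 1
print(P)        # [1, 1, 1]  ->  P(x) = x^2 + x + 1
print(message)  # [3, 0, 6]  ->  "dag"
```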
Finer Points
Two points need further discussion. How do we know that the n + 2k equations are consistent? What if they have no solution? This is simple. The equations must be consistent since Q(x) = P(x)E(x) together with the actual error locator polynomial E(x) gives a solution!
A more interesting question is this: how do we know that the n + 2k equations are independent, i.e., how do we know that there aren’t other spurious solutions in addition to the real solution that we are looking for? Put more mathematically, suppose that the solution we construct is Q′(x), E′(x); how do we know that this solution satisfies the property that E′(x) divides Q′(x) and that Q′(x)/E′(x) = Q(x)/E(x) = P(x)?
To see that this is true, we note first that, based on our method for calculating Q′(x),E′(x), we know that Q′(i) = riE′(i) for 1 ≤ i ≤ n + 2k; and of course we also have, by definition, Q(i) = riE(i) for the same values of i. Multiplying the first of these equations by E(i) and the second by E′(i), we get
Q′(i)E(i) = Q(i)E′(i) for 1 ≤ i ≤ n + 2k, (2)
since both sides are equal to riE(i)E′(i). Equation (2) tells us that the two polynomials Q(x)E′(x) and
Q′ (x)E (x) are equal at n + 2k points. But these two polynomials both have degree n + 2k − 1, so they are
completely determined by their values at n + 2k points. Therefore, since they agree at n + 2k points, they
must be the same polynomial, i.e., Q(x)E′(x) = Q′(x)E(x) for all^4 x. Now we may divide through by the polynomial E(x)E′(x) (which by construction is not the zero polynomial) to obtain Q′(x)/E′(x) = Q(x)/E(x) = P(x),
which is what we wanted. Hence we can be sure that any solution we find is correct.
Exercise. (Trickier.) Is the solution Q′(x), E′(x) always unique? The above analysis tells us that the ratio must always satisfy Q′(x)/E′(x) = Q(x)/E(x) = P(x), but does not guarantee that Q′(x) = Q(x) and E′(x) = E(x). Hint: What happens in our example above when in fact none of the characters is corrupted (although Alice still assumes that k = 1 character may be corrupted)? Try writing out and solving the equations in this case. You should get a whole family of solutions, one for each possible value b0 ∈ GF(7). (Since in this scenario there is no error, b0 is undetermined.) The other values can then be written in terms of b0: a3 = 1; a2 = 1 + b0; a1 = 1 + b0; a0 = b0. But, no matter what the value of b0, the quotient Q(x)/E(x) = (x^3 + (1 + b0)x^2 + (1 + b0)x + b0)/(x + b0) = x^2 + x + 1, so we again recover P(x) (as we know we must).
(Optional) Distance properties
So far, we have talked about the Reed-Solomon codes in a way that constantly references their polynomial-evaluation-based nature. However, it is useful to step back and ask ourselves what generic features of the codewords enabled us to recover from erasures and general errors.
It is useful here to treat a codeword as a string/vector of some fixed length, say L characters long. To protect a message of length n against k erasures, we saw that L ≥ n + k was required. To protect a message of length n against k general errors introduced by a malicious adversary, we saw that L ≥ n + 2k was required. It is a natural question to wonder whether this is merely a feature of the Reed-Solomon codes or whether it is more fundamental.
To answer this, we define a kind of “distance” on strings of length L. We say that the Hamming distance between strings ⃗s = (s1,s2,…,sL) and ⃗r = (r1,r2,…,rL) is just the count of the number of positions in
which the two strings differ. In mathematical notation:

d(⃗s,⃗r) = ∑_{i=1}^{L} 1(ri ̸= si) (3)

where the notation 1(ri ̸= si) denotes a function that returns a 1 if the condition inside is true and 0 if it is false.

^4 Note that this is a much stronger statement than equation (2). Equation (2) says that the values of the polynomials agree at certain points i; this statement says that the two polynomials are equal everywhere, i.e., they are the same polynomial.
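Equation (3) translates directly into code:

```python
# Hamming distance between two equal-length strings/sequences:
# count the positions where they differ, as in equation (3).
def hamming_distance(s, r):
    assert len(s) == len(r)
    return sum(1 for si, ri in zip(s, r) if si != ri)

# E.g. the corrupted codeword from the earlier example:
print(hamming_distance("dagad", "cagad"))  # 1
```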
The error and erasure correcting properties of a code are determined by the distance properties of the codewords ⃗c(m). Intuitively, if the codewords are too close together, then the code is more sensitive to errors and erasures. Having codewords far apart from each other allows a code, in principle, to tolerate more erasures and errors.
To make this precise, the Minimum Distance of a code is defined as the distance between the two closest codewords. Let m and m′ be two distinct messages, m ̸= m′. Then the minimum distance of the code is min_{m ̸= m′} d(⃗c(m),⃗c(m′)).
If the messages themselves are the set of strings of length n, then the minimum distance of the set of messages is just 1 because two distinct strings must differ somewhere. So clearly, when the minimum distance is 1, there is no protection against errors or erasures.
When the minimum distance is larger than 1, then there is some protection against erasures and errors. Suppose that the minimum distance was 2 and one position was erased in a codeword. In principle, cycling through all the possible characters for that position would certainly yield at least 1 codeword because the true codeword could be obtained that way. But notice that no other codeword could be obtained that way since if one were to be obtained, it would have a Hamming distance of just 1 — and we stipulated that the minimum distance was 2.
It is a simple exercise to generalize the above argument to show that when the minimum distance is k + 1 or better, then the code can in principle recover from k erasure errors. If the minimum distance is k or less, then there is clearly a codeword pair for which erasing k positions would make the pair ambiguous. The resulting string could have come from either of these codewords. So we can’t hope for anything better.
For general errors, the situation is a little trickier. To get an intuition for what the corresponding story should be, imagine that you are an attacker and want to confuse the decoder. You see the encoded codeword. What are you going to do? It is intuitively clear that you will want to make the received string look like it came from another codeword. What codeword would you choose to impersonate? Intuitively, it makes sense to look for the closest codeword in the neighborhood. Suppose it was at a Hamming distance of d. That means that if you were to change d positions, then you could make the received string look exactly like this other codeword. Clearly, the decoder is pretty much guaranteed to make an error at this point.
But what if you only changed d − 1 positions to partially impersonate the other codeword? At this point, the decoder is facing a choice. It could decode to this other codeword and chalk the single-position discrepancy up to general errors, or it could decode to the true codeword and think that d − 1 positions have been changed. What choice will it make? In general, the decoder will be perfectly confused between these two choices if exactly d/2 errors have been made in a malicious way designed to partially impersonate this other codeword. However, if the number of general errors is strictly less than half the minimum distance of the code, in principle we should be able to decode to the unique codeword that is less than d/2 from the received string.
The distance properties of Reed-Solomon Codes
It turns out that Reed-Solomon codes have the best-possible distance properties. (Of course, that is only a part of their attraction. Their other attraction is that their algebraic structure gives us a very nice efficient
way of decoding these codes by solving systems of linear equations instead of brute-force searching through all nearby codewords.)
Theorem: The Reed-Solomon code that takes n message characters to a codeword of size n + 2k has minimum distance 2k + 1.
Proof: We prove this using the two claims:
Claim 1) The minimum distance is ≤ 2k + 1.
Claim 2) The minimum distance is ≥ 2k + 1.
If we show that both (1) and (2) are true, then the minimum distance must be 2k + 1.
Proof of Claim (1):
We prove Claim (1) by constructing an example. If we can show there exist two codewords ⃗ca and ⃗cb such that d(⃗ca,⃗cb) ≤ 2k + 1, then the minimum over all distances between codewords must be ≤ 2k + 1.
Consider ⃗ma = m1m2···mn. Also consider ⃗mb = m1m2···m̄n, where the two messages are identical in the first n − 1 positions but differ in the last position. (All strings of length n are valid messages.) So mn ̸= m̄n. The Hamming distance between these two messages is 1. We use these messages to generate polynomials Pa(x) and Pb(x), and these are evaluated at i such that 1 ≤ i ≤ n + 2k to generate the codewords ⃗ca and ⃗cb of length n + 2k. We use value/interpolation encoding, so Pa(i) = mi = Pb(i) for 1 ≤ i ≤ n − 1. So the first n − 1 positions of ⃗ca and ⃗cb are identical. So they can differ in at most n + 2k − (n − 1) = 2k + 1 places. But this means the Hamming distance between them is ≤ 2k + 1. So we have constructed two codewords that have distance at most 2k + 1. Then the minimum distance between all pairs of codewords must be less than or equal to this. Hence, the minimum distance is ≤ 2k + 1.
Proof of Claim (2):
Assume it were possible that the minimum distance between two distinct codewords could be ≤ 2k. We use this to reach a contradiction. Let these two distinct codewords be ⃗ca and ⃗cb, corresponding to distinct message polynomials Pa(x) and Pb(x). (Note: these have nothing to do with the⃗ca and⃗cb used in the Proof for Claim (1), we are just using the same variable names.) By the construction of Reed-Solomon codes, Pa(x) and Pb(x) have degree at most n−1 since the message is of size n.
Then, d(⃗ca,⃗cb) ≤ 2k. So ⃗ca and ⃗cb must be identical in ≥ n + 2k − 2k = n positions. But ⃗ca and ⃗cb are just the evaluations of the message polynomials Pa(x) and Pb(x) at 1 ≤ i ≤ n + 2k. So Pa(x) and Pb(x) are identical on at least n points. But this means they must be the same polynomial, since they are of degree at most n − 1. This is a contradiction, since we assumed that ⃗ca ̸= ⃗cb were distinct codewords. So our assumption must be false, and the minimum distance between two distinct codewords is ≥ 2k + 1.
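For tiny parameters the theorem can be checked exhaustively. This sketch (our own, with deliberately small toy parameters) encodes every message of length n = 2 over GF(5) with k = 1, giving codewords of length n + 2k = 4, and verifies that the minimum pairwise distance is exactly 2k + 1 = 3:

```python
from itertools import product

# Exhaustively check the minimum distance of a tiny Reed-Solomon code:
# n = 2 message characters over GF(5), k = 1, codewords of length n + 2k = 4.
q, n, k = 5, 2, 1

def encode(msg):
    # Interpolate P with P(i) = msg[i-1], degree <= n-1, then evaluate at 1..n+2k.
    def P(x):
        total = 0
        for i in range(1, n + 1):
            num = den = 1
            for j in range(1, n + 1):
                if j != i:
                    num = num * (x - j) % q
                    den = den * (i - j) % q
            total = (total + msg[i - 1] * num * pow(den, -1, q)) % q
        return total
    return tuple(P(x) for x in range(1, n + 2 * k + 1))

codewords = [encode(m) for m in product(range(q), repeat=n)]
dists = [sum(a != b for a, b in zip(c1, c2))
         for i, c1 in enumerate(codewords) for c2 in codewords[i + 1:]]
print(min(dists))  # 3, which is 2k + 1
```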