COMP2610/6261 – Information Theory
Lecture 17: Noisy Channels
Bob Williamson
Research School of Computer Science
2 October, 2018
1 Communication over Noisy Channels: Big Picture
2 Noisy Channels: Formally
3 Examples of Channels
4 Probability of Error
The Big Picture

[Figure (MacKay §9.1, "The big picture"): Source → Compressor → Encoder → Noisy channel → Decoder → Decompressor. The compressor/decompressor pair performs source coding; the encoder/decoder pair performs channel coding.]

In Chapters 4–6 of MacKay, we discussed source coding with block codes, symbol codes and stream codes. We implicitly assumed that the channel from the compressor to the decompressor was noise-free. Real channels are noisy. We will now spend two chapters on the subject of noisy-channel coding – the fundamental possibilities and limitations of error-free communication through a noisy channel. The aim of channel coding is to make the noisy channel behave like a noiseless channel. We will assume that the data to be transmitted has been through a good compressor, so the bit stream has no obvious redundancy. The channel code, which makes the transmission, will put back redundancy of a special sort, designed to make the noisy received signal decodeable.

Suppose we transmit 1000 bits per second with p0 = p1 = 1/2 over a noisy channel that flips bits with probability f = 0.1. What is the rate of transmission of information? We might guess that the rate is 900 bits per second by subtracting the expected number of errors per second. But this is not correct, because the recipient does not know where the errors occurred. Consider the case where the noise is so great that the received symbols are independent of the transmitted symbols. This corresponds to a noise level of f = 0.5, since half of the received symbols are correct due to chance alone. But when f = 0.5, no information is transmitted at all.

Given what we have learnt about entropy, it seems reasonable that a measure of the information transmitted is given by the mutual information between the source and the received signal, that is, the entropy of the source minus the conditional entropy of the source given the received signal.

We will now review the definitions of conditional entropy and mutual information. Then we will examine whether it is possible to use such a noisy channel to communicate reliably. We will show that for any channel Q there is a non-zero rate, the capacity C(Q), up to which information can be sent with arbitrarily small probability of error.
For source coding (the compressor/decompressor pair):
Concept: Expected code length
Theorem: Source coding theorem
Algorithms: { Huffman, Arithmetic } codes
For channel coding (the encoder/decoder pair):
Concept: Channel capacity
Theorem: Channel coding theorem
Algorithms: Repetition codes, Hamming codes
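As the excerpt above suggests, the information conveyed per use of a noisy channel is measured by the mutual information I(X;Y) = H(X) − H(X|Y). A minimal numeric sketch (plain Python, not from the slides) for the example above, a binary symmetric channel with f = 0.1 and uniform inputs:

    from math import log2

    def h2(p):
        # Binary entropy in bits, with h2(0) = h2(1) = 0.
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    f = 0.1                               # flip probability
    p1 = 0.5                              # uniform input: P(x = 1)
    py1 = p1 * (1 - f) + (1 - p1) * f     # P(y = 1)

    # I(X;Y) = H(Y) - H(Y|X); for the binary symmetric channel H(Y|X) = h2(f).
    print(h2(py1) - h2(f))                # about 0.531 bits per channel use

So at 1000 transmitted bits per second, roughly 530 bits per second of information get through, not 900.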
Communication over Noisy Channels
A channel is some medium for transmitting messages
A noisy channel is a channel that potentially introduces errors in the
sender’s message
The Problem of Communication
“The fundamental problem of communication is that of reproducing at one
point either exactly or approximately a message selected at another point.”
(Claude Shannon, 1948)
Example: Telephone Network
“Hi”  −→  “Hi”
Source : Aditya
Encoder : Telephone handset
Channel : Analogue telephone line
Decoder : Telephone handset
Destination : Mark
Key Questions
How do we model noisy communication abstractly?
What are the theoretical limits of noise correction?
What are the practical approaches to noise correction?
2 Noisy Channels: Formally
Encoder/Decoder Pairs
Suppose we have some set S = {1, 2, . . . ,S} of possible messages
e.g. codewords from Huffman coding on some ensemble
Sender and receiver agree on what these are
When communicating over a channel, the sender must encode messages
into some input alphabet X
The receiver then receives some (possibly corrupted) element from an
output alphabet Y
Simple case: X = Y = {0, 1}
The bit the sender transmits may not be what the receiver sees
Encoder/Decoder Pairs
Formally, the sender encodes messages via
enc : S → X^N
The receiver then decodes messages via
dec : Y^N → S
Isn’t the compressor already “encoding” a message?
Yes, but we might want to add something for noise tolerance
Might have X ≠ Y
e.g. if we allow a special “erased” symbol
N > 1 can be thought of as multiple uses of a channel
e.g. use a bitstring of length 4 to represent messages
Channels: Informally
A discrete channel Q will:
accept an input from X
produce an output from Y
There is a probability of observing various outputs, given an input
This represents some inherent noise
Noise could depend on the input
Channels: Formally
A discrete channel Q consists of:
an input alphabet X = {a1, . . . , aI}
an output alphabet Y = {b1, . . . , bJ}
transition probabilities P(y |x).
The channel Q can be expressed as a matrix
Qj,i = P(y = bj |x = ai)
This represents the probability of observing bj given that we transmit ai
Channels: Example
Example: A channel Q with inputs X = {a1, a2, a3}, outputs Y = {b1, b2}, and transition probabilities expressed by the matrix

    Q = [ 0.8  0.5  0.2
          0.2  0.5  0.8 ]

So P(b1|a1) = 0.8 = P(b2|a3) and P(b1|a2) = P(b2|a2) = 0.5.
We arrange the inputs along the columns, and the outputs along the rows.
The actual details of the alphabets are abstracted away.
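To make the matrix convention concrete, here is a minimal sketch (plain Python; the input distribution is a made-up assumption, not from the slides) that stores Q with outputs as rows and inputs as columns, checks that each column is a probability distribution, and computes P(y) = Σ_x P(y|x)P(x):

    # Columns are inputs a1, a2, a3; rows are outputs b1, b2.
    Q = [[0.8, 0.5, 0.2],
         [0.2, 0.5, 0.8]]

    # Each column (fixed input) must sum to 1 over the outputs.
    for i in range(3):
        assert abs(sum(Q[j][i] for j in range(2)) - 1.0) < 1e-9

    p_x = [0.5, 0.25, 0.25]   # hypothetical input distribution over (a1, a2, a3)

    # P(y = b_j) = sum_i Q[j][i] * P(x = a_i)
    p_y = [sum(Q[j][i] * p_x[i] for i in range(3)) for j in range(2)]
    print(p_y)                # [0.575, 0.425]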
3 Examples of Channels
The Binary Noiseless Channel
One of the simplest channels is the Binary Noiseless Channel. The received symbol is always equal to the transmitted symbol – there is no probability of error, hence noiseless.

[Diagram: 0 → 0 and 1 → 1, each with probability 1]

Inputs X = {0, 1}; Outputs Y = {0, 1}; Transition probabilities

    Q = [ 1  0
          0  1 ]

What was transmitted over the channel if 0000 1111 was received?

    0000 1111  −Q→  0000 1111
The Noisy Non-overlapping Channel
Even if there is some uncertainty about the output given the input, it may still be possible to perfectly infer what was transmitted.

[Diagram: 0 → a or b, each with probability 0.5; 1 → c or d, each with probability 0.5]

Inputs X = {0, 1}; Outputs Y = {a, b, c, d}; Transition probabilities

    Q = [ 0.5  0
          0.5  0
          0    0.5
          0    0.5 ]

What was transmitted over the channel if abba ddcd was received?

    0000 1111  −Q→  abba ddcd
The Binary Symmetric Channel
Each symbol sent across a binary symmetric channel has a chance of being “flipped” to its counterpart (0 → 1; 1 → 0).

[Diagram: 0 → 0 and 1 → 1 with probability 1 − f; 0 → 1 and 1 → 0 with probability f]

Inputs X = {0, 1}; Outputs Y = {0, 1}; Transition probabilities with P(flip) = f

    Q = [ 1 − f    f
            f    1 − f ]

What was most likely transmitted over the channel if 0010 1001 was received, assuming f = 0.1 and P(x = 0) = P(x = 1) = 0.5?

    0010 1001  −Q→  0010 1001

Since f < 0.5 and the inputs are equally likely, each received bit is most likely to be the bit that was transmitted, so the most probable input is the received string itself.
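A small simulation sketch (plain Python, not from the slides) of passing a bit string through a binary symmetric channel with f = 0.1:

    import random

    rng = random.Random(0)   # fixed seed so the run is reproducible

    def bsc(bits, f):
        # Flip each bit independently with probability f.
        return [b ^ (rng.random() < f) for b in bits]

    sent = [0, 0, 1, 0, 1, 0, 0, 1] * 125           # 1000 bits
    received = bsc(sent, f=0.1)
    flips = sum(s != r for s, r in zip(sent, received))
    print(flips, "of", len(sent), "bits flipped")   # roughly 100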
Inferring the Input
Suppose we know P(x), the probability that x is transmitted.
Given a particular output y ∈ Y received over a channel Q, how likely is it that input x ∈ X was transmitted?

    P(x|y) = P(y|x) P(x) / P(y) = P(y|x) P(x) / Σ_{x′∈X} P(y|x′) P(x′)
Inferring the Input: Example
Suppose P(x = 0) = P(x = 1) = 0.5. What is the probability that x = 0 was transmitted over a binary symmetric channel Q with f = 0.1, given that y = 0 was received?

    P(x = 0 | y = 0) = (0.9 × 0.5) / (0.9 × 0.5 + 0.1 × 0.5) = 0.9

Similarly, P(x = 1 | y = 1) = 0.9.
What if P(x = 0) = 0.01?

    P(x = 0 | y = 0) = (0.9 × 0.01) / (0.9 × 0.01 + 0.1 × 0.99) ≈ 0.0833
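A minimal sketch (plain Python) mirroring the two calculations above, using the posterior formula for a binary symmetric channel:

    def bsc_posterior_x0_given_y0(f, p0):
        # P(x=0 | y=0) = P(y=0|x=0) P(x=0) / sum over x' of P(y=0|x') P(x')
        return ((1 - f) * p0) / ((1 - f) * p0 + f * (1 - p0))

    print(bsc_posterior_x0_given_y0(f=0.1, p0=0.5))    # 0.9
    print(bsc_posterior_x0_given_y0(f=0.1, p0=0.01))   # about 0.0833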
The Z Channel
Symbols may be corrupted over the channel asymmetrically.

[Diagram: 0 → 0 with probability 1; 1 → 1 with probability 1 − f; 1 → 0 with probability f]

Inputs X = {0, 1}; Outputs Y = {0, 1}; Transition probabilities

    Q = [ 1      f
          0    1 − f ]

Inferring the input: clearly P(x = 1 | y = 1) = 1, but

    P(x = 0 | y = 0) = P(y = 0 | x = 0) P(x = 0) / Σ_{x′∈X} P(y = 0 | x′) P(x′)
                     = P(x = 0) / (P(x = 0) + f P(x = 1))

So P(x = 0 | y = 0) → 1 as f → 0, and goes to P(x = 0) as f → 1.
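A quick numeric check of this limiting behaviour (a sketch in plain Python, assuming P(x = 0) = 0.5):

    def z_posterior_x0_given_y0(f, p0):
        # For the Z channel: P(y=0 | x=0) = 1 and P(y=0 | x=1) = f.
        return p0 / (p0 + f * (1 - p0))

    for f in (0.01, 0.5, 0.99):
        print(f, z_posterior_x0_given_y0(f, p0=0.5))
    # f = 0.01 gives ~0.990 (close to 1); f = 0.99 gives ~0.503 (close to P(x=0) = 0.5)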
The Binary Erasure Channel
We can model a channel which “erases” bits by letting one of the output symbols be the symbol ‘?’ with associated probability f. The receiver knows which bits are erased.

[Diagram: 0 → 0 and 1 → 1 with probability 1 − f; 0 → ? and 1 → ? with probability f]

Inputs X = {0, 1}; Outputs Y = {0, ?, 1}; Transition probabilities

    Q = [ 1 − f    0
            f      f
            0    1 − f ]

Example:

    0000 1111  −Q→  00?0 ?11?
The Noisy Typewriter Channel
This channel simulates a noisy “typewriter”. Inputs and outputs are the 26 letters A through Z plus a space (27 symbols in total). With probability 1/3 each, a typed letter is either: unchanged, changed to the next letter, or changed to the previous letter, with the alphabet wrapping around at the ends.

[Diagram: each input letter maps to itself and to its two neighbouring letters, each with probability 1/3]

Inputs X = {A, B, . . . , Z, ␣}; Outputs Y = {A, B, . . . , Z, ␣} (␣ = space);
Transition probabilities form the 27 × 27 circulant matrix

    Q = [ 1/3  1/3   0   · · ·   0   1/3
          1/3  1/3  1/3  · · ·   0    0
           0   1/3  1/3  · · ·   0    0
           ⋮    ⋮    ⋮    ⋱      ⋮    ⋮
          1/3   0    0   · · ·  1/3  1/3 ]

The transition matrix for this channel has a diagonal structure: all of the probability mass is concentrated around the diagonal.
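A sketch (plain Python, assuming the 27-symbol circular alphabet described above) that builds this transition matrix and checks that every column sums to 1:

    # Alphabet: A..Z plus space, arranged in a circle.
    alphabet = [chr(ord('A') + k) for k in range(26)] + [' ']
    n = len(alphabet)   # 27

    # Q[j][i] = P(y = alphabet[j] | x = alphabet[i])
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):     # previous, same, next letter (mod n)
            Q[j % n][i] = 1 / 3

    # Each column is a distribution over the outputs.
    assert all(abs(sum(Q[j][i] for j in range(n)) - 1.0) < 1e-9 for i in range(n))
    print(Q[0][0], Q[1][0], Q[n - 1][0])   # 1/3 each for input 'A'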
[A further slide reproduces MacKay p. 148, summarising the model channels (binary symmetric, binary erasure, noisy typewriter, Z channel), together with §9.4 on inferring the input via P(x|y) = P(y|x)P(x) / Σ_{x′} P(y|x′)P(x′), and Example 9.1: for a binary symmetric channel with f = 0.15 and input ensemble {p0 = 0.9, p1 = 0.1}, observing y = 1 gives P(x = 1 | y = 1) = (0.85 × 0.1) / (0.85 × 0.1 + 0.15 × 0.9) = 0.085/0.22 ≈ 0.39.]
4 Probability of Error
Communicating over Noisy Channels
Suppose we know we have to communicate over some channel Q and we want to build an encoder/decoder pair to reliably send a message s over Q.

[Diagram: sin → Encoder → x → Q → y → Decoder → sout]

Reliability is measured via the probability of error, that is, the probability of incorrectly decoding sout given sin as input:

    P(sout ≠ sin) = Σ_s P(sout ≠ sin | sin = s) P(sin = s)
Probability of Error: Example
Let S = {a, b} and X = Y = {0, 1}.
Assume encoder: a → 0, b → 1; decoder: 0 → a, 1 → b.
Consider a binary symmetric channel Q,

    Q = [ 1 − f    f
            f    1 − f ]

with f = 0.1.
Probability of Error: Example
If the base probabilities of symbol transmission are (pa, pb) = (0.5, 0.5),

    P(sin ≠ sout) = P(sin = a, sout = b) + P(sin = b, sout = a)
                  = P(sout = b | sin = a) P(sin = a) + P(sout = a | sin = b) P(sin = b)
                  = P(y = 1 | x = 0) pa + P(y = 0 | x = 1) pb
                  = f
                  = 0.1
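A sketch (plain Python, using the encoder, decoder and channel defined above) that evaluates P(sout ≠ sin) by summing over messages and channel outputs, matching the calculation above:

    f = 0.1
    # Binary symmetric channel: P(y | x)
    P_y_given_x = {(0, 0): 1 - f, (1, 0): f, (0, 1): f, (1, 1): 1 - f}

    enc = {'a': 0, 'b': 1}
    dec = {0: 'a', 1: 'b'}
    p_s = {'a': 0.5, 'b': 0.5}

    # P(s_out != s_in) = sum_s P(s_in = s) * sum over y with dec(y) != s of P(y | enc(s))
    p_err = sum(p_s[s] * sum(P_y_given_x[(y, enc[s])]
                             for y in (0, 1) if dec[y] != s)
                for s in ('a', 'b'))
    print(p_err)   # 0.1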
A Simple Coding Scheme
Suppose s ∈ {a, b} and we encode by a → 000 and b → 111.
To decode, we count the number of 1s and 0s and set all bits to the majority value to determine s:

    {000, 001, 010, 100} = A → a   and   {111, 110, 101, 011} = B → b

If the channel Q is binary symmetric with f = 0.1 again,

    P(sin ≠ sout) = P(y ∈ B | 000) pa + P(y ∈ A | 111) pb
                  = [f³ + 3f²(1 − f)] pa + [f³ + 3f²(1 − f)] pb
                  = f³ + 3f²(1 − f) = 0.028

So the error has dropped from 0.1 to 0.028,
but so has the rate: from 1 symbol per transmitted bit to 1/3 of a symbol per transmitted bit.
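A sketch (plain Python) that reproduces the 0.028 figure by enumerating all eight possible received blocks for this repetition code under majority decoding:

    from itertools import product

    f = 0.1

    def p_block(y, x):
        # Probability of receiving block y when block x is sent over the BSC.
        p = 1.0
        for yb, xb in zip(y, x):
            p *= f if yb != xb else 1 - f
        return p

    def majority(y):
        return 1 if sum(y) >= 2 else 0

    # P(decode as b | 000 sent); by symmetry equal to P(decode as a | 111 sent).
    p_err = sum(p_block(y, (0, 0, 0))
                for y in product((0, 1), repeat=3) if majority(y) == 1)
    print(p_err)   # 0.028 (up to floating point)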
A Simple Coding Scheme

[Figure: probability of bit error pb versus rate for the repetition codes R1, R3, R5 and R61, shown on linear and logarithmic scales; more useful codes lie towards low error and high rate.]

Can we make the error arbitrarily small without the rate going to zero?
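A sketch (plain Python, assuming a binary symmetric channel with f = 0.1) of the trade-off behind those plots: for the length-N repetition code RN with majority decoding, the bit-error probability is the probability that more than half of the N copies are flipped, while the rate is 1/N:

    from math import comb

    f = 0.1

    def rep_error(N):
        # P(more than half of the N transmitted copies are flipped) = majority-decode error.
        return sum(comb(N, k) * f**k * (1 - f)**(N - k)
                   for k in range(N // 2 + 1, N + 1))

    for N in (1, 3, 5, 61):
        print(N, 1 / N, rep_error(N))
    # The error probability falls rapidly with N, but the rate 1/N also tends to zero.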
Summary and Reading
Main Points:
Modelling Noisy Channels
  Noiseless, Non-overlapping, Symmetric, Z, Erasure
Simple Coding via Repetition
  Probability of Error vs Transmission Rate
Reading:
MacKay §9.1 – §9.5
Cover & Thomas §7.1 – §7.3