COMP2610/6261 – Information Theory – Lecture 19: Block Codes and the Coding Theorem
COMP2610/6261 – Information Theory
Lecture 19: Block Codes and the Coding Theorem
Robert C. Williamson
Research School of Computer Science
9 October, 2018
1 / 28
Channel Capacity: Recap
The largest possible reduction in uncertainty achievable across a channel
is its capacity
Channel Capacity
The capacity C of a channel Q is the largest mutual information between
its input and output for any choice of input ensemble. That is,
C = \max_{p_X} I(X; Y)
2 / 28
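To make this concrete, here is a minimal numerical sketch (not from the slides) that estimates the capacity of a binary symmetric channel by maximising I(X; Y) over the input distribution; the function names are illustrative.

```python
import numpy as np

def binary_entropy(q):
    """H_2(q) in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def mutual_information(p1, f):
    """I(X;Y) for a BSC with flip probability f and input distribution P(X=1) = p1."""
    py1 = p1 * (1 - f) + (1 - p1) * f                 # P(Y = 1)
    return binary_entropy(py1) - binary_entropy(f)    # H(Y) - H(Y|X)

f = 0.15
capacity = max(mutual_information(p, f) for p in np.linspace(0, 1, 1001))
print(round(capacity, 2))   # ~0.39 bits, attained at the uniform input p = 0.5
```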
1 Block Codes
2 The Noisy-Channel Coding Theorem
3 Extended Channels
3 / 28
Communicating over Noisy Channels
Suppose we know we have to communicate over some channel Q and we want to build an encoder/decoder pair to reliably send a message s over Q.
[Diagram: s_in → Encoder → x → Q → y → Decoder → s_out]
5 / 28
Block Codes
We now consider codes that make repeated use of a noisy channel to communicate a predefined set of messages S = {1, 2, . . . , S}
Recall a general encoder is of the form
enc : S → X^N
Equivalently, each s ∈ S = {1, 2, . . . , S} is paired with a unique block of symbols x ∈ X^N
Thus, we can imagine there being S unique codewords {x^(1), . . . , x^(S)}, where each codeword has block length N
6 / 28
Block Codes: Example
Suppose S = {1, 2, 3, 4}
Message ID s    Encoding
1               00
2               01
3               10
4               11
Block size N = 2
Codewords x^(1) = 00, x^(2) = 01, and so on
7 / 28
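A minimal sketch (not from the slides) of this example encoder as a lookup table; the names are illustrative.

```python
# The example (N = 2) block code for S = {1, 2, 3, 4} as a lookup table.
codewords = {1: "00", 2: "01", 3: "10", 4: "11"}

def enc(s: int) -> str:
    """Map a message ID s in {1, ..., 4} to its block of N = 2 channel symbols."""
    return codewords[s]

assert enc(3) == "10"
```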
Block Codes: Formally
We formalise the preceding with the following notion:
(N, K) Block Code
Given a channel Q with inputs X and outputs Y, an integer N > 0, and K > 0, an (N, K) Block Code for Q is a list of S = 2^K codewords
C = {x^(1), x^(2), . . . , x^(2^K)}
where each x^(s) ∈ X^N consists of N symbols from X.
The code is parameterised by the length of the block, and the number of messages that are encoded
We parameterise by K = log_2 S for mathematical convenience
K doesn't have to be an integer
8 / 28
Block Codes and Rates
An (N, K) block code makes N uses of a channel to transmit one of S possible outcomes
We can measure the amount of "information" contained in each use as:
Rate of an (N, K) Block Code
The rate of an (N, K) block code is (log_2 S)/N = K/N bits per channel use.
9 / 28
Block Codes: Examples
Examples (for Binary Symmetric Channel Q)
A (1, 1) block code: C = {0, 1} — Rate: 1
A (3, 2) block code: C = {000, 001, 100, 111} — Rate: 2/3
A (3, log_2 3) block code: C = {001, 010, 100} — Rate: (log_2 3)/3 ≈ 0.53
10 / 28
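As a quick check of these rates, a small sketch (not from the slides) that computes log_2 |C| / N for each example code:

```python
import math

def rate(code):
    """Rate of a block code given as a list of equal-length codewords:
    log2(number of codewords) divided by the block length N."""
    N = len(code[0])
    return math.log2(len(code)) / N

print(rate(["0", "1"]))                    # 1.0
print(rate(["000", "001", "100", "111"]))  # 0.666...
print(rate(["001", "010", "100"]))         # 0.528...
```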
Decoding Block Codes
An (N, K) block code sends each message s ∈ S = {1, 2, . . . , 2^K} over a channel Q as x^(s) ∈ X^N
The receiver sees the block y ∈ Y^N, and attempts to infer s via some
dec : Y^N → S
Even if X = Y, the decoder must allow for any y ∈ Y^N, not just the expected codewords {x^(1), . . . , x^(2^K)}
11 / 28
Decoding Block Codes: Formally
Block Decoder
A decoder for an (N, K) block code is a mapping that associates each y ∈ Y^N with an ŝ ∈ {1, 2, . . . , 2^K}.
Ideally, we would like the decoded ŝ to be maximally likely to be equal to s
Optimal Decoder
An optimal decoder for a code C, channel Q, and prior P(s) maps y to the ŝ such that P(ŝ | y) is maximal.
That is, dec_opt(y) = argmax_s P(s | y) = argmax_s P(y | s) · P(s)
12 / 28
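A minimal sketch (not from the slides) of an optimal (MAP) decoder for a binary symmetric channel, assuming the flip probability f and the prior P(s) are known; the function names are illustrative. With a uniform prior over two messages it reduces to the majority vote decoder of the next slide.

```python
def bsc_likelihood(y: str, x: str, f: float) -> float:
    """P(y | x) for N independent uses of a binary symmetric channel with flip probability f."""
    flips = sum(yi != xi for yi, xi in zip(y, x))
    return f ** flips * (1 - f) ** (len(y) - flips)

def dec_opt(y, codewords, prior, f):
    """MAP decoder: return the message s maximising P(y | x^(s)) * P(s)."""
    return max(codewords, key=lambda s: bsc_likelihood(y, codewords[s], f) * prior[s])

codewords = {1: "000", 2: "111"}   # the repetition code used on the next slides
prior = {1: 0.5, 2: 0.5}
print(dec_opt("010", codewords, prior, f=0.1))   # -> 1 (two of the three bits say 0)
```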
Decoding Block Codes: Examples
Example The (3, 1) block code C = {000, 111} and majority vote decoder
d : {0, 1}^3 → {1, 2} defined by
d(000) = d(001) = d(010) = d(100) = 1
d(111) = d(110) = d(101) = d(011) = 2
13 / 28
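The same decoder as a one-liner (a sketch, not from the slides):

```python
def majority_vote(y: str) -> int:
    """Decode a received 3-bit block: message 1 if mostly 0s, message 2 if mostly 1s."""
    return 1 if y.count("0") > y.count("1") else 2

assert majority_vote("001") == 1
assert majority_vote("110") == 2
```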
1 Block Codes
2 The Noisy-Channel Coding Theorem
3 Extended Channels
14 / 28
Rates and Reliability
Ideally, we would like a high rate for our channel code
A low rate implies that we are being "wasteful" with our channel use
But intuitively, at high rates we run the risk of losing reliability
If N is small relative to K, the receiver may be more easily "confused" about the input
How to measure reliability?
15 / 28
Reliability
Want an encoder/decoder pair to reliably send messages over channel Q.
[Diagram: s_in → Encoder → x → Q → y → Decoder → s_out]
Probability of (Block) Error
Given a channel Q, the probability of (block) error for a code is
p_B = P(s_out ≠ s_in) = Σ_{s_in} P(s_out ≠ s_in | s_in) P(s_in)
and its maximum probability of (block) error is
p_BM = max_{s_in} P(s_out ≠ s_in | s_in)
As P(s_out ≠ s_in | s_in) ≤ p_BM for all s_in, we get p_B ≤ Σ_{s_in} p_BM P(s_in) = p_BM, and so if p_BM → 0 then p_B → 0.
16 / 28
Reliability: Example
Suppose s ∈ {a, b} and we encode by a→ 000 and b→ 111.
To decode we count the number of 1s and 0s and set all bits to the
majority count to determine s
A = {000, 001, 010, 100} → a   and   B = {111, 110, 101, 011} → b
If the channel Q is binary symmetric,
p_B = P(s_in ≠ s_out)
    = P(y ∈ B | 000) p_a + P(y ∈ A | 111) p_b
    = [f^3 + 3f^2(1 − f)] p_a + [f^3 + 3f^2(1 − f)] p_b
    = f^3 + 3f^2(1 − f).
In fact,
p_BM = max( P(y ∈ B | 000), P(y ∈ A | 111) ) = f^3 + 3f^2(1 − f).
17 / 28
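A small sketch (not from the slides) that checks the formula p_B = f^3 + 3f^2(1 − f) against a Monte Carlo simulation; f = 0.1 is just an illustrative value.

```python
import random

f = 0.1                                    # illustrative flip probability
analytic = f**3 + 3 * f**2 * (1 - f)       # = 0.028 for f = 0.1

def simulate(trials=200_000):
    errors = 0
    for _ in range(trials):
        sent = random.choice([0, 1])                          # a -> 000, b -> 111
        received = [sent ^ (random.random() < f) for _ in range(3)]
        decoded = 1 if sum(received) >= 2 else 0              # majority vote
        errors += decoded != sent
    return errors / trials

print(analytic)     # 0.028
print(simulate())   # close to 0.028
```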
Achievable Rates
Ideally, we would like to consider rates of transmission for which we can
guarantee small maximum probability of block error
Even more ideally, we would like rates for which we can guarantee
arbitrarily small maximum probability of block error
We will call such rates achievable
Achievable Rate
A rate R over a channel Q is said to be achievable if, for any ε > 0, there is an (N, K) block code and decoder such that its rate K/N ≥ R and its maximum probability of block error satisfies
p_BM = max_{s_in} P(s_out ≠ s_in | s_in) < ε
18 / 28
Achievable Rates
Achievable rates sound nice in theory, but surely they cannot exist? Surely we will have to drive R → 0 to get small error probability?
Remarkably, we have:
Noisy-Channel Coding Theorem (Brief)
If Q is a channel with capacity C then the rate R is achievable if and only if R ≤ C, that is, the rate is no greater than the channel capacity.
19 / 28
The Noisy-Channel Coding Theorem
Example:
In the last lecture: the BSC Q with f = 0.15 has capacity C = 0.39 bits.
Suppose we want error less than ε = 0.05 and rate R > 0.25
The NCCT tells us there should be, for N large enough, an (N, K) code with K/N ≥ 0.25
Indeed, we showed the code C = {000, 111} with majority vote decoder has probability of error 0.028 < 0.05 for Q and rate 1/3 > 0.25.
For N = 3 there is a (3, 1) code meeting the requirements.
However, there is no code with the same ε and rate 1/2 > 0.39 = C.
20 / 28
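As a quick check of the quoted capacity (a sketch, not from the slides), using the standard BSC formula C = 1 − H_2(f):

```python
import math

def H2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

f = 0.15
print(round(1 - H2(f), 2))   # 0.39 bits per channel use
```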
The Noisy Typewriter Channel
This channel simulates a noisy "typewriter". Inputs and outputs are the 26 letters A through Z plus space. With probability 1/3 each, a typed letter is either: unchanged, changed to the next letter, or changed to the previous letter.
[Diagram: each input letter maps to itself and its two circular neighbours, e.g. B → A, B, or C]
Inputs X = {A, B, . . . , Z, space};
Outputs Y = {A, B, . . . , Z, space};
Transition probabilities
Q =
  [ 1/3  1/3   0    0   · · ·   0   1/3 ]
  [ 1/3  1/3  1/3   0   · · ·   0    0  ]
  [  0   1/3  1/3  1/3  · · ·   0    0  ]
  [  :    :    :    :   · · ·   :    :  ]
  [ 1/3   0    0    0   · · ·  1/3  1/3 ]
The transition matrix for this channel has a diagonal structure: all of the
probability mass is concentrated around the diagonal.
21 / 28
Noisy Channel Coding Theorem
NCCT
Any rate R < C is achievable for Q (i.e., for any tolerance ε > 0, an (N, K) code with rate K/N ≥ R exists with max. block error p_BM < ε)
Consider a simple example. For the noisy typewriter Q:
The capacity is C = log_2 9
For any ε > 0 and R < C we can choose N = 1 . . .
. . . and code messages using C = {B, E, . . . , Z}
[Figure 9.5 from MacKay: a non-confusable subset of inputs {B, E, H, . . . , Z} for the noisy typewriter; the decoder maps each received letter to the nearest codeword, with zero probability of block error.]
Since |C| = 9 we have K = log_2 9, so K/N = log_2 9 ≥ R for any R < C, and C has zero error so p_BM = 0 < ε
22 / 28
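A small simulation sketch (not from the slides) of this zero-error code: take every third letter as a codeword and decode each received letter to the nearest codeword on the circle. The symbol set and helper functions are illustrative.

```python
import random
import string

letters = list(string.ascii_uppercase) + [" "]   # 27 symbols arranged in a circle
n = len(letters)

def typewriter(x: str) -> str:
    """One use of the noisy typewriter: output the input or one of its circular neighbours."""
    i = letters.index(x)
    return letters[(i + random.choice([-1, 0, 1])) % n]

code = letters[1::3]     # B, E, H, ..., Z: 9 non-confusable codewords
assert len(code) == 9

def decode(y: str) -> str:
    """Map the received letter to the nearest codeword on the circle."""
    j = letters.index(y)
    dist = lambda c: min((letters.index(c) - j) % n, (j - letters.index(c)) % n)
    return min(code, key=dist)

# Every codeword decodes correctly whatever the channel does: zero block error.
assert all(decode(typewriter(c)) == c for c in code for _ in range(100))
```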
1 Block Codes
2 The Noisy-Channel Coding Theorem
3 Extended Channels
23 / 28
Noisy Channel Coding Theorem: How Is This Possible?
The main "trick" to minimising p_BM is to construct an (N, K) block code with (almost) non-confusable codewords
That is, a code such that the sets of outputs y that the different x^(s) are sent to by Q have low-probability intersections
This is possible because extended channels look like the noisy typewriter!
24 / 28
Extended Channels
When used N times, a channel Q from X to Y can be seen as an extended channel taking "symbols" from X^N to "symbols" in Y^N.
Extended Channel
The Nth extended channel of Q from X to Y is a channel from X^N to Y^N with transition probability from x ∈ X^N to y ∈ Y^N given by
P(y | x) = ∏_{n=1}^{N} P(y_n | x_n)
Example: the BSC Q with f = 0.1 from X = {0, 1} to Y = {0, 1} has N = 2 extended channel from X^2 = {00, 01, 10, 11} to Y^2 = {00, 01, 10, 11} with
Q^2 =
  [ 0.81  0.09  0.09  0.01 ]
  [ 0.09  0.81  0.01  0.09 ]
  [ 0.09  0.01  0.81  0.09 ]
  [ 0.01  0.09  0.09  0.81 ]
[Figure: transition-matrix pictures of the extended binary symmetric channel for N = 1, 2, 4 (MacKay Figure 9.6)]
25 / 28
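A small sketch (not from the slides) that builds this N = 2 extended channel as the Kronecker product of the BSC transition matrix with itself, since the transition probabilities factorise over channel uses:

```python
import numpy as np

f = 0.1
Q = np.array([[1 - f, f],
              [f, 1 - f]])    # BSC transition matrix, indexed by symbols {0, 1}

Q2 = np.kron(Q, Q)            # N = 2 extended channel, symbols ordered 00, 01, 10, 11
print(Q2)
# [[0.81 0.09 0.09 0.01]
#  [0.09 0.81 0.01 0.09]
#  [0.09 0.01 0.81 0.09]
#  [0.01 0.09 0.09 0.81]]
```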
Extended Channels and the Noisy Typewriter
As N increases, any extended channel looks like the noisy typewriter!
[Figures: transition-matrix pictures of the extended binary symmetric channel and the extended Z channel for N = 1, 2, 4, alongside the noisy typewriter channel (MacKay Figures 9.6 and 9.7).]
26 / 28
Extended Channels and the Noisy Typewriter
Why does this happen?
Remember that as N gets larger, sequences x = x_1 x_2 . . . x_N start looking typical
For a given x, the corresponding p(y | x) will also be concentrated on a
few sequences
Formalising this will require a notion of joint typicality
27 / 28
Summary and Reading
Main Points
The Noisy Typewriter
Extended Channels
Block Codes
The Noisy-Channel Coding Theorem (Statement only)
Reading
MacKay §9.6
Cover & Thomas §7.5
Next time: Detail of the NCCT, joint typicality, and a sketch of the proof!
28 / 28