COMP2610/6261 – Information Theory – Lecture 18: Channel Capacity
Robert C. Williamson
Research School of Computer Science
8 October, 2018
1 Channel Capacity
2 Computing Capacities
3 Summary
Channels: Recap
[Diagram: “Hi” sent by the source arrives as “Hi” at the destination]
Source : Aditya
Encoder : Telephone handset
Channel : Analogue telephone line
Decoder : Telephone handset
Destination : Mark
Channels: Recap
A discrete channel Q consists of:
an input alphabet X = {a1, . . . , aI}
an output alphabet Y = {b1, . . . , bJ}
transition probabilities P(y |x).
The channel Q can be expressed as a matrix
Qj,i = P(y = bj |x = ai)
This represents the probability of observing bj given that we transmit ai
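As a concrete illustration (not from the slides), a channel can be represented in code directly by this matrix; the names Q_bsc and output_distribution below are my own, and the numbers anticipate the binary symmetric channel with f = 0.15 used later.

```python
import numpy as np

# Transition matrix with Q[j, i] = P(y = b_j | x = a_i):
# rows index outputs, columns index inputs, and each column sums to 1.
Q_bsc = np.array([[0.85, 0.15],   # P(y = 0 | x = 0), P(y = 0 | x = 1)
                  [0.15, 0.85]])  # P(y = 1 | x = 0), P(y = 1 | x = 1)

def output_distribution(Q, p_x):
    """Marginal distribution of the output: p_Y = Q p_X."""
    return Q @ np.asarray(p_x, dtype=float)

print(output_distribution(Q_bsc, [0.9, 0.1]))  # -> [0.78 0.22]
```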
The Binary Noiseless Channel
One of the simplest channels is the Binary Noiseless Channel. The
received symbol is always equal to the transmitted symbol – there is no
probability of error, hence noiseless.
[Diagram: 0 → 0 and 1 → 1, each with probability 1]
Inputs X = {0, 1}; Outputs Y = {0, 1};
Transition probabilities
Q = [ 1  0
      0  1 ]
The Binary Symmetric Channel
Each symbol sent across a binary symmetric channel has a chance of
being “flipped” to its counterpart (0→ 1; 1→ 0)
[Diagram: 0 → 0 and 1 → 1 with probability 1 − f; 0 → 1 and 1 → 0 with probability f]
Inputs X = {0, 1}; Outputs Y = {0, 1};
Transition probabilities with P(flip) = f
Q = [ 1 − f    f
        f    1 − f ]
The Z Channel
Symbols may be corrupted over the channel asymmetrically.
[Diagram: 0 → 0 with probability 1; 1 → 1 with probability 1 − f; 1 → 0 with probability f]
Inputs X = {0, 1}; Outputs Y = {0, 1};
Transition probabilities
Q = [ 1      f
      0    1 − f ]
Communicating over Noisy Channels
Suppose we know we have to communicate over some channel Q and we
want to build an encoder/decoder pair to reliably send a message s over Q.
sin → Encoder → x → Channel Q → y → Decoder → sout
Reliability is measured via probability of error — that is, the probability of
incorrectly decoding sout given sin as input:
P(sout ≠ sin) = ∑s P(sout ≠ sin | sin = s) P(sin = s)
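As a tiny worked sketch of this law of total probability (the three-message source and its conditional error probabilities below are made-up numbers, purely for illustration):

```python
# P(s_out != s_in) = sum over s of P(s_out != s_in | s_in = s) * P(s_in = s)
p_source = {"a": 0.5, "b": 0.3, "c": 0.2}          # P(s_in = s)            (hypothetical)
p_err_given_s = {"a": 0.01, "b": 0.05, "c": 0.10}  # P(s_out != s_in | s)   (hypothetical)

p_error = sum(p_err_given_s[s] * p_source[s] for s in p_source)
print(p_error)  # 0.5*0.01 + 0.3*0.05 + 0.2*0.10 = 0.04
```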
1 Channel Capacity
2 Computing Capacities
3 Summary
Mutual Information for a Channel
A key quantity when using a channel is the mutual information between its
inputs X and outputs Y :
I(X ;Y ) = H(X )− H(X |Y ) = H(Y )− H(Y |X )
This measures how much what was received reduces uncertainty about what was transmitted.
This requires us to specify some particular input distribution p(X).
A channel is only specified by its transition matrix!
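To make this concrete, here is a minimal sketch of computing I(X;Y) = H(Y) − H(Y|X) from a transition matrix Q and a chosen input distribution pX; the helper names entropy and mutual_information are my own, not part of the lecture.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, treating 0 log 0 as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(Q, p_x):
    """I(X;Y) = H(Y) - H(Y|X) for a channel with Q[j, i] = P(y = b_j | x = a_i)."""
    p_x = np.asarray(p_x, dtype=float)
    h_y = entropy(Q @ p_x)                   # entropy of the output marginal p_Y = Q p_X
    h_y_given_x = sum(p * entropy(Q[:, i])   # input-weighted entropies of the columns
                      for i, p in enumerate(p_x))
    return h_y - h_y_given_x
```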
Mutual Information for a Channel: Example
For the noiseless channel, H(X|Y) = 0, so I(X;Y) = H(X).
If pX = (0.9, 0.1) then I(X;Y) = H(X) ≈ 0.47 bits.
Mutual Information for a Channel: Example
For binary symmetric channel with f = 0.15 and pX = (0.9, 0.1) we have
p(Y = 1) = p(Y = 1 | X = 1) · p(X = 1) + p(Y = 1 | X = 0) · p(X = 0)
= (1− f ) · 0.1 + f · 0.9
= 0.085 + 0.135
= 0.22,
and so H(Y) = H2(0.22) ≈ 0.76 bits.
Further, H(Y | X = 0) = H(Y | X = 1) = H2(0.15) ≈ 0.61 bits.
So, I(X;Y) ≈ 0.76 − 0.61 = 0.15 bits.
Mutual Information for a Channel: Example
For the Z channel with f = 0.15 and the same pX we have H(Y) ≈ 0.42 and
H(Y|X) ≈ 0.061, so I(X;Y) ≈ 0.36 bits
So, intuitively, the reliability is “noiseless > Z > symmetric”
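The three numbers above can be reproduced with the mutual_information helper from the earlier sketch (assumed to be in scope); the matrices are just the noiseless, binary symmetric and Z channels with f = 0.15.

```python
import numpy as np
# Assumes entropy / mutual_information from the earlier sketch are in scope.
f, p_x = 0.15, [0.9, 0.1]
channels = {
    "noiseless": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "BSC":       np.array([[1 - f, f], [f, 1 - f]]),
    "Z":         np.array([[1.0, f], [0.0, 1 - f]]),
}
for name, Q in channels.items():
    print(name, round(mutual_information(Q, p_x), 2))
# noiseless 0.47, BSC 0.15, Z 0.36
```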
Channel Capacity
The mutual information measure for a channel depends on the choice of
input distribution pX . If H(X ) is small then I(X ;Y ) ≤ H(X ) is small.
The largest possible reduction in uncertainty achievable across a channel
is its capacity.
Channel Capacity
The capacity C of a channel Q is the largest mutual information between
its input and output for any choice of input ensemble. That is,
C = max_{pX} I(X;Y)
Later, we will see that the capacity determines the rate at which we can
communicate across a channel with arbitrarily small error.
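For a binary input alphabet the maximisation in the definition can be approximated with a simple grid search over p1 = P(x = 1). This is only a numerical sketch (the name capacity_binary_input is mine, and it again assumes the mutual_information helper from the earlier sketch).

```python
import numpy as np
# Assumes mutual_information from the earlier sketch is in scope.

def capacity_binary_input(Q, grid=10001):
    """Approximate C = max over p1 of I(X;Y) for a channel with two inputs."""
    p1s = np.linspace(0.0, 1.0, grid)
    vals = [mutual_information(Q, [1 - p1, p1]) for p1 in p1s]
    best = int(np.argmax(vals))
    return vals[best], p1s[best]

C, p1_star = capacity_binary_input(np.eye(2))   # binary noiseless channel
print(round(C, 2), round(p1_star, 2))           # 1.0 bit, achieved at p1 = 0.5
```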
1 Channel Capacity
2 Computing Capacities
3 Summary
Computing Capacities
Definition of capacity for a channel Q with inputs AX and outputs AY:
C = max_{pX} I(X;Y)
How do we actually calculate this quantity?
1 Compute the mutual information I(X ;Y ) for a general pX
2 Determine which choice of pX maximises I(X ;Y )
3 Use that maximising value to determine C
Binary Symmetric Channel:
We first consider the binary symmetric channel with AX = AY = {0, 1}
and flip probability f . It has transition matrix
Q = [ 1 − f    f
        f    1 − f ]
Computing Capacities
Binary Symmetric Channel – Step 1
The mutual information can be expressed as I(X ;Y ) = H(Y )− H(Y |X ).
We therefore need to compute two terms, H(Y) and H(Y|X), and hence we need
the distributions P(y) and P(y|x).
Computing H(Y ):
P(y = 0) = (1− f ) · P(x = 0) + f · P(x = 1) = (1− f ) · p0 + f · p1
P(y = 1) = (1− f ) · P(x = 1) + f · P(x = 0) = f · p0 + (1− f ) · p1
In general, q := pY = Q pX, so the above calculation is just

q = pY = [ 1 − f    f   ] [ p0 ]
         [   f    1 − f ] [ p1 ]
Using H2(q) = −q log2 q − (1 − q) log2(1 − q) and letting q = q1 = P(y = 1), we see that the entropy is
H(Y) = H2(q1) = H2(f · p0 + (1 − f) · p1)
Computing Capacities
Binary Symmetric Channel – Step 1
Computing H(Y |X ):
Since P(y |x) is described by the matrix Q, we have
H(Y |x = 0) = H2(P(y = 1|x = 0)) = H2(Q1,0) = H2(f )
and similarly,
H(Y |x = 1) = H2(P(y = 0|x = 1)) = H2(Q0,1) = H2(f)
So,
H(Y|X) = ∑x H(Y|x) P(x) = ∑x H2(f) P(x) = H2(f) ∑x P(x) = H2(f)
Computing I(X ;Y ):
Putting it all together gives
I(X ;Y ) = H(Y )− H(Y |X ) = H2(f · p0 + (1− f ) · p1)− H2(f )
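As a quick sanity check (my own addition), the closed form can be compared against a direct computation of I(X;Y) from the transition matrix, e.g. using the mutual_information helper sketched earlier.

```python
import numpy as np
# Assumes mutual_information from the earlier sketch is in scope.

def H2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

f = 0.15
Q = np.array([[1 - f, f], [f, 1 - f]])
for p1 in (0.1, 0.3, 0.5, 0.9):
    closed_form = H2(f * (1 - p1) + (1 - f) * p1) - H2(f)
    direct = mutual_information(Q, [1 - p1, p1])
    assert abs(closed_form - direct) < 1e-9   # the two computations agree
```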
Computing Capacities
Binary Symmetric Channel – Steps 2 and 3
Binary Symmetric Channel (BSC) with flip probability f ∈ [0, 1]:
I(X ;Y ) = H2(fp0 + (1− f )p1)− H2(f )
Examples:
BSC (f = 0) and pX = (0.5, 0.5):
I(X ;Y ) = H2(0.5)− H2(0) = 1
BSC (f = 0.15) and pX = (0.5, 0.5):
I(X ;Y ) = H2(0.5)− H2(0.15) ≈ 0.39
BSC (f = 0.15) and pX = (0.9, 0.1):
I(X ;Y ) = H2(0.22)− H2(0.15) ≈ 0.15
[Figure (MacKay, Fig. 9.2): the mutual information I(X;Y) for a binary symmetric channel with f = 0.15, as a function of the input distribution p1; it peaks at p1 = 0.5 with value ≈ 0.39 bits.]
Maximise I(X;Y): Since I(X;Y) is symmetric about p1 = 0.5, it is maximised when
p0 = p1 = 0.5, in which case C ≈ 0.39 bits for the BSC with f = 0.15.
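A small numerical check of the symmetry claim, using only the closed form above (the code and its helper names are my own sketch):

```python
import numpy as np

def H2(q):
    return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def I_bsc(p1, f=0.15):
    """I(X;Y) for the BSC via the closed form H2(f*p0 + (1-f)*p1) - H2(f)."""
    return H2(f * (1 - p1) + (1 - f) * p1) - H2(f)

assert abs(I_bsc(0.3) - I_bsc(0.7)) < 1e-12   # symmetric about p1 = 0.5
print(round(I_bsc(0.5), 2))                   # 0.39, the capacity for f = 0.15
```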
Channel Capacity: Example
For a binary symmetric channel, we could also argue
I(X;Y) = H(Y) − H(Y | X)
       = H(Y) − ∑x p(X = x) · H(Y | X = x)
       = H(Y) − ∑x p(X = x) · H2(f)
       = H(Y) − H2(f)
       ≤ 1 − H2(f),
where equality in the last line holds for uniform pX
Symmetric Channels
The BSC was easy to work with due to considerable symmetry
Symmetric Channel
A channel with inputs AX, outputs AY, and matrix Q is symmetric if AY
can be partitioned into subsets Y′ ⊆ AY so that each sub-matrix Q′
containing only the rows for outputs in Y′ has:
Columns that are all permutations of each other
Rows that are all permutations of each other
Symmetric Channels: Examples
AX = AY = {0, 1}
Q = [ 0.9  0.1
      0.1  0.9 ]
Symmetric
Subsets: {0, 1}
AX = {0, 1}, AY = {0, ?, 1}
Q = [ 0.7  0.1
      0.2  0.2
      0.1  0.7 ]
Symmetric
Subsets: {0, 1}, {?}
AX = AY = {0, 1}
Q = [ 0.9  0
      0.1  1 ]
Not Symmetric
If one of our partitions has just one row, then every element in that row must be
equal for the columns to be permutations of each other
Simplest case: all rows and columns are permutations of each other
But this is not a requirement
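One way to test this definition in code (a sketch of my own, assuming exact matrix entries): group together the output rows that are permutations of each other, then check that the columns of each resulting sub-matrix are also permutations of each other. Grouping by row multiset is safe here, because merging admissible subsets whose rows already match preserves the column condition.

```python
import numpy as np

def is_symmetric_channel(Q):
    """Check the symmetry condition by partitioning outputs into groups of rows
    that are permutations of each other, then testing the column condition."""
    Q = np.asarray(Q, dtype=float)
    groups = {}
    for row in Q:                            # group rows by their multiset of entries
        groups.setdefault(tuple(sorted(row)), []).append(row)
    for rows in groups.values():
        sub = np.array(rows)                 # sub-matrix for one subset of outputs
        col_multisets = {tuple(sorted(sub[:, i])) for i in range(sub.shape[1])}
        if len(col_multisets) != 1:          # columns must all be permutations of each other
            return False
    return True

print(is_symmetric_channel([[0.9, 0.1], [0.1, 0.9]]))              # True
print(is_symmetric_channel([[0.7, 0.1], [0.2, 0.2], [0.1, 0.7]]))  # True
print(is_symmetric_channel([[0.9, 0.0], [0.1, 1.0]]))              # False
```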
Channel Capacity for Symmetric Channels
For symmetric channels, the optimal distribution for the capacity has a
simple form:
Theorem
If Q is symmetric, then its capacity is achieved by a uniform distribution
over X .
Exercise 10.10 in MacKay
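So for a symmetric channel the capacity can be read off by plugging in the uniform input, e.g. for the three-output symmetric example above (reusing the mutual_information helper sketched earlier; the numerical value is simply what this sketch computes, not a figure quoted in the lecture).

```python
import numpy as np
# Assumes mutual_information from the earlier sketch is in scope.
Q_sym = np.array([[0.7, 0.1],    # the symmetric example with outputs {0, ?, 1}
                  [0.2, 0.2],
                  [0.1, 0.7]])
uniform = np.ones(Q_sym.shape[1]) / Q_sym.shape[1]
print(round(mutual_information(Q_sym, uniform), 3))   # capacity ~ 0.365 bits
```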
Computing Capacities in General
What can we do if the channel is not symmetric?
We can still calculate I(X ;Y ) for a general input distribution pX
Finding the maximising pX is more challenging
What to do once we know I(X ;Y )?
I(X;Y) is concave in pX ⇒ single maximum
For binary inputs, just look for stationary points (this alone does not suffice for |AX| > 2), i.e.,
where d/dp I(X;Y) = 0 for pX = (1 − p, p)
In general, need to consider distributions that place 0 probability on
one of the inputs
Computing Capacities in General
Example (Z Channel with P(y = 0|x = 1) = f ):
H(Y) = H2(P(y = 1)) = H2(0 · p0 + (1 − f) · p1) = H2((1 − f) p1)
H(Y|X) = p0 · H2(P(y = 1|x = 0)) + p1 · H2(P(y = 0|x = 1))
       = p0 · H2(0) + p1 · H2(f) = p1 H2(f)      (since H2(0) = 0)
I(X;Y) = H2((1 − f) p1) − p1 H2(f)
[Figure (MacKay, Fig. 9.3): the mutual information I(X;Y) for a Z channel with f = 0.15, as a function of the input distribution p1; it is maximised at p1 ≈ 0.445, giving C(QZ) ≈ 0.685 bits.]
Computing Capacities in General
Example (Z Channel):
We showed earlier that I(X;Y) = H2((1 − f)p) − p H2(f) (writing p = p1), so we solve

d/dp I(X;Y) = 0 ⇔ (1 − f) log2( (1 − (1 − f)p) / ((1 − f)p) ) − H2(f) = 0
            ⇔ (1 − (1 − f)p) / ((1 − f)p) = 2^(H2(f)/(1−f))
            ⇔ p = (1/(1 − f)) / (1 + 2^(H2(f)/(1−f)))

For f = 0.15, we get p = (1/0.85) / (1 + 2^(0.61/0.85)) ≈ 0.44 and so
C = H2(0.38) − 0.44 · H2(0.15) ≈ 0.685
Homework: Show that d/dp H2(p) = log2((1 − p)/p)
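The closed-form optimum can be checked numerically from the formulas on this slide (a short sketch of my own; H2 and the Z-channel mutual information are written out exactly as above).

```python
import numpy as np

def H2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

f = 0.15
p_star = (1 / (1 - f)) / (1 + 2 ** (H2(f) / (1 - f)))   # stationary point derived above
C = H2((1 - f) * p_star) - p_star * H2(f)               # I(X;Y) at the optimum
print(round(p_star, 3), round(C, 3))                    # 0.445 0.685
```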
Why Do We Care?
We have a template for computing channel capacity for generic channels
But what does this tell us?
How, if at all, does it relate to the error probability when decoding?
What, if anything, does it tell us about the amount of redundancy we
can get away with when encoding?
We will see next time that there is a deep connection between the capacity
and the best achievable rate of transmission
Rates above the capacity cannot be achieved while ensuring
arbitrarily small error probabilities
Summary and Conclusions
Mutual information between input and output should be large
Depends on input distribution
Capacity is the maximal possible mutual information
Can compute easily for symmetric channels
Can compute explicitly for generic channels