COMP2610/6261 – Information Theory
Lecture 18: Channel Capacity

Robert C. Williamson

Research School of Computer Science


8 October, 2018

1 / 28

1 Channel Capacity

2 Computing Capacities

3 Summary

2 / 28

Channels: Recap

[Diagram: “Hi” sent by the source, “Hi” received at the destination]

Source : Aditya

Encoder : Telephone handset

Channel : Analogue telephone line

Decoder : Telephone handset

Destination : Mark

3 / 28

Channels: Recap

A discrete channel Q consists of:

an input alphabet X = {a1, . . . , aI}

an output alphabet Y = {b1, . . . , bJ}

transition probabilities P(y |x).

The channel Q can be expressed as a matrix

Qj,i = P(y = bj |x = ai)

This represents the probability of observing bj given that we transmit ai
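
As a quick illustration (a sketch of my own, not from the slides): each column of Q is the conditional distribution P(y | x = ai), so every column must sum to 1. In Python/NumPy, for the binary symmetric channel defined two slides ahead:

import numpy as np

f = 0.15                                  # example flip probability (see the BSC slide)
Q = np.array([[1 - f, f],                 # Q[j, i] = P(y = b_j | x = a_i)
              [f, 1 - f]])
print(Q.sum(axis=0))                      # each column sums to 1: [1. 1.]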

4 / 28

The Binary Noiseless Channel

One of the simplest channels is the Binary Noiseless Channel. The
received symbol is always equal to the transmitted symbol – there is no
probability of error, hence noiseless.

[Diagram: X → Y with 0 → 0 and 1 → 1]

Inputs X = {0, 1}; Outputs Y = {0, 1};
Transition probabilities

Q = [ 1  0
      0  1 ]

5 / 28

The Binary Symmetric Channel

Each symbol sent across a binary symmetric channel has a chance of
being “flipped” to its counterpart (0→ 1; 1→ 0)

[Diagram: X → Y with 0 → 0 and 1 → 1 each with probability 1 − f, and 0 → 1 and 1 → 0 each with probability f]

Inputs X = {0, 1}; Outputs Y = {0, 1};
Transition probabilities with P(flip) = f

Q = [ 1 − f    f
        f    1 − f ]

6 / 28

The Z Channel

Symbols may be corrupted over the channel asymmetrically.

[Diagram: X → Y with 0 → 0 with probability 1, 1 → 1 with probability 1 − f, and 1 → 0 with probability f]

Inputs X = {0, 1}; Outputs Y = {0, 1};
Transition probabilities

Q = [ 1      f
      0    1 − f ]

7 / 28

Communicating over Noisy Channels

Suppose we know we have to communicate over some channel Q and we
want to build an encoder/decoder pair to reliably send a message s over Q.

[Diagram: sin → Encoder → x → Q → y → Decoder → sout]

Reliability is measured via probability of error — that is, the probability of
incorrectly decoding sout given sin as input:

P(sout ≠ sin) = ∑_s P(sout ≠ sin | sin = s) P(sin = s)
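
As a concrete (and hypothetical) illustration of this formula, the following Python sketch evaluates the error probability for the simplest possible scheme: a single bit sent through a binary symmetric channel with the identity map as both encoder and decoder; the prior on sin is an arbitrary example value.

# Minimal sketch: single bit over a BSC, identity encoder/decoder (my assumption).
f = 0.15                       # channel flip probability
p_sin = {0: 0.9, 1: 0.1}       # example prior P(sin = s)

# P(sout != sin | sin = s) equals f for either bit under this trivial scheme.
p_error = sum(f * p_sin[s] for s in p_sin)
print(p_error)                 # 0.15, independent of the prior here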

8 / 28

1 Channel Capacity

2 Computing Capacities

3 Summary

9 / 28

Mutual Information for a Channel

A key quantity when using a channel is the mutual information between its
inputs X and outputs Y :

I(X ;Y ) = H(X )− H(X |Y ) = H(Y )− H(Y |X )

This measures how much observing the output reduces our uncertainty about
the input

This requires that we specify some particular input distribution p(X )

A channel is only specified by its transition matrix!
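
To make this dependence concrete, here is a minimal Python/NumPy sketch (mine, not from the slides) that computes I(X;Y) = H(Y) − H(Y|X) from a transition matrix Q (rows indexed by outputs, columns by inputs, as defined earlier) and a chosen input distribution pX; the function and variable names are my own.

import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector p (0 log 0 taken as 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(Q, pX):
    """I(X;Y) = H(Y) - H(Y|X), where Q[j, i] = P(y = b_j | x = a_i)."""
    Q = np.asarray(Q, dtype=float)
    pX = np.asarray(pX, dtype=float)
    pY = Q @ pX                                       # output distribution
    H_Y_given_X = sum(pX[i] * entropy(Q[:, i]) for i in range(Q.shape[1]))
    return entropy(pY) - H_Y_given_X

# BSC with f = 0.15 and pX = (0.9, 0.1): about 0.15 bits (see the examples that follow)
print(mutual_information([[0.85, 0.15], [0.15, 0.85]], [0.9, 0.1]))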

10 / 28

Mutual Information for a Channel: Example

For the noiseless channel, H(X |Y ) = 0, so I(X ;Y ) = H(X ).

If pX = (0.9, 0.1) then I(X ;Y ) = 0.47 bits.

11 / 28

Mutual Information for a Channel: Example

For the binary symmetric channel with f = 0.15 and pX = (0.9, 0.1) we have

p(Y = 1) = p(Y = 1 | X = 1) · p(X = 1) + p(Y = 1 | X = 0) · p(X = 0)
         = (1 − f ) · 0.1 + f · 0.9
         = 0.085 + 0.135
         = 0.22,

and so H(Y ) = H2(0.22) ≈ 0.76

Further, H(Y | X = 0) = H(Y | X = 1) = H2(0.15) ≈ 0.61.

So, I(X ;Y ) ≈ 0.76 − 0.61 = 0.15 bits

12 / 28

Mutual Information for a Channel: Example

For the Z channel with f = 0.15 and the same pX = (0.9, 0.1) we have
H(Y ) = H2((1 − f ) · 0.1) ≈ 0.42 and H(Y |X ) = 0.1 · H2(f ) ≈ 0.061, so I(X ;Y ) ≈ 0.36 bits

So, intuitively, the reliability is “noiseless > Z > symmetric”
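
A quick self-contained check of these three numbers, using only the binary entropy H2 and the per-channel expressions worked out on these and later slides (a sketch; variable names are mine):

from math import log2

def H2(q):
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

f, p1 = 0.15, 0.1
print(H2(p1))                                     # noiseless: I = H(X) ≈ 0.47
print(H2((1 - f) * p1 + f * (1 - p1)) - H2(f))    # BSC: H2(0.22) − H2(0.15) ≈ 0.15
print(H2((1 - f) * p1) - p1 * H2(f))              # Z:   H2(0.085) − 0.1·H2(0.15) ≈ 0.36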

13 / 28

Channel Capacity

The mutual information measure for a channel depends on the choice of
input distribution pX . If H(X ) is small then I(X ;Y ) ≤ H(X ) is small.

The largest possible reduction in uncertainty achievable across a channel
is its capacity.

Channel Capacity
The capacity C of a channel Q is the largest mutual information between
its input and output for any choice of input ensemble. That is,

C = max_pX I(X ;Y )

Later, we will see that the capacity determines the rate at which we can
communicate across a channel with arbitrarily small error.
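
For a channel with two inputs, the maximisation over pX = (1 − p, p) is one-dimensional, so the capacity can be approximated by a brute-force scan. A minimal, self-contained sketch (my own code; a grid search, not an exact method):

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def capacity_binary_input(Q, grid=10001):
    """Approximate C = max_pX I(X;Y) for a two-input channel by scanning p = P(x = 1)."""
    Q = np.asarray(Q, dtype=float)
    best_I, best_p = 0.0, 0.0
    for p in np.linspace(0.0, 1.0, grid):
        pX = np.array([1.0 - p, p])
        pY = Q @ pX
        I = entropy(pY) - sum(pX[i] * entropy(Q[:, i]) for i in range(2))
        if I > best_I:
            best_I, best_p = I, p
    return best_I, best_p

# BSC with f = 0.15: capacity ≈ 0.39 bits, achieved near p = 0.5 (as computed later)
print(capacity_binary_input([[0.85, 0.15], [0.15, 0.85]]))

The same routine applied to the Z channel matrix reproduces the C ≈ 0.685 found analytically later in the lecture.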

14 / 28

1 Channel Capacity

2 Computing Capacities

3 Summary

15 / 28

Computing Capacities

Definition of capacity for a channel Q with inputs AX and outputs AY :

C = max_pX I(X ;Y )

How do we actually calculate this quantity?

1 Compute the mutual information I(X ;Y ) for a general pX
2 Determine which choice of pX maximises I(X ;Y )
3 Use that maximising value to determine C

Binary Symmetric Channel:
We first consider the binary symmetric channel with AX = AY = {0, 1}
and flip probability f . It has transition matrix

Q = [ 1 − f    f
        f    1 − f ]
16 / 28

Computing Capacities
Binary Symmetric Channel – Step 1

The mutual information can be expressed as I(X ;Y ) = H(Y )− H(Y |X ).
We therefore need to compute two terms: H(Y ) and H(Y |X ) so we need
the distributions P(y) and P(y |x).

Computing H(Y ):

P(y = 0) = (1− f ) · P(x = 0) + f · P(x = 1) = (1− f ) · p0 + f · p1
P(y = 1) = (1− f ) · P(x = 1) + f · P(x = 0) = f · p0 + (1− f ) · p1

In general, q := pY = Q pX , so the above calculation is just

q = pY = [ 1 − f    f   ] [ p0 ]
         [   f    1 − f ] [ p1 ]

Using H2(q) = −q log2 q − (1 − q) log2(1 − q) and letting q = q1 = P(y = 1),
we see that the entropy is

H(Y ) = H2(q1) = H2(f · p0 + (1 − f ) · p1)
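
A small numeric check of this step for the running example f = 0.15, pX = (0.9, 0.1) (a sketch; not from the slides):

import numpy as np

f = 0.15
pX = np.array([0.9, 0.1])
Q = np.array([[1 - f, f],
              [f, 1 - f]])
q = Q @ pX                                   # q = (q0, q1) = (0.78, 0.22)
q1 = q[1]                                    # = f·p0 + (1 − f)·p1
H_Y = -q1 * np.log2(q1) - (1 - q1) * np.log2(1 - q1)
print(q1, H_Y)                               # 0.22, ≈ 0.76 bits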
17 / 28

Computing Capacities
Binary Symmetric Channel – Step 1

Computing H(Y |X ):
Since P(y |x) is described by the matrix Q, we have

H(Y | x = 0) = H2(P(y = 1 | x = 0)) = H2(Q1,0) = H2(f )

and similarly,

H(Y | x = 1) = H2(P(y = 0 | x = 1)) = H2(Q0,1) = H2(f )

So,

H(Y |X ) = ∑_x H(Y | x) P(x) = ∑_x H2(f ) P(x) = H2(f ) ∑_x P(x) = H2(f )

Computing I(X ;Y ):
Putting it all together gives

I(X ;Y ) = H(Y )− H(Y |X ) = H2(f · p0 + (1− f ) · p1)− H2(f )

18 / 28

Computing Capacities
Binary Symmetric Channel – Steps 2 and 3

Binary Symmetric Channel (BSC) with flip probability f ∈ [0, 1]:

I(X ;Y ) = H2(fp0 + (1− f )p1)− H2(f )

Examples:

BSC (f = 0) and pX = (0.5, 0.5):
I(X ;Y ) = H2(0.5)− H2(0) = 1
BSC (f = 0.15) and pX = (0.5, 0.5):
I(X ;Y ) = H2(0.5)− H2(0.15) ≈ 0.39
BSC (f = 0.15) and pX = (0.9, 0.1):
I(X ;Y ) = H2(0.22)− H2(0.15) ≈ 0.15

[Figure (MacKay, Figure 9.2): the mutual information I(X ;Y ) for a binary symmetric channel with f = 0.15 as a function of the input distribution p1; the curve peaks at p1 = 0.5.]

Maximise I(X ;Y ): Since I(X ;Y ) is symmetric about p1 = 0.5, it is maximised when
p0 = p1 = 0.5, in which case C ≈ 0.39 for the BSC with f = 0.15.

19 / 28

Channel Capacity: Example

For a binary symmetric channel, we could also argue

I(X ;Y ) = H(Y ) − H(Y | X )
         = H(Y ) − ∑_x p(X = x) · H(Y | X = x)
         = H(Y ) − ∑_x p(X = x) · H2(f )
         = H(Y ) − H2(f )
         ≤ 1 − H2(f ),

where equality in the last line holds for uniform pX (which makes Y uniform, so H(Y ) = 1)

20 / 28

Symmetric Channels

The BSC was easy to work with due to considerable symmetry

Symmetric Channel
A channel with inputs AX , outputs AY , and matrix Q is symmetric if AY
can be partitioned into subsets Y ′ ⊆ AY so that each sub-matrix Q′, formed by
keeping only the rows for the outputs in Y ′, has:

Columns that are all permutations of each other
Rows that are all permutations of each other

21 / 28

Symmetric Channels: Examples

AX = AY = {0, 1}

Q = [ 0.9  0.1
      0.1  0.9 ]
Symmetric

Subsets: {0, 1}

AX = {0, 1}, AY = {0, ?, 1}

Q = [ 0.7  0.1
      0.2  0.2
      0.1  0.7 ]
Symmetric
Subsets: {0, 1}, {?}

AX = AY = {0, 1}

Q = [ 0.9  0
      0.1  1 ]
Not Symmetric

If one of our partitions has just one row, then every element in that row must be
equal for the columns to be permutations of each other

Simplest case: all rows and columns are permutations of each other

But this is not a requirement
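
The definition can be checked mechanically for a proposed output partition: every sub-matrix must have rows that are permutations of one another and columns that are permutations of one another. A small sketch (my own helper; it uses exact float comparison, which suffices for these hand-written matrices):

import numpy as np

def is_symmetric(Q, partition):
    """Check the symmetry definition above for a given partition of the output rows."""
    Q = np.asarray(Q, dtype=float)
    for rows in partition:
        sub = Q[rows, :]
        if len({tuple(sorted(r)) for r in sub}) > 1:      # rows permutations of each other?
            return False
        if len({tuple(sorted(c)) for c in sub.T}) > 1:    # columns permutations of each other?
            return False
    return True

Q_mid = [[0.7, 0.1], [0.2, 0.2], [0.1, 0.7]]              # outputs ordered 0, ?, 1
print(is_symmetric(Q_mid, [[0, 2], [1]]))                 # True: subsets {0, 1} and {?}
print(is_symmetric([[0.9, 0.0], [0.1, 1.0]], [[0, 1]]))   # False: not symmetric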

22 / 28

Channel Capacity for Symmetric Channels

For symmetric channels, the optimal distribution for the capacity has a
simple form:

Theorem
If Q is symmetric, then its capacity is achieved by a uniform distribution
over X .

Exercise 10.10 in MacKay

23 / 28

Computing Capacities in General

What can we do if the channel is not symmetric?

We can still calculate I(X ;Y ) for a general input distribution pX
Finding the maximising pX is more challenging

What to do once we know I(X ;Y )?

I(X ;Y ) is concave in pX ⟹ single maximum

For binary inputs, we can just look for stationary points, i.e., where
d/dp I(X ;Y ) = 0 for pX = (1 − p, p) (this does not directly work for |AX | > 2)

In general, need to consider distributions that place 0 probability on
one of the inputs

24 / 28

Computing Capacities in General

Example (Z Channel with P(y = 0|x = 1) = f ):
H(Y ) = H2(P(y = 1)) = H2(0 · p0 + (1 − f ) · p1) = H2((1 − f ) p1)

H(Y |X ) = p0 H2(P(y = 1 | x = 0)) + p1 H2(P(y = 0 | x = 1))
         = p0 H2(0) + p1 H2(f )
         = p1 H2(f )                       (since H2(0) = 0)

I(X ;Y ) = H2((1 − f ) p1) − p1 H2(f )

[Figure (MacKay, Figure 9.3): the mutual information I(X ;Y ) for a Z channel with f = 0.15 as a function of the input distribution p1; the maximum is at p1 ≈ 0.445, giving C ≈ 0.685.]

25 / 28

Computing Capacities in General

Example (Z Channel):
Showed earlier that I(X ;Y ) = H2((1− f )p)− pH2(f ) so solve

d/dp I(X ;Y ) = 0 ⟺ (1 − f ) log2( (1 − (1 − f )p) / ((1 − f )p) ) − H2(f ) = 0

              ⟺ (1 − (1 − f )p) / ((1 − f )p) = 2^{H2(f )/(1−f )}

              ⟺ p = (1/(1 − f )) / (1 + 2^{H2(f )/(1−f )})

For f = 0.15, we get p = (1/0.85) / (1 + 2^{0.61/0.85}) ≈ 0.44 and so
C ≈ H2(0.38) − 0.44 · H2(0.15) ≈ 0.685

Homework: Show that d/dp H2(p) = log2((1 − p)/p)
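
A sketch (my own code) that evaluates the closed-form maximiser above and cross-checks it against a brute-force scan of I(X;Y) = H2((1 − f)p) − p·H2(f):

from math import log2

def H2(q):
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

f = 0.15
I = lambda p: H2((1 - f) * p) - p * H2(f)

# closed-form stationary point from this slide
p_star = (1 / (1 - f)) / (1 + 2 ** (H2(f) / (1 - f)))
print(p_star, I(p_star))                               # ≈ 0.445, ≈ 0.685 bits

# brute-force check over a grid of input distributions pX = (1 − p, p)
p_grid = [i / 10000 for i in range(10001)]
print(max(p_grid, key=I), max(I(p) for p in p_grid))   # ≈ 0.445, ≈ 0.685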

26 / 28

Why Do We Care?

We have a template for computing channel capacity for generic channels

But what does this tell us?

How, if at all, does it relate to the error probability when decoding?
What, if anything, does it tell us about the amount of redundancy we
can get away with when encoding?

We will see next time that there is a deep connection between the capacity
and the best achievable rate of transmission

Rates above the capacity cannot be achieved while ensuring
arbitrarily small error probabilities

27 / 28

Summary and Conclusions

Mutual information between input and output should be large

Depends on input distribution

Capacity is the maximal possible mutual information

Can compute easily for symmetric channels

Can compute explicitly for generic channels

28 / 28
