
COMP2610/6261 – Information Theory
Lecture 17: Noisy Channels

Bob Williamson

Research School of Computer Science


2 October, 2018

1 / 28

1 Communication over Noisy Channels: Big Picture

2 Noisy Channels: Formally

3 Examples of Channels

4 Probability of Error

2 / 28

The Big Picture


(From MacKay, Chapter 9: Communication over a Noisy Channel, §9.1 The big picture)

[Figure: Source → Compressor → Encoder → Noisy channel → Decoder → Decompressor; the compressor/decompressor stage is source coding, the encoder/decoder stage is channel coding]


In Chapters 4–6, we discussed source coding with block codes, symbol codes and stream codes. We implicitly assumed that the channel from the compressor to the decompressor was noise-free. Real channels are noisy. We will now spend two chapters on the subject of noisy-channel coding – the fundamental possibilities and limitations of error-free communication through a noisy channel. The aim of channel coding is to make the noisy channel behave like a noiseless channel. We will assume that the data to be transmitted has been through a good compressor, so the bit stream has no obvious redundancy. The channel code, which makes the transmission, will put back redundancy of a special sort, designed to make the noisy received signal decodeable.

Suppose we transmit 1000 bits per second with p0 = p1 = 1/2 over a
noisy channel that flips bits with probability f = 0.1. What is the rate of
transmission of information? We might guess that the rate is 900 bits per
second by subtracting the expected number of errors per second. But this is
not correct, because the recipient does not know where the errors occurred.
Consider the case where the noise is so great that the received symbols are
independent of the transmitted symbols. This corresponds to a noise level of
f = 0.5, since half of the received symbols are correct due to chance alone.
But when f = 0.5, no information is transmitted at all.

Given what we have learnt about entropy, it seems reasonable that a measure of the information transmitted is given by the mutual information between the source and the received signal, that is, the entropy of the source minus the conditional entropy of the source given the received signal.

We will now review the definition of conditional entropy and mutual information. Then we will examine whether it is possible to use such a noisy channel to communicate reliably. We will show that for any channel Q there is a non-zero rate, the capacity C(Q), up to which information can be sent with arbitrarily small probability of error.


3 / 28

The Big Picture


Concept: Expected code length

Theorem: Source coding theorem

Algorithms: { Huffman, Arithmetic } codes

3 / 28

The Big Picture


Concept: Channel capacity

Theorem: Channel coding theorem

Algorithms: Repetition codes, Hamming codes

3 / 28

Communication over Noisy Channels

A channel is some medium for transmitting messages

A noisy channel is a channel that potentially introduces errors in the
sender’s message

The Problem of Communication
“The fundamental problem of communication is that of reproducing at one
point either exactly or approximately a message selected at another point.”
(Claude Shannon, 1948)

4 / 28

Example: Telephone Network

“Hi”  →  “Hi”

Source : Aditya

Encoder : Telephone handset

Channel : Analogue telephone line

Decoder : Telephone handset

Destination : Mark

5 / 28

Key Questions

How do we model noisy communication abstractly?

What are the theoretical limits of noise correction?

What are the practical approaches to noise correction?

6 / 28

1 Communication over Noisy Channels: Big Picture

2 Noisy Channels: Formally

3 Examples of Channels

4 Probability of Error

7 / 28

Encoder/Decoder Pairs

Suppose we have some set S = {1, 2, . . . ,S} of possible messages
e.g. codewords from Huffman coding on some ensemble

Sender and receiver agree on what these are

When communicating over a channel, the sender must encode messages
into some input alphabet X

The receiver then receives some (possibly corrupted) element from an
output alphabet Y

Simple case: X = Y = {0, 1}

The bit the sender transmits may not be what the receiver sees

8 / 28


Encoder/Decoder Pairs

Formally, the sender encodes messages via

enc : S → XN

The receiver then decodes messages via

dec : YN → S

Isn’t the compressor already “encoding” a message?
Yes, but we might want to add something for noise tolerance

Might have X ≠ Y
e.g. if we allow a special “erased” symbol

N > 1 can be thought of as multiple uses of a channel
e.g. use a bitstring of length 4 to represent messages

9 / 28
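To make the enc/dec notation concrete, here is a minimal sketch in Python (not from the slides; the message set S = {0, 1, 2, 3}, the block length N = 2, and the particular bit mapping are illustrative assumptions):

    N = 2  # channel uses per message (illustrative choice)

    def enc(s: int) -> tuple:
        """enc : S -> X^N, here mapping s in {0,1,2,3} to a 2-bit block."""
        return tuple((s >> i) & 1 for i in reversed(range(N)))

    def dec(y: tuple) -> int:
        """dec : Y^N -> S, inverting enc (assumes the block arrived uncorrupted)."""
        s = 0
        for bit in y:
            s = (s << 1) | bit
        return s

    assert all(dec(enc(s)) == s for s in range(4))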


Channels: Informally

A discrete channel Q will:

accept an input from X
produce an output from Y

There is a probability of observing various outputs, given an input

This represents some inherent noise
Noise could depend on the input

10 / 28


Channels: Formally

A discrete channel Q consists of:

an input alphabet X = {a1, . . . , aI}

an output alphabet Y = {b1, . . . , bJ}

transition probabilities P(y |x).

The channel Q can be expressed as a matrix

Qj,i = P(y = bj |x = ai)

This represents the probability of observing bj given that we transmit ai

11 / 28


Channels: Example

Example: A channel Q with inputs X = {a1, a2, a3}, outputs
Y = {b1, b2}, and transition probabilities expressed by the matrix

Q = [ 0.8  0.5  0.2
      0.2  0.5  0.8 ]

So P(b1|a1) = 0.8 = P(b2|a3) and P(b1|a2) = P(b2|a2) = 0.5.

We arrange the inputs along the columns, and outputs along the rows

Actual details of alphabet are abstracted away

12 / 28
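As a quick sketch (assuming Python; the matrix is the example Q above), a channel can be stored exactly in this inputs-along-columns, outputs-along-rows layout, and a single channel use is just sampling from the column of the transmitted input:

    import random

    Q = [[0.8, 0.5, 0.2],   # row j = 1: P(y = b1 | x = a_i)
         [0.2, 0.5, 0.8]]   # row j = 2: P(y = b2 | x = a_i)

    def transmit(i):
        """Sample an output index j given input index i, using column i of Q."""
        column = [Q[j][i] for j in range(len(Q))]
        return random.choices(range(len(Q)), weights=column)[0]

    # Each column is a conditional distribution over outputs, so it sums to 1.
    assert all(abs(sum(row[i] for row in Q) - 1.0) < 1e-9 for i in range(3))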


1 Communication over Noisy Channels: Big Picture

2 Noisy Channels: Formally

3 Examples of Channels

4 Probability of Error

13 / 28

The Binary Noiseless Channel

One of the simplest channels is the Binary Noiseless Channel. The
received symbol is always equal to the transmitted symbol – there is no
probability of error, hence noiseless.

[Diagram: X → Y with 0 → 0 and 1 → 1]

Inputs X = {0, 1}; Outputs Y = {0, 1};
Transition probabilities

Q = [ 1  0
      0  1 ]

What was transmitted over the channel if 0000 1111 was received?

0000 1111  -Q->  0000 1111

14 / 28


The Noisy Non-overlapping Channel

Even if there is some uncertainty about the output given the input, it may
still be possible to perfectly infer what was transmitted.

[Diagram: X → Y with 0 → {a, b} and 1 → {c, d}]

Inputs X = {0, 1}; Outputs
Y = {a, b, c, d}; Transition probabilities

Q = [ 0.5   0
      0.5   0
       0   0.5
       0   0.5 ]
What was transmitted over the channel if abba ddcd was received?

0000 1111  -Q->  abba ddcd

15 / 28


The Binary Symmetric Channel

Each symbol sent across a binary symmetric channel has a chance of
being “flipped” to its counterpart (0→ 1; 1→ 0)

[Diagram: 0 → 0 and 1 → 1 with probability 1−f; 0 → 1 and 1 → 0 with probability f]

Inputs X = {0, 1}; Outputs Y = {0, 1};
Transition probabilities with P(flip) = f

Q = [ 1−f    f
       f    1−f ]

What was most likely transmitted over the channel if 0010 1001 was
received, assuming f = 0.1 and P(x = 0) = P(x = 1) = 0.5?

0010 1001  -Q->  0010 1001  (since f < 0.5, each received bit is most likely the bit that was sent)

16 / 28
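A small sketch of this channel (f = 0.1 as on the slide; everything else is an illustrative assumption): simulate the bit flips, and note that for f < 0.5 with a uniform prior the most likely transmitted bit is simply the received bit, so the best guess for 0010 1001 is 0010 1001 itself.

    import random

    f = 0.1

    def bsc(bits, f=f):
        """Flip each bit independently with probability f."""
        return [b ^ (random.random() < f) for b in bits]

    received = [0, 0, 1, 0, 1, 0, 0, 1]
    # Per-bit MAP decoding under a uniform prior: keep each bit when f < 0.5.
    guess = received if f < 0.5 else [1 - b for b in received]
    print(guess)  # [0, 0, 1, 0, 1, 0, 0, 1]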


Inferring the Input

Suppose we know P(x), the probability that x is transmitted.

Given a particular output y ∈ Y received over a channel Q, how likely was
it that input x ∈ X was transmitted?

P(x|y) = P(y|x)P(x) / P(y) = P(y|x)P(x) / Σ_{x′∈X} P(y|x′)P(x′)

17 / 28
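As a sketch (assuming Python; the channel matrix reused below is the example from slide 12, and the uniform prior is an illustrative choice), Bayes' rule above translates directly into code:

    def posterior(Q, prior, j):
        """Return [P(x = a_i | y = b_j) for all i], given Q[j][i] = P(b_j | a_i)."""
        joint = [Q[j][i] * prior[i] for i in range(len(prior))]  # P(y = b_j, x = a_i)
        p_y = sum(joint)                                          # P(y = b_j)
        return [v / p_y for v in joint]

    Q = [[0.8, 0.5, 0.2],
         [0.2, 0.5, 0.8]]
    print(posterior(Q, [1/3, 1/3, 1/3], 0))  # [0.533..., 0.333..., 0.133...]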


Inferring the Input: Example

Suppose P(x = 0) = P(x = 1) = 0.5. What is the probability that x = 0 was transmitted over a binary symmetric channel Q with f = 0.1, given that y = 0 was received?

P(x = 0|y = 0) = (0.9 × 0.5) / (0.9 × 0.5 + 0.1 × 0.5) = 0.9

Similarly, P(x = 1|y = 1) = 0.9.

What if P(x = 0) = 0.01?

P(x = 0|y = 0) = (0.9 × 0.01) / (0.9 × 0.01 + 0.1 × 0.99) ≈ 0.0833.

18 / 28
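The same calculation as a two-line check (a sketch; nothing here beyond Bayes' rule and the numbers on this slide):

    def bsc_posterior_x0_given_y0(p_x0, f):
        """P(x = 0 | y = 0) for a binary symmetric channel with flip probability f."""
        return (1 - f) * p_x0 / ((1 - f) * p_x0 + f * (1 - p_x0))

    print(bsc_posterior_x0_given_y0(0.5, 0.1))   # 0.9
    print(bsc_posterior_x0_given_y0(0.01, 0.1))  # 0.0833...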


The Z Channel

Symbols may be corrupted over the channel asymmetrically.

[Diagram: 0 → 0 with probability 1; 1 → 1 with probability 1−f; 1 → 0 with probability f]

Inputs X = {0, 1}; Outputs Y = {0, 1};
Transition probabilities

Q = [ 1    f
      0   1−f ]

Inferring the input: Clearly P(x = 1|y = 1) = 1 but

P(x = 0|y = 0) = P(y = 0|x = 0)P(x = 0) / Σ_{x′∈X} P(y = 0|x′)P(x′)
              = P(x = 0) / (P(x = 0) + f P(x = 1))

So P(x = 0|y = 0)→ 1 as f → 0, and goes to P(x = 0) as f → 1

19 / 28
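A sketch of the formula just derived (the prior P(x = 0) = 0.5 is an arbitrary illustrative value), showing the two limits numerically:

    def z_posterior_x0_given_y0(p_x0, f):
        """P(x = 0 | y = 0) for the Z channel: p_x0 / (p_x0 + f * (1 - p_x0))."""
        return p_x0 / (p_x0 + f * (1 - p_x0))

    for f in (0.0, 0.1, 0.5, 1.0):
        print(f, z_posterior_x0_given_y0(0.5, f))
    # f -> 0 gives 1.0; f -> 1 gives P(x = 0) = 0.5, matching the limits above.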


The Binary Erasure Channel

We can model a channel which “erases” bits by letting one of the output
symbols be the symbol ’?’ with associated probability f . The receiver
knows which bits are erased.

[Diagram: 0 → 0 and 1 → 1 with probability 1−f; 0 → ? and 1 → ? with probability f]

Inputs X = {0, 1}; Outputs Y = {0, ?, 1};
Transition probabilities

Q = [ 1−f    0
       f     f
       0    1−f ]


Example:
0000 1111  -Q->  00?0 ?11?

20 / 28
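A sketch of the erasure channel in Python (the erasure probability f = 0.25 and the input string are illustrative choices):

    import random

    def bec(bits, f=0.25):
        """Replace each bit with '?' independently with probability f."""
        return ['?' if random.random() < f else b for b in bits]

    print(bec([0, 0, 0, 0, 1, 1, 1, 1]))  # e.g. [0, 0, '?', 0, '?', 1, 1, '?']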

The Noisy Typewriter Channel

This channel simulates a noisy “typewriter”. Inputs and outputs are the 26 letters A through Z plus a space (27 symbols in total). With probability 1/3 each, a letter is left unchanged, changed to the next letter, or changed to the previous letter (the alphabet is arranged in a circle, with the space between Z and A).

[Diagram: each input letter maps to itself and its two circular neighbours, e.g. B → {A, B, C}]

Inputs X = {A, B, . . . , Z, space};
Outputs Y = {A, B, . . . , Z, space};
Transition probabilities

Q = [ 1/3  1/3   0    0   ···   0   1/3
      1/3  1/3  1/3   0   ···   0    0
       0   1/3  1/3  1/3  ···   0    0
       ⋮                    ⋱        ⋮
      1/3   0    0    0   ···  1/3  1/3 ]




The transition matrix for this channel has a diagonal structure: all of the
probability mass is concentrated around the diagonal.

21 / 28
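A sketch that builds the 27 × 27 transition matrix above (assuming Python; the column-stochastic layout Q[j][i] = P(y = b_j | x = a_i) follows the slides):

    import string

    symbols = list(string.ascii_uppercase) + [' ']   # 26 letters plus space
    n = len(symbols)                                  # 27
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):                   # circular neighbours
            Q[j % n][i] = 1 / 3

    # Every column sums to 1, and the mass sits on and around the diagonal.
    assert all(abs(sum(row[i] for row in Q) - 1.0) < 1e-9 for i in range(n))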


The Noisy Typewriter Channel

(This slide reproduces MacKay p. 148, which summarises the binary symmetric, binary erasure, noisy typewriter, and Z channels, and works Example 9.1.)

Example 9.1 (MacKay). Consider a binary symmetric channel with probability of error f = 0.15 and input ensemble P(x = 0) = 0.9, P(x = 1) = 0.1. Assume we observe y = 1. Then

P(x = 1 | y = 1) = P(y = 1 | x = 1) P(x = 1) / Σ_{x′} P(y = 1 | x′) P(x′)
                 = (0.85 × 0.1) / (0.85 × 0.1 + 0.15 × 0.9)
                 = 0.085 / 0.22 = 0.39.

The transition matrix for the noisy typewriter has a diagonal structure: all of the probability mass is concentrated around the diagonal.

21 / 28

1 Communication over Noisy Channels: Big Picture

2 Noisy Channels: Formally

3 Examples of Channels

4 Probability of Error

22 / 28

Communicating over Noisy Channels

Suppose we know we have to communicate over some channel Q and we want to build an encoder/decoder pair to reliably send a message s over Q.

[Diagram: s_in → Encoder → x → Channel Q → y → Decoder → s_out]

Reliability is measured via the probability of error, that is, the probability of incorrectly decoding s_out given s_in as input:

P(s_out ≠ s_in) = Σ_s P(s_out ≠ s_in | s_in = s) P(s_in = s)

23 / 28
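As a sketch of how this quantity can be estimated (assuming Python; the encoder, decoder, and channel below are placeholder examples, not a scheme from the slides), one can simulate the whole chain and count decoding errors:

    import random

    def error_rate(enc, dec, channel_bit, messages, priors, trials=100_000):
        """Monte-Carlo estimate of P(s_out != s_in) for a per-bit channel model."""
        errors = 0
        for _ in range(trials):
            s_in = random.choices(messages, weights=priors)[0]
            y = [channel_bit(x) for x in enc(s_in)]
            errors += (dec(y) != s_in)
        return errors / trials

    # Identity code over a binary symmetric channel with f = 0.1: roughly 0.1.
    f = 0.1
    flip = lambda x: x ^ (random.random() < f)
    print(error_rate(lambda s: [s], lambda y: y[0], flip, [0, 1], [0.5, 0.5]))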


Probability of Error: Example

Let S = {a, b} and X = Y = {0, 1}

Assume encoder: a→ 0 ; b→ 1, decoder: 0→ a ; 1→ b.

Consider binary symmetric Q,

Q = [ 1−f    f
       f    1−f ]

with f = 0.1

24 / 28

Probability of Error: Example

If base probabilities of symbol transmission are (p_a, p_b) = (0.5, 0.5),

P(s_in ≠ s_out) = P(s_in = a, s_out = b) + P(s_in = b, s_out = a)
               = P(s_out = b | s_in = a) P(s_in = a) + P(s_out = a | s_in = b) P(s_in = b)
               = P(y = 1 | x = 0) p_a + P(y = 0 | x = 1) p_b
               = f = 0.1

25 / 28


A Simple Coding Scheme

Suppose s ∈ {a, b} and we encode by a → 000 and b → 111.
To decode, we count the number of 0s and 1s in the received block and choose the majority symbol:

A = {000, 001, 010, 100} → a    and    B = {111, 110, 101, 011} → b

If the channel Q is binary symmetric with f = 0.1 again

P(s_in ≠ s_out) = P(y ∈ B | 000) p_a + P(y ∈ A | 111) p_b
               = [f³ + 3f²(1−f)] p_a + [f³ + 3f²(1−f)] p_b
               = f³ + 3f²(1−f) = 0.028

So the error has dropped from 0.1 to 0.028
but so has the rate: from 1 symbol/bit to 1/3 symbol/bit.

26 / 28
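A sketch checking the numbers on this slide (f = 0.1 as above; the closed form below is just the majority-vote error probability for an odd-length repetition code):

    from math import comb

    f = 0.1

    def p_error_repetition(n, f):
        """P(majority of n bits flipped) for odd n over a BSC with flip prob f."""
        return sum(comb(n, k) * f**k * (1 - f)**(n - k) for k in range(n // 2 + 1, n + 1))

    print(p_error_repetition(1, f))  # 0.1    (no coding, rate 1)
    print(p_error_repetition(3, f))  # 0.028  = f^3 + 3 f^2 (1 - f), rate 1/3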


A Simple Coding Scheme

[Figure (MacKay): probability of bit error p_b versus rate for repetition codes R1, R3, R5, …, R61, shown on linear and logarithmic scales; more useful codes sit towards high rate and low error]

Can we make the error arbitrarily small without the rate going to zero?

27 / 28


Summary and Reading

Main Points:

Modelling Noisy Channels
  - Noiseless, Non-overlapping, Symmetric, Z, Erasure

Simple Coding via Repetition
  - Probability of Error vs Transmission Rate

Reading:

MacKay §9.1 – §9.5

Cover & Thomas §7.1 – §7.3

28 / 28
