COMP2610/6261 – Information Theory
Lecture 15: Shannon-Fano-Elias and Interval Coding
Robert C. Williamson
Research School of Computer Science
24 September, 2018
1 The Trouble with Huffman Coding
2 Interval Coding
Shannon-Fano-Elias Coding
Lossless property
The Prefix Property and Intervals
Decoding
Expected Length
Prefix Codes as Trees (Recap)
C2 = {0, 10, 110, 111}
[Figure: the complete binary tree of codewords of length 1 to 4 (the "total symbol code budget"), with the codewords of C2 marked as leaves.]
The Source Coding Theorem for Symbol Codes
Source Coding Theorem for Symbol Codes
For any ensemble X there exists a prefix code C such that
H(X ) ≤ L(C,X ) < H(X ) + 1.
In particular, Shannon codes C, those with lengths ℓi = ⌈log2(1/pi)⌉,
have expected code length within 1 bit of the entropy.
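As a quick check of the theorem, here is a small Python computation (our own sketch, not part of the slides) of the Shannon code lengths ⌈log2(1/pi)⌉ for the distribution used in the Huffman example on the next slide:

```python
import math

p = [0.25, 0.25, 0.2, 0.15, 0.15]                      # the example ensemble below
lengths = [math.ceil(math.log2(1 / pi)) for pi in p]   # Shannon lengths: [2, 2, 3, 3, 3]
L = sum(pi * li for pi, li in zip(p, lengths))          # expected length: 2.5 bits
H = -sum(pi * math.log2(pi) for pi in p)                # entropy: about 2.2855 bits
assert H <= L < H + 1                                   # the Source Coding Theorem bound
```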
Huffman Coding: Recap
AX = {a, b, c, d, e} and PX = {0.25, 0.25, 0.2, 0.15, 0.15}
The Huffman coding algorithm
We now present a beautifully simple algorithm for finding an optimal prefix code. The trick is to construct the code backwards starting from the tails of the codewords; we build the binary tree from its leaves.
1. Take the two least probable symbols in the alphabet. These two symbols will be given the longest codewords, which will have equal length, and differ only in the last digit.
2. Combine these two symbols into a single symbol, and repeat.
Since each step reduces the size of the alphabet by one, this algorithm will have assigned strings to all the symbols after |AX| − 1 steps.
Example 5.15. Let AX = {a, b, c, d, e} and PX = {0.25, 0.25, 0.2, 0.15, 0.15}.
[Figure: the Huffman construction over four steps. The two least probable nodes are merged at each step: 0.15 + 0.15 = 0.3; 0.2 + 0.25 = 0.45; 0.25 + 0.3 = 0.55; 0.45 + 0.55 = 1.0. The 0/1 branch labels along the tree give the codewords for a, b, c, d, e.]
The codewords are then obtained by concatenating the binary digits in reverse order: C = {00, 10, 11, 010, 011}.

Table 5.5. Code created by the Huffman algorithm.
ai   pi    h(pi)  li  c(ai)
a    0.25  2.0    2   00
b    0.25  2.0    2   10
c    0.2   2.3    2   11
d    0.15  2.7    3   010
e    0.15  2.7    3   011

The codelengths selected by the Huffman algorithm (column 4 of table 5.5) are in some cases longer and in some cases shorter than the ideal codelengths, the Shannon information contents log2(1/pi) (column 3). The expected length of the code is L = 2.30 bits, whereas the entropy is H = 2.2855 bits.
If at any point there is more than one way of selecting the two least probable
symbols then the choice may be made in any manner – the expected length of
the code will not depend on the choice.
Exercise 5.16. [3, p.105] Prove that there is no better symbol code for a source than the Huffman code.
Example 5.17. We can make a Huffman code for the probability distribution over the alphabet introduced in figure 2.1. The result is shown in figure 5.6. This code has an expected length of 4.15 bits; the entropy of the ensemble is 4.11 bits. Observe the disparities between the assigned codelengths and the ideal codelengths log2(1/pi).
Constructing a binary tree top-down is suboptimal
In previous chapters we studied weighing problems in which we built ternary or binary trees. We noticed that balanced trees – ones in which, at every step, the two possible outcomes were as close as possible to equiprobable – appeared to describe the most efficient experiments. This gave an intuitive motivation for entropy as a measure of information content.
From Example 5.15 of MacKay
C = {00, 10, 11, 010, 011}
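For concreteness, here is a minimal Python sketch of the bottom-up merging described above, tracking only the codeword lengths (the function name and representation are ours, not MacKay's):

```python
import heapq
import itertools

def huffman_code_lengths(probs):
    # Repeatedly merge the two least probable subtrees; every symbol in a
    # merged subtree gains one bit of codeword length.
    counter = itertools.count()                 # tie-breaker for equal probabilities
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)         # least probable subtree
        p2, _, s2 = heapq.heappop(heap)         # second least probable subtree
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), s1 + s2))
    return lengths

probs = [0.25, 0.25, 0.2, 0.15, 0.15]           # a, b, c, d, e
lengths = huffman_code_lengths(probs)           # one optimal answer: [2, 2, 2, 3, 3]
L = sum(p * l for p, l in zip(probs, lengths))  # expected length 2.30 bits, as in Table 5.5
```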
Huffman Coding: Advantages and Disadvantages
Advantages:
Huffman Codes are provably optimal amongst prefix codes
Algorithm is simple and efficient
Disadvantages:
Assumes a fixed distribution of symbols
The extra bit in the SCT
I If H(X ) is large – not a problem
I If H(X ) is small (e.g., ∼ 1 bit for English), codes can be ∼2× the optimal length (illustrated below)
Huffman codes are the best possible symbol code
but symbol coding is not always the best type of code
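To make the low-entropy point concrete, consider a hypothetical heavily skewed binary source (the numbers below are ours, chosen only for illustration):

```python
import math

# Any symbol code must spend at least one whole bit per symbol, so even the
# optimal Huffman code {0, 1} is far from the entropy when H(X) is small.
p = [0.99, 0.01]
H = -sum(pi * math.log2(pi) for pi in p)   # about 0.081 bits/symbol
L_huffman = 1.0                            # both codewords have length 1
ratio = L_huffman / H                      # roughly 12 times the entropy
```

This overhead is what motivates coding over blocks of outcomes rather than single symbols.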
This time
A different way of coding (interval coding)
Shannon-Fano-Elias codes
Worse guarantee than Huffman codes, but will lead us to the powerful
arithmetic coding procedure
Coding via Cumulative Probabilities
Suppose X is an ensemble with probabilities (p1, . . . , p|X |)
Define the cumulative distribution function by
F(x) = Σ_{i≤x} pi
and the modified cumulative distribution function by
F̄(x) = Σ_{i<x} pi + p(x)/2
The SFE codeword for x is F̄(x) truncated to ℓ(x) = ⌈log2(1/p(x))⌉ + 1 bits, written ⌊F̄(x)⌋_ℓ(x). Truncation to ℓ(x) bits removes strictly less than 1/2^ℓ(x) ≤ p(x)/2, so ⌊F̄(x)⌋_ℓ(x) > F̄(x) − p(x)/2 = F(x − 1), and
⌊F̄(x)⌋_ℓ(x) + 1/2^ℓ(x) ≤ F̄(x) + 1/2^ℓ(x) ≤ F̄(x) + p(x)/2 = F(x),
and so
[⌊F̄(x)⌋_ℓ(x), ⌊F̄(x)⌋_ℓ(x) + 1/2^ℓ(x)) ⊂ [F(x − 1), F(x))
The codeword intervals are thus trivially disjoint, since we know the symbol intervals [F(x − 1), F(x)) are pairwise disjoint
The SFE code is prefix-free!
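A minimal Python sketch of this encoder (our own; a serious implementation would use exact arithmetic rather than floats, and the helper name sfe_codewords is hypothetical):

```python
import math

def sfe_codewords(probs):
    # Codeword for each symbol: Fbar(x) = F(x-1) + p(x)/2, truncated to
    # l(x) = ceil(log2(1/p(x))) + 1 binary digits.
    codewords = []
    F = 0.0                                    # running F(x - 1)
    for p in probs:
        Fbar = F + p / 2                       # midpoint of the symbol interval
        length = math.ceil(math.log2(1 / p)) + 1
        bits, frac = "", Fbar
        for _ in range(length):                # binary expansion of Fbar, truncated
            frac *= 2
            bit = int(frac)
            bits += str(bit)
            frac -= bit
        codewords.append(bits)
        F += p
    return codewords

# The distribution used in the decoding example later in the lecture:
print(sfe_codewords([2/9, 1/9, 1/3, 1/3]))     # ['0001', '01000', '100', '110']
```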
Two Types of Interval
The symbol interval for some outcome xi is (assuming F (x0) = 0)
[F (xi−1),F (xi))
These intervals are disjoint for each outcome
The codeword interval for some outcome xi is
[⌊F̄(xi)⌋_ℓ(xi), ⌊F̄(xi)⌋_ℓ(xi) + 1/2^ℓ(xi))
This is a strict subset of the symbol interval (checked numerically in the sketch below)
All strings in the codeword interval start with the same prefix
This is not true in general for the symbol interval
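The containment can be verified numerically. A small sketch (ours, using exact fractions to avoid rounding) for the distribution used in the decoding example below:

```python
from fractions import Fraction
import math

p = [Fraction(2, 9), Fraction(1, 9), Fraction(1, 3), Fraction(1, 3)]
F = [sum(p[:i + 1]) for i in range(len(p))]              # cumulative F(x_i)
for i, pi in enumerate(p):
    lo = F[i] - pi                                       # F(x_{i-1})
    Fbar = lo + pi / 2                                   # modified CDF at x_i
    l = math.ceil(math.log2(1 / pi)) + 1
    trunc = Fraction(math.floor(Fbar * 2**l), 2**l)      # Fbar truncated to l bits
    # codeword interval [trunc, trunc + 2^-l) lies inside the symbol interval [lo, F[i])
    assert lo <= trunc and trunc + Fraction(1, 2**l) <= F[i]
```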
Shannon-Fano-Elias Decoding
To decode a given bitstring:
1. Start with the first bit, and compute the corresponding binary interval.
2. If the interval is strictly contained within the symbol interval of some outcome:
   2.1 Output that outcome.
   2.2 Skip over any redundant bits for this codeword.
   2.3 Repeat from step 1 for the rest of the bitstring.
3. Else, include the next bit, compute the corresponding binary interval, and return to step 2.
We might be able to stop early owing to redundancies in SFE
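Here is a rough Python sketch of this decoding loop (our own; sfe_decode is a hypothetical helper, and exact fractions stand in for real arithmetic):

```python
from fractions import Fraction
import math

def sfe_decode(bits, probs):
    # Symbol intervals [F[i], F[i+1]) and SFE codeword lengths.
    F = [Fraction(0)]
    for p in probs:
        F.append(F[-1] + p)
    lengths = [math.ceil(math.log2(1 / p)) + 1 for p in probs]

    out, pos = [], 0
    while pos < len(bits):
        lo, width, n = Fraction(0), Fraction(1), 0
        while True:
            width /= 2                           # read one more bit: halve the interval
            if bits[pos + n] == "1":
                lo += width
            n += 1
            # Is the current binary interval inside some symbol interval?
            hit = next((i for i in range(len(probs))
                        if F[i] <= lo and lo + width <= F[i + 1]), None)
            if hit is not None:
                out.append(hit)                  # emit the symbol's index
                pos += lengths[hit]              # skip the remaining (redundant) bits
                break
    return out

p = [Fraction(2, 9), Fraction(1, 9), Fraction(1, 3), Fraction(1, 3)]
print(sfe_decode("01000", p))                    # [1], i.e. the second symbol a2
```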
Shannon-Fano-Elias Decoding
Let p = {2/9, 1/9, 1/3, 1/3}. Suppose we want to decode 01000:
Find the symbol interval containing the codeword interval for 01000 = [0.25, 0.28125) in decimal.
Reading the bits one at a time gives the binary intervals
0 → [0, 0.5)
01 → [0.25, 0.5)
010 → [0.25, 0.375)
0100 → [0.25, 0.3125)
01000 → [0.25, 0.28125)
[Figure: these intervals drawn against the symbol intervals for a1, a2, a3, a4; a2's interval is [2/9, 1/3) ≈ [0.22, 0.33).]
We could actually stop once we see 0100, since [0.25, 0.3125) ⊂ [2/9, 1/3) ≈ [0.22, 0.33)
Expected Code Length of SFE Code
The extra bit in the code lengths is because we effectively code with probability pi/2 (half the symbol interval), and
log2(2/pi) = log2(1/pi) + log2 2 = log2(1/pi) + 1
What is the expected length of an SFE code C for ensemble X with probabilities p?
L(C,X) = Σ_{i=1}^{K} pi ℓ(ai) = Σ_{i=1}^{K} pi (⌈log2(1/pi)⌉ + 1) ≤ Σ_{i=1}^{K} pi (log2(1/pi) + 2) = H(X) + 2
Similarly, H(X ) + 1 ≤ L(C,X ) for the SFE codes.
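A quick numerical check of these bounds on the example distribution used earlier (our own check, not from the slides):

```python
import math

p = [2/9, 1/9, 1/3, 1/3]
lengths = [math.ceil(math.log2(1 / pi)) + 1 for pi in p]   # SFE lengths: [4, 5, 3, 3]
L = sum(pi * li for pi, li in zip(p, lengths))              # about 3.44 bits
H = -sum(pi * math.log2(pi) for pi in p)                    # about 1.89 bits
assert H + 1 <= L <= H + 2                                  # the SFE guarantee
```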
Why bother?
Let X be an ensemble, CSFE be a Shannon-Fano-Elias code for X and CH
be a Huffman code for X
H(X) ≤ L(CH,X) ≤ H(X) + 1 ≤ L(CSFE,X) ≤ H(X) + 2
(the first two inequalities are the Source Coding Theorem)
so why not just use Huffman codes?
SFE is a stepping stone to a more powerful type of code
Roughly, try to apply SFE to a block of outcomes
Summary and Reading
Main points:
Problems with Huffman coding (the extra bit; assumes a fixed symbol distribution)
Binary strings to/from intervals in [0, 1]
Shannon-Fano-Elias Coding:
I Code C via cumulative distribution function for p
I H(X ) + 1 ≤ L(C,X ) ≤ H(X ) + 2
Extra bit guarantees interval containment
Reading:
Interval coding: MacKay §6.1 and §6.2
Shannon-Fano-Elias Coding: Cover & Thomas §5.9
Next time:
Extending SFE Coding to sequences of symbols