COMP2610/6261 – Information Theory
Lecture 15: Shannon-Fano-Elias and Interval Coding

Robert C. Williamson

Research School of Computer Science

24 September, 2018

1 / 35

1 The Trouble with Huffman Coding

2 Interval Coding
Shannon-Fano-Elias Coding
Lossless property
The Prefix Property and Intervals
Decoding
Expected Length

2 / 35

Prefix Codes as Trees (Recap)

C2 = {0, 10, 110, 111}

[Figure: the complete binary tree of codewords up to length 4, "the total symbol code budget", with the codewords of C2 = {0, 10, 110, 111} marked as leaves.]
3 / 35
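A quick check of what the budget picture encodes (a worked line, not on the original slide): a codeword of length ℓ uses up a fraction 1/2^ℓ of the total budget, and for C2 = {0, 10, 110, 111} these fractions sum to 1/2 + 1/4 + 1/8 + 1/8 = 1, so this code spends the entire budget.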

The Source Coding Theorem for Symbol Codes

Source Coding Theorem for Symbol Codes
For any ensemble X there exists a prefix code C such that

H(X) ≤ L(C, X) < H(X) + 1.

In particular, Shannon codes C, those with lengths ℓ_i = ⌈log₂(1/p_i)⌉, have expected code length within 1 bit of the entropy.

4 / 35

Huffman Coding: Recap

AX = {a, b, c, d, e} and PX = {0.25, 0.25, 0.2, 0.15, 0.15}

[Figure: MacKay (2003), p. 99, Algorithm 5.4 and Example 5.15. The Huffman algorithm builds the code tree backwards from its leaves: (1) take the two least probable symbols; they receive the longest codewords, of equal length, differing only in the last digit; (2) merge them into a single symbol and repeat. After |AX| − 1 steps every symbol has a codeword (ties may be broken arbitrarily without changing the expected length). For this ensemble the codewords, obtained by concatenating the binary digits in reverse order, are C = {00, 10, 11, 010, 011}. The Huffman lengths {2, 2, 2, 3, 3} are in some cases shorter and in some cases longer than the ideal lengths log₂(1/p_i) ≈ {2.0, 2.0, 2.3, 2.7, 2.7}; the expected length is L = 2.30 bits, whereas the entropy is H = 2.2855 bits.]

From Example 5.15 of MacKay: C = {00, 10, 11, 010, 011}

5 / 35

Huffman Coding: Advantages and Disadvantages

Advantages:

Huffman Codes are provably optimal amongst prefix codes

Algorithm is simple and efficient

Disadvantages:

Assumes a fixed distribution of symbols

The extra bit in the SCT
I If H(X) is large – not a problem
I If H(X) is small (e.g., ∼ 1 bit for English) codes are 2× optimal

Huffman codes are the best possible symbol code, but symbol coding is not always the best type of code

6 / 35

This time

A different way of coding (interval coding)

Shannon-Fano-Elias codes

Worse guarantee than Huffman codes, but will lead us to the powerful arithmetic coding procedure

7 / 35

1 The Trouble with Huffman Coding

2 Interval Coding
Shannon-Fano-Elias Coding
Lossless property
The Prefix Property and Intervals
Decoding
Expected Length

8 / 35

Coding via Cumulative Probabilities

Suppose X is an ensemble with probabilities (p_1, . . . , p_{|X|})

Define the cumulative distribution function by

F(x) = ∑_{i ≤ x} p_i

and the modified cumulative distribution function by

F̄(x) = ∑_{i < x} p_i + p_x/2 = F(x − 1) + p(x)/2
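Below is a minimal Python sketch (not from the lecture; the function name sfe_code and the use of exact fractions are illustrative choices) of the construction just described: the codeword for outcome i is F̄(a_i) written in binary and truncated to ℓ_i = ⌈log₂(1/p_i)⌉ + 1 bits, the length used later in this lecture.

# A minimal sketch (not the lecture's code) of Shannon-Fano-Elias encoding:
# the codeword for outcome i is the modified CDF value Fbar(a_i), written in
# binary and truncated to ceil(log2(1/p_i)) + 1 bits.
from fractions import Fraction
from math import ceil, log2


def sfe_code(probs):
    """Return the list of SFE codewords (as bit strings) for the given probabilities."""
    probs = [Fraction(p) for p in probs]
    codewords = []
    F_prev = Fraction(0)                      # F(a_{i-1})
    for p in probs:
        Fbar = F_prev + p / 2                 # modified CDF: F(a_{i-1}) + p_i / 2
        ell = ceil(log2(1 / p)) + 1           # codeword length (the "+1" is the extra SFE bit)
        bits, x = "", Fbar
        for _ in range(ell):                  # first ell bits of the binary expansion of Fbar
            x *= 2
            bits += "1" if x >= 1 else "0"
            x -= int(x)
        codewords.append(bits)
        F_prev += p                           # advance to F(a_i)
    return codewords


print(sfe_code([Fraction(2, 9), Fraction(1, 9), Fraction(1, 3), Fraction(1, 3)]))
# -> ['0001', '01000', '100', '110'] for the example distribution used later in the lecture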

The codeword for x is F̄(x) truncated to ℓ(x) = ⌈log₂(1/p(x))⌉ + 1 bits, so 1/2^{ℓ(x)} ≤ p(x)/2. Truncation drops less than 1/2^{ℓ(x)}, hence

⌊F̄(x)⌋_{ℓ(x)} > F̄(x) − 1/2^{ℓ(x)} ≥ F̄(x) − p(x)/2 = F(x − 1).

We also have

⌊F̄(x)⌋_{ℓ(x)} + 1/2^{ℓ(x)} ≤ F̄(x) + 1/2^{ℓ(x)} ≤ F̄(x) + p(x)/2 = F(x),

and so

[⌊F̄(x)⌋_{ℓ(x)}, ⌊F̄(x)⌋_{ℓ(x)} + 1/2^{ℓ(x)}) ⊂ [F(x − 1), F(x))

The intervals for the codewords are thus disjoint, since each sits inside its symbol interval [F(x − 1), F(x)), and these symbol intervals are disjoint

The SFE code is prefix-free!

27 / 35
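As a by-hand check of this containment (using the distribution p = {2/9, 1/9, 1/3, 1/3} from the decoding example later in the lecture, and its second outcome a2 with p(a2) = 1/9):

F(a1) = 2/9 ≈ 0.2222,  F(a2) = 3/9 ≈ 0.3333,  F̄(a2) = 2/9 + 1/18 = 5/18 ≈ 0.2778

ℓ(a2) = ⌈log₂ 9⌉ + 1 = 5,  ⌊F̄(a2)⌋_5 = (0.01000)_2 = 0.25

so the codeword is 01000 and its interval [0.25, 0.28125) indeed lies inside the symbol interval [F(a1), F(a2)) ≈ [0.2222, 0.3333).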

Two Types of Interval

The symbol interval for some outcome x_i is (assuming F(x_0) = 0)

[F(x_{i−1}), F(x_i))

These intervals are disjoint for each outcome

The codeword interval for some outcome x_i is

[⌊F̄(x_i)⌋_{ℓ(x_i)}, ⌊F̄(x_i)⌋_{ℓ(x_i)} + 1/2^{ℓ(x_i)})
This is a strict subset of the symbol interval

All strings in the codeword interval start with the same prefix

This is not true in general for the symbol interval
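To make the two intervals concrete, here is a minimal Python sketch (not from the lecture; the function names are illustrative) that computes both intervals for each outcome of the distribution p = {2/9, 1/9, 1/3, 1/3} used in the decoding example, and checks that every codeword interval sits inside its symbol interval.

# Compute the symbol interval [F(x_{i-1}), F(x_i)) and the codeword interval
# for each outcome, and check the containment used in the prefix-free argument.
from fractions import Fraction
from math import ceil, log2


def sfe_length(p):
    """SFE codeword length: ceil(log2(1/p)) + 1."""
    return ceil(log2(1 / p)) + 1


def intervals(probs):
    """Return (symbol_interval, codeword_interval) pairs for each outcome."""
    probs = [Fraction(p) for p in probs]
    F_prev = Fraction(0)                      # F(x_{i-1})
    out = []
    for p in probs:
        F = F_prev + p                        # F(x_i)
        Fbar = F_prev + p / 2                 # modified CDF (midpoint of the symbol interval)
        ell = sfe_length(p)
        trunc = Fraction(int(Fbar * 2**ell), 2**ell)   # first ell bits of Fbar
        out.append(((F_prev, F), (trunc, trunc + Fraction(1, 2**ell))))
        F_prev = F
    return out


p = [Fraction(2, 9), Fraction(1, 9), Fraction(1, 3), Fraction(1, 3)]
for i, (sym, cw) in enumerate(intervals(p), start=1):
    # each codeword interval should lie inside its symbol interval
    assert sym[0] <= cw[0] and cw[1] <= sym[1]
    print(f"a{i}: symbol [{float(sym[0]):.4f}, {float(sym[1]):.4f}), "
          f"codeword [{float(cw[0]):.5f}, {float(cw[1]):.5f})")

Exact fractions are used so the containment checks are not affected by floating-point rounding. For a2 this prints the symbol interval [0.2222, 0.3333) and codeword interval [0.25, 0.28125), matching the decoding example two slides later.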

28 / 35

1 The Trouble with Huffman Coding

2 Interval Coding
Shannon-Fano-Elias Coding
Lossless property
The Prefix Property and Intervals
Decoding
Expected Length

29 / 35

Shannon-Fano-Elias Decoding

To decode a given bitstring:

1 start with the first bit, and compute the corresponding binary interval
2 if the interval is strictly contained within that of a codeword:
    1 output the codeword
    2 skip over any redundant bits for this codeword
    3 repeat (1) for the rest of the bitstring
3 else include the next bit, and compute the corresponding binary interval
4 return to step (2)

We might be able to stop early owing to redundancies in SFE
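Here is a minimal decoder sketch in Python (not the lecture's code; names are illustrative). It follows the procedure above, with the containment test done against the symbol intervals [F(x_{i−1}), F(x_i)), as in the worked example on the next slide.

# Decode a concatenation of SFE codewords by growing a binary interval bit by bit
# until it falls inside a symbol interval, then skipping any redundant bits.
from fractions import Fraction
from math import ceil, log2


def sfe_decode(bits, probs):
    """Return the 1-based indices of the decoded outcomes."""
    probs = [Fraction(p) for p in probs]
    F = [Fraction(0)]                              # cumulative probabilities F(x_0), ..., F(x_K)
    for p in probs:
        F.append(F[-1] + p)
    lengths = [ceil(log2(1 / p)) + 1 for p in probs]   # SFE codeword lengths

    decoded = []
    pos = 0                                        # first bit of the current codeword
    while pos < len(bits):
        lo, width = Fraction(0), Fraction(1)
        for b in bits[pos:]:
            width /= 2                             # binary interval of the bits read so far:
            if b == "1":                           # [lo, lo + width)
                lo += width
            hits = [i for i in range(len(probs))
                    if F[i] <= lo and lo + width <= F[i + 1]]
            if hits:                               # interval now lies inside a symbol interval
                i = hits[0]
                decoded.append(i + 1)              # output the symbol
                pos += lengths[i]                  # skip any redundant bits of this codeword
                break
        else:
            break                                  # ran out of bits before identifying a symbol
    return decoded


print(sfe_decode("01000", [Fraction(2, 9), Fraction(1, 9), Fraction(1, 3), Fraction(1, 3)]))
# -> [2]: the bits 01000 decode to a2, as in the example on the next slide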

30 / 35

Shannon-Fano-Elias Decoding

Let p = {2/9, 1/9, 1/3, 1/3}. Suppose we want to decode 01000:

Find the symbol interval containing the codeword interval for 01000 = [0.25, 0.28125)₁₀

[Figure: the unit interval [0, 1] divided into the symbol intervals for a1, a2, a3, a4, with the boundaries 0.22 and 0.33 of a2's symbol interval marked. Reading the bits one at a time gives the binary intervals
0 → [0, 0.5)
01 → [0.25, 0.5)
010 → [0.25, 0.375)
0100 → [0.25, 0.3125)
01000 → [0.25, 0.28125)]

We could actually stop once we see 0100, since [0.25, 0.3125) ⊂ [0.22, 0.33), the symbol interval for a2
31 / 35

1 The Trouble with Huffman Coding

2 Interval Coding
Shannon-Fano-Elias Coding
Lossless property
The Prefix Property and Intervals
Decoding
Expected Length

32 / 35

Expected Code Length of SFE Code

The extra bit for the code lengths is because we code p_i/2, and

log₂(2/p_i) = log₂(1/p_i) + log₂ 2 = log₂(1/p_i) + 1

What is the expected length of an SFE code C for ensemble X with probabilities p?

L(C, X) = ∑_{i=1}^{K} p_i ℓ(a_i)
        = ∑_{i=1}^{K} p_i (⌈log₂(1/p_i)⌉ + 1)
        ≤ ∑_{i=1}^{K} p_i (log₂(1/p_i) + 2)
        = H(X) + 2

Similarly, H(X) + 1 ≤ L(C, X) for SFE codes, since ⌈log₂(1/p_i)⌉ + 1 ≥ log₂(1/p_i) + 1.
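As a quick numerical sanity check (a sketch, not part of the lecture; it reuses the example distribution from the decoding slides), the SFE expected length indeed lands between H(X) + 1 and H(X) + 2:

# Check H(X) + 1 <= L(C, X) <= H(X) + 2 for SFE lengths ceil(log2(1/p_i)) + 1.
from math import ceil, log2

p = [2/9, 1/9, 1/3, 1/3]                                 # example distribution from the decoding slides
H = sum(pi * log2(1 / pi) for pi in p)                   # entropy, about 1.8911 bits
L = sum(pi * (ceil(log2(1 / pi)) + 1) for pi in p)       # SFE expected length, about 3.4444 bits

assert H + 1 <= L <= H + 2
print(round(H, 4), round(L, 4))                          # 1.8911 3.4444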

33 / 35

Why bother?

Let X be an ensemble, C_SFE be a Shannon-Fano-Elias code for X and C_H be a Huffman code for X

H(X) ≤ L(C_H, X) ≤ H(X) + 1 ≤ L(C_SFE, X) ≤ H(X) + 2

(the first two inequalities are the Source Coding Theorem)

so why not just use Huffman codes?

SFE is a stepping stone to a more powerful type of code (arithmetic coding)

Roughly, try to apply SFE to a block of outcomes

34 / 35

Summary and Reading

Main points:

Problems with Huffman coding: assumes a fixed symbol distribution; the extra bit in the SCT

Binary strings to/from intervals in [0, 1]

Shannon-Fano-Elias Coding:
I Code C via cumulative distribution function for p

I H(X) + 1 ≤ L(C, X) ≤ H(X) + 2
Extra bit guarantees interval containment

Reading:

Interval coding: MacKay §6.1 and §6.2
Shannon-Fano-Elias Coding: Cover & Thomas §5.9

Next time:

Extending SFE Coding to sequences of symbols

35 / 35

