COMP2610/6261 – Information Theory Lecture 15: Shannon-Fano-Elias and Interval Coding
U Logo Use Guidelines . Williamson
logo is a contemporary n of our heritage.
presents our name, ld and our motto:
Copyright By PowCoder代写 加微信 powcoder
earn the nature of things.
authenticity of our brand identity, there are n how our logo is used.
go – horizontal logo
go should be used on a white background. ludes black text with the crest in Deep Gold in
rinting is not available, the black logo can hite background.
e used white reversed out of a black occasionally a neutral dark background.
Research School of Computer Science
Preferred logo
Black version
24 September, 2018
1 The Trouble with
2 Interval Coding Shannon-Fano- Lossless property
The Prefix Property and Intervals Decoding
Expected Length
Prefix Codes as Trees (Recap)
C2 ={0,10,110,111}
The total symbol code budget
The Source Coding Theorem for Symbol Codes
Source Coding Theorem for Symbol Codes
For any ensemble X there exists a prefix code C such that H (X ) ≤ L(C , X ) < H (X ) + 1.
In particular, Shannon codes C — those with lengths li = log2 p1 — i
have expected code length within 1 bit of the entropy.
d strings to all the symbols after |AX | − stepsà 5
.Let AX={a, b, c,d, e }
and AP={=a,b{,cà,àd2,e5},aànàd2P5,à=à2{,0.à2à5,05.,25à,à0.25,}0.à15,0.15}
x step step 2 step 3 step 4
àà25 0 àà55 0 àà 0 "" !!
a àà25 àà25
bàà25 àà25!!àà45"àà451
càà2 àà21
d àà5 0 àà3 àà3 !!
dewords are then obtained by concatenating the binar
order: C = {00,10,11,010,011}à The codelength
From Example 5.15 of Mac = {00, 10, 11, 010, 011}
Hu!man algorithm column 4 of table 5à5 are in s
: Advantages and Disadvantages
Advantages:
are provably optimal amongst prefix codes
Algorithm is simple and efficient
: Advantages and Disadvantages
Advantages:
are provably optimal amongst prefix codes
Algorithm is simple and efficient Disadvantages:
Assumes a fixed distribution of symbols The extra bit in the SCT
If H(X) is large – not a problem
If H(X) is small (e.g., ∼ 1 bit for English) codes are 2× optimal
Huffman codes are the best possible symbol code but symbol coding is not always the best type of code
A different way of coding (interval coding) Shannon-Fano-Elias codes
Worse guarantee than Huffman codes, but will lead us to the powerful arithmetic coding procedure
1 The Trouble with
2 Interval Coding Shannon-Fano- Lossless property
The Prefix Property and Intervals Decoding
Expected Length
Coding via Cumulative Probabilities
Suppose X is an ensemble with probabilities (p1 , . . . , p|X | ) Define the cumulative distribution function by
F(x) = pi i≤x
Coding via Cumulative Probabilities
Suppose X is an ensemble with probabilities (p1 , . . . , p|X | ) Define the cumulative distribution function by
F(x) = pi i≤x
and the modified cumulative distribution function by
F(x)=pi +12·p(x)=F(x)−21·p(x) i
≤ F(x) + p(x)
= F(x), 1
⌊F (x )⌋l(x ) , ⌊F (x )⌋l(x ) + 2l ⊂ [F (x − 1), F (x ))
The intervals for each codeword are thus trivially disjoint, since we know
each of the [F (x − 1), F (x )) intervals is disjoint The SFE code is prefix-free!
Two Types of Interval
The symbol interval for some outcome xi is (assuming F(x0) = 0) [F(xi−1),F(xi))
These intervals are disjoint for each outcome The codeword interval for some outcome xi is
1 ⌊F (xi )⌋l(xi ), ⌊F (xi )⌋l(xi ) + 2l(xi )
This is a strict subset of the symbol interval
All strings in the codeword interval start with the same prefix This is not true in general for the symbol interval
1 The Trouble with
2 Interval Coding Shannon-Fano- Lossless property
The Prefix Property and Intervals Decoding
Expected Length
Shannon-Fano-
To decode a given bitstring:
1 start with the first bit, and compute the corresponding binary interval
2 if the interval is strictly contained within that of a codeword:
1 output the codeword
2 skip over any redundant bits for this codeword
3 repeat (1) for the rest of the bitstring
3 else include next bit, and compute the corresponding binary interval
We might be able to stop early owing to redundancies in SFE
Shannon-Fano-
Let p = {29, 19, 31, 13}. Suppose we want to decode 01000:
Find symbol interval containing codeword interval for 01000 = [0.25, 0.28125)10
0 010 0100
[0,0.5) [0.25,0.5) [0.25,0.375) [0.25,0.3125) [0.25,0.2815)
We could actually stop once we see 0100, since [0.25, 0.3125) ⊂ [0.22, 0.33]
1 The Trouble with
2 Interval Coding Shannon-Fano- Lossless property
The Prefix Property and Intervals Decoding
Expected Length
Expected Code Length of SFE Code
The extra bit for the code lengths is because we code pi and
2 log 2=log 1+log2=log 1+1
2 pi 2 pi 2 2 pi
What is the expected length of a SFE code C for ensemble X with
probabilities p?
K K 1 L(C,X)=pil(ai)=pi log2 p +1
≤ pi log2p+2 i=1 i
= H(X) + 2 Similarly, H(X) + 1 ≤ L(C,X) for the SFE codes.
Why bother?
Let X be an ensemble, CSFE be a Shannon-Fano-Elias code for X and CH be a Huffman code for X
H (X ) ≤ L(CH , X ) ≤ H (X ) + 1 ≤ L(CSFE , X ) ≤ H (X ) + 2
Source Coding Theorem
so why not just use Huffman codes?
SFE is a stepping stone to a more powerful type of codes Roughly, try to apply SFE to a block of outcomes
Summary and Reading
Main points:
Problems with Huffman coding symbol distribution
Binary strings to/from intervals in [0, 1] Shannon-Fano- :
Code C via cumulative distribution function for p
H(X)+1≤L(C,X)≤H(X)+2
Extra bit guarantees interval containment
Summary and Reading
Main points:
Problems with Huffman coding symbol distribution
Binary strings to/from intervals in [0, 1] Shannon-Fano- :
Code C via cumulative distribution function for p
H(X)+1≤L(C,X)≤H(X)+2
Extra bit guarantees interval containment
Interval coding: MacKay §6.1 and §6.2
Shannon-Fano- : Cover & Thomas §5.9
Summary and Reading
Main points:
Problems with Huffman coding symbol distribution
Binary strings to/from intervals in [0, 1] Shannon-Fano- :
Code C via cumulative distribution function for p
H(X)+1≤L(C,X)≤H(X)+2
Extra bit guarantees interval containment
Interval coding: MacKay §6.1 and §6.2
Shannon-Fano- : Cover & Thomas §5.9 Next time:
Extending SFE Coding to sequences of symbols
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com