COMP2610 / 6261 – Information Theory Lecture 14: Source Coding Theorem for Symbol Codes
Williamson
Research School of Computer Science, The Australian National University
17 September, 2018
Variable-length codes
Uniquely decodable and prefix codes
Prefix codes as trees
Kraft's inequality:
Lengths $\{l_i\}_{i=1}^{I}$ can form a prefix code $\iff \sum_{i=1}^{I} 2^{-l_i} \le 1$
How to generate prefix codes?
Prefix Codes (Recap)
A simple property of codes guarantees unique decodability.

Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword c′ ∈ {0,1}+ if there exists a string t ∈ {0,1}+ such that c′ = ct.
Can you create c′ by gluing something to the end of c? Example: 01101 has prefixes 0, 01, 011, 0110.
Prefix Codes
A code C = {c1,…,cI} is a prefix code if for every codeword ci ∈ C there is no prefix of ci in C.
In a stream, there is no confusing one codeword with another.
Prefix Codes as Trees (Recap)
$C_2 = \{0, 10, 110, 111\}$
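Both properties are mechanical to check. A minimal Python sketch (the helper names is_prefix_code and satisfies_kraft are illustrative):

def is_prefix_code(code):
    # A prefix code: no codeword is a prefix of another codeword.
    return not any(c1 != c2 and c2.startswith(c1)
                   for c1 in code for c2 in code)

def satisfies_kraft(lengths):
    # Kraft's inequality: sum_i 2^{-l_i} <= 1
    return sum(2.0 ** -l for l in lengths) <= 1.0

C2 = ["0", "10", "110", "111"]
print(is_prefix_code(C2))                     # True
print(satisfies_kraft([len(c) for c in C2]))  # True (the sum is exactly 1)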
The total symbol code budget
Bound on expected length for a prefix code
Shannon codes
Huffman coding
1 Expected Code Length
  Minimising Expected Code Length
  Shannon Coding
2 The Source Coding Theorem for Symbol Codes
  Algorithm and Examples
  Advantages and Disadvantages
Expected Code Length
With uniform codes, the length of a message of N outcomes is trivial to compute
With variable-length codes, the length of a message of N outcomes will depend on the outcomes we observe
Outcomes we observe have some uncertainty
On average, what length of message can we expect?
Expected Code Length
The expected length of a code C for ensemble X with $A_X = \{a_1, \dots, a_I\}$ and $P_X = \{p_1, \dots, p_I\}$ is
$L(C,X) = \mathbb{E}[l(x)] = \sum_{x \in A_X} p(x)\, l(x) = \sum_{i=1}^{I} p_i l_i$
Expected Code Length: Examples

Example: X has $A_X = \{a, b, c, d\}$ and $P_X = \{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\}$

1 The code $C_1 = \{0001, 0010, 0100, 1000\}$ has
$L(C_1,X) = \sum_i p_i l_i = 4$

2 The code $C_2 = \{0, 10, 110, 111\}$ has
$L(C_2,X) = \sum_i p_i l_i = \frac{1}{2} \times 1 + \frac{1}{4} \times 2 + \frac{1}{8} \times 3 + \frac{1}{8} \times 3 = 1.75$
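These values are easy to verify programmatically. A minimal Python sketch (expected_length is an illustrative name):

def expected_length(code, probs):
    # L(C, X) = sum_i p_i * l_i, with l_i the length of the i-th codeword
    return sum(p * len(c) for p, c in zip(probs, code))

probs = [1/2, 1/4, 1/8, 1/8]
print(expected_length(["0001", "0010", "0100", "1000"], probs))  # 4.0
print(expected_length(["0", "10", "110", "111"], probs))         # 1.75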
Code Lengths and Probabilities

The Kraft inequality says that $\{l_1, \dots, l_I\}$ are prefix code lengths iff $\sum_{i=1}^{I} 2^{-l_i} \le 1$.

If it were true that $\sum_{i=1}^{I} 2^{-l_i} = 1$, then we could interpret $q = (2^{-l_1}, \dots, 2^{-l_I})$ as a probability vector over I outcomes.

What about general lengths l, where the sum may be less than 1?
Code Lengths and Probabilities

Probabilities from Code Lengths
Given code lengths $l = \{l_1, \dots, l_I\}$ such that $\sum_{i=1}^{I} 2^{-l_i} \le 1$, we define $q = \{q_1, \dots, q_I\}$, the probabilities for l, by
$q_i = \frac{1}{z} 2^{-l_i}$, where $z = \sum_i 2^{-l_i}$
ensures that the $q_i$ satisfy $\sum_i q_i = 1$.

Note: this implies $l_i = \log_2 \frac{1}{z q_i}$.

Examples:
1 Lengths $\{1, 2, 2\}$ give $z = 1$, so $q_1 = \frac{1}{2}$, $q_2 = \frac{1}{4}$, and $q_3 = \frac{1}{4}$.
2 Lengths $\{2, 2, 3\}$ give $z = \frac{5}{8}$, so $q_1 = \frac{2}{5}$, $q_2 = \frac{2}{5}$, and $q_3 = \frac{1}{5}$.
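The normalisation is one line of Python. A sketch (probs_from_lengths is an illustrative name):

def probs_from_lengths(lengths):
    # z = sum_i 2^{-l_i}; q_i = 2^{-l_i} / z, so the q_i sum to 1
    z = sum(2.0 ** -l for l in lengths)
    return [2.0 ** -l / z for l in lengths]

print(probs_from_lengths([1, 2, 2]))  # [0.5, 0.25, 0.25]
print(probs_from_lengths([2, 2, 3]))  # [0.4, 0.4, 0.2]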
Minimising Expected Code Length

The probability view of lengths will be useful in answering:

Goal of compression
Given an ensemble X with probabilities $P_X = p = \{p_1, \dots, p_I\}$, how can we minimise the expected code length?

In particular, we can relate the expected code length to the relative entropy (KL divergence) between p and q:

Limits of compression
Given an ensemble X with probabilities p, and a prefix code C with codeword length probabilities q and normalisation z,
$L(C,X) = H(X) + D_{KL}(p \| q) + \log_2 \frac{1}{z} \ge H(X)$,
with equality only when $l_i = \log_2 \frac{1}{p_i}$.
Minimising Expected Code Length

Suppose we use code C with lengths $l = \{l_1, \dots, l_I\}$ and corresponding probabilities $q = \{q_1, \dots, q_I\}$ with $q_i = \frac{1}{z} 2^{-l_i}$. Then,

$L(C,X) = \sum_i p_i l_i$
$\quad = \sum_i p_i \log_2 \frac{1}{z q_i}$
$\quad = \sum_i p_i \log_2 \frac{p_i}{z p_i q_i}$
$\quad = \sum_i p_i \left( \log_2 \frac{1}{p_i} + \log_2 \frac{p_i}{q_i} + \log_2 \frac{1}{z} \right)$
$\quad = \sum_i p_i \log_2 \frac{1}{p_i} + \sum_i p_i \log_2 \frac{p_i}{q_i} + \log_2 \frac{1}{z} \sum_i p_i$
$\quad = H(X) + D_{KL}(p \| q) + \log_2(1/z) \cdot 1$
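A numerical sanity check of this decomposition, as a Python sketch (entropy and kl are illustrative helper names; the length profile {2, 2, 3, 3} is an arbitrary valid example):

import math

def entropy(p):
    # H(X) = sum_i p_i log2(1 / p_i)
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

def kl(p, q):
    # D_KL(p || q) = sum_i p_i log2(p_i / q_i)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [1/2, 1/4, 1/8, 1/8]
lengths = [2, 2, 3, 3]               # a prefix-code length profile
z = sum(2.0 ** -l for l in lengths)  # here z = 3/4 < 1
q = [2.0 ** -l / z for l in lengths]
L = sum(pi * li for pi, li in zip(p, lengths))
# Both sides equal 2.25 bits: L = H + D_KL + log2(1/z)
print(L, entropy(p) + kl(p, q) + math.log2(1 / z))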
Minimising Expected Code Length

So if $q = \{q_1, \dots, q_I\}$ are the probabilities for the code lengths of C, then under ensemble X with probabilities $p = \{p_1, \dots, p_I\}$,
$L(C,X) = H(X) + D_{KL}(p \| q) + \log_2 \frac{1}{z}$

Thus, L(C,X) is minimal (and equal to the entropy H(X)) if we can choose code lengths so that $D_{KL}(p \| q) = 0$ and $\log_2 \frac{1}{z} = 0$.

But the relative entropy satisfies $D_{KL}(p \| q) \ge 0$, with $D_{KL}(p \| q) = 0$ iff $q = p$ (Gibbs' inequality).

For $q = p$, i.e. $l_i = \log_2 \frac{1}{p_i}$, we have $z = \sum_i 2^{-l_i} = \sum_i p_i = 1$, and so $\log_2 \frac{1}{z} = 0$.
Entropy as a Lower Bound on Expected Length
We have shown that for a code C with lengths corresponding to q,
$L(C,X) \ge H(X)$,
with equality only when C has code lengths $l_i = \log_2 \frac{1}{p_i}$.
Once again, the entropy determines a lower bound on how much compression is possible
L(C,X) refers to average compression
Individual message length could be bigger than the entropy
Shannon Coding

If we pick lengths $l_i = \log_2 \frac{1}{p_i}$, we get optimal expected code lengths.
But $\log_2 \frac{1}{p_i}$ is not always an integer, which is a problem for code lengths!

Shannon Coding
Given an ensemble X with $P_X = \{p_1, \dots, p_I\}$, define codelengths $l = \{l_1, \dots, l_I\}$ by
$l_i = \left\lceil \log_2 \frac{1}{p_i} \right\rceil$.
A code C is called a Shannon code if it has codelengths l.

Here $\lceil x \rceil$ is the "smallest integer not smaller than x", e.g., $\lceil 2.1 \rceil = 3$, $\lceil 5 \rceil = 5$.

This gives us code lengths that are "closest" to $\log_2 \frac{1}{p_i}$:
$l_i = \left\lceil \log_2 \frac{1}{p_i} \right\rceil \ge \log_2 \frac{1}{p_i}$.
Shannon Coding: Examples

1 If $P_X = \{\frac{1}{2}, \frac{1}{4}, \frac{1}{4}\}$ then $l = \{1, 2, 2\}$, so $C = \{0, 10, 11\}$ is a Shannon code (in fact, this code has optimal expected length).
2 If $P_X = \{\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\}$ then $l = \{2, 2, 2\}$, with Shannon code $C = \{00, 10, 11\}$ (or $C = \{01, 10, 11\}$, ...).
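A Python sketch of Shannon codelengths, together with one simple greedy way to realise them as a prefix code (shannon_lengths and shannon_code are illustrative names, not from the lecture):

import math

def shannon_lengths(probs):
    # l_i = ceil(log2(1 / p_i))
    return [math.ceil(math.log2(1 / p)) for p in probs]

def shannon_code(probs):
    # Greedy codeword assignment, shortest lengths first; this always
    # succeeds because Shannon lengths satisfy Kraft's inequality.
    # (Codewords are returned in order of increasing length.)
    lengths = sorted(shannon_lengths(probs))
    codewords, value, prev = [], 0, lengths[0]
    for l in lengths:
        value <<= (l - prev)                   # pad to the next length
        codewords.append(format(value, f"0{l}b"))
        value, prev = value + 1, l
    return codewords

print(shannon_lengths([1/2, 1/4, 1/4]))  # [1, 2, 2]
print(shannon_code([1/2, 1/4, 1/4]))     # ['0', '10', '11']
print(shannon_lengths([1/3, 1/3, 1/3]))  # [2, 2, 2]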
Source Coding Theorem for Symbol Codes

Shannon codes let us prove the following:

Source Coding Theorem for Symbol Codes
For any ensemble X, there exists a prefix code C such that
$H(X) \le L(C,X) < H(X) + 1$.

In particular, Shannon codes C (those with lengths $l_i = \lceil \log_2 \frac{1}{p_i} \rceil$) have expected code length within 1 bit of the entropy.

So the entropy also gives a guideline upper bound for compression: on average we never need more than $H(X) + 1$ bits per outcome.
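The theorem is easy to check empirically on random distributions; a Python sketch:

import math, random

def entropy(p):
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(8)]
    p = [wi / sum(w) for wi in w]
    lengths = [math.ceil(math.log2(1 / pi)) for pi in p]  # Shannon lengths
    L = sum(pi * li for pi, li in zip(p, lengths))
    assert entropy(p) <= L < entropy(p) + 1  # never fails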
Since $\lceil x \rceil$ is the smallest integer bigger than or equal to x, it must be the case that $x \le \lceil x \rceil < x + 1$.

Therefore, if we create a Shannon code C for $p = \{p_1, \dots, p_I\}$ with $l_i = \lceil \log_2 \frac{1}{p_i} \rceil$, then $\log_2 \frac{1}{p_i} \le l_i < \log_2 \frac{1}{p_i} + 1$ for every i, and averaging over p gives $H(X) \le L(C,X) < H(X) + 1$.