COMP2610 / 6261 – Information Theory Lecture 14: Source Coding Theorem for Symbol Codes
Williamson
Research School of Computer Science, The Australian National University
17 September, 2018
Variable-length codes
Uniquely decodable and prefix codes
Prefix codes as trees
Kraft's inequality:
Lengths $\{l_i\}_{i=1}^{I}$ can form a prefix code $\iff \sum_{i=1}^{I} 2^{-l_i} \le 1$
How to generate prefix codes?
Prefix Codes (Recap)
A simple property of codes guarantees unique decodability.

Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword c′ ∈ {0,1}+ if there exists a string t ∈ {0,1}+ such that c′ = ct.
Can you create c′ by gluing something to the end of c? Example: 01101 has prefixes 0, 01, 011, 0110.
Prefix Codes
A code C = {c1,…,cI} is a prefix code if for every codeword ci ∈ C there is no prefix of ci in C.
In a stream, there is no confusing one codeword with another.
Prefix Codes as Trees (Recap)
$C_2 = \{0, 10, 110, 111\}$
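Both properties are mechanical to check. A minimal Python sketch (the helper names is_prefix_code and satisfies_kraft are illustrative):

def is_prefix_code(code):
    # A prefix code: no codeword is a prefix of another codeword.
    return not any(c1 != c2 and c2.startswith(c1)
                   for c1 in code for c2 in code)

def satisfies_kraft(lengths):
    # Kraft's inequality: sum_i 2^{-l_i} <= 1
    return sum(2.0 ** -l for l in lengths) <= 1.0

C2 = ["0", "10", "110", "111"]
print(is_prefix_code(C2))                     # True
print(satisfies_kraft([len(c) for c in C2]))  # True (the sum is exactly 1)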
The total symbol code budget
Bound on expected length for a prefix code
Shannon codes
Huffman coding
1 Expected Code Length
  Minimising Expected Code Length
  Shannon Coding
2 The Source Coding Theorem for Symbol Codes
  Algorithm and Examples
  Advantages and Disadvantages
Expected Code Length
With uniform codes, the length of a message of N outcomes is trivial to compute
With variable-length codes, the length of a message of N outcomes will depend on the outcomes we observe
Outcomes we observe have some uncertainty
On average, what length of message can we expect?
Expected Code Length
The expected length of a code C for ensemble X with $A_X = \{a_1, \dots, a_I\}$ and $P_X = \{p_1, \dots, p_I\}$ is
$L(C,X) = \mathbb{E}[l(x)] = \sum_{x \in A_X} p(x)\, l(x) = \sum_{i=1}^{I} p_i l_i$
Expected Code Length: Examples

Example: X has $A_X = \{a, b, c, d\}$ and $P_X = \{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\}$

1 The code $C_1 = \{0001, 0010, 0100, 1000\}$ has
$L(C_1,X) = \sum_i p_i l_i = 4$

2 The code $C_2 = \{0, 10, 110, 111\}$ has
$L(C_2,X) = \sum_i p_i l_i = \frac{1}{2} \times 1 + \frac{1}{4} \times 2 + \frac{1}{8} \times 3 + \frac{1}{8} \times 3 = 1.75$
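These values are easy to verify programmatically. A minimal Python sketch (expected_length is an illustrative name):

def expected_length(code, probs):
    # L(C, X) = sum_i p_i * l_i, with l_i the length of the i-th codeword
    return sum(p * len(c) for p, c in zip(probs, code))

probs = [1/2, 1/4, 1/8, 1/8]
print(expected_length(["0001", "0010", "0100", "1000"], probs))  # 4.0
print(expected_length(["0", "10", "110", "111"], probs))         # 1.75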
Code Lengths and Probabilities

The Kraft inequality says that $\{l_1, \dots, l_I\}$ are prefix code lengths iff $\sum_{i=1}^{I} 2^{-l_i} \le 1$.

If it were true that $\sum_{i=1}^{I} 2^{-l_i} = 1$, then we could interpret $q = (2^{-l_1}, \dots, 2^{-l_I})$ as a probability vector over I outcomes.

What about general lengths l, where the sum may be less than 1?
Code Lengths and Probabilities

Probabilities from Code Lengths
Given code lengths $l = \{l_1, \dots, l_I\}$ such that $\sum_{i=1}^{I} 2^{-l_i} \le 1$, we define $q = \{q_1, \dots, q_I\}$, the probabilities for l, by
$q_i = \frac{1}{z} 2^{-l_i}$, where $z = \sum_i 2^{-l_i}$
ensures that the $q_i$ satisfy $\sum_i q_i = 1$.

Note: this implies $l_i = \log_2 \frac{1}{z q_i}$.

Examples:
1 Lengths $\{1, 2, 2\}$ give $z = 1$, so $q_1 = \frac{1}{2}$, $q_2 = \frac{1}{4}$, and $q_3 = \frac{1}{4}$.
2 Lengths $\{2, 2, 3\}$ give $z = \frac{5}{8}$, so $q_1 = \frac{2}{5}$, $q_2 = \frac{2}{5}$, and $q_3 = \frac{1}{5}$.
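The normalisation is one line of Python. A sketch (probs_from_lengths is an illustrative name):

def probs_from_lengths(lengths):
    # z = sum_i 2^{-l_i}; q_i = 2^{-l_i} / z, so the q_i sum to 1
    z = sum(2.0 ** -l for l in lengths)
    return [2.0 ** -l / z for l in lengths]

print(probs_from_lengths([1, 2, 2]))  # [0.5, 0.25, 0.25]
print(probs_from_lengths([2, 2, 3]))  # [0.4, 0.4, 0.2]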
Minimising Expected Code Length

The probability view of lengths will be useful in answering:

Goal of compression
Given an ensemble X with probabilities $P_X = p = \{p_1, \dots, p_I\}$, how can we minimise the expected code length?

In particular, we can relate the expected code length to the relative entropy (KL divergence) between p and q:

Limits of compression
Given an ensemble X with probabilities p, and a prefix code C with codeword length probabilities q and normalisation z,
$L(C,X) = H(X) + D_{KL}(p \| q) + \log_2 \frac{1}{z} \ge H(X)$,
with equality only when $l_i = \log_2 \frac{1}{p_i}$.
Minimising Expected Code Length

Suppose we use code C with lengths $l = \{l_1, \dots, l_I\}$ and corresponding probabilities $q = \{q_1, \dots, q_I\}$ with $q_i = \frac{1}{z} 2^{-l_i}$. Then,

$L(C,X) = \sum_i p_i l_i$
$\quad = \sum_i p_i \log_2 \frac{1}{z q_i}$
$\quad = \sum_i p_i \log_2 \frac{p_i}{z p_i q_i}$
$\quad = \sum_i p_i \left( \log_2 \frac{1}{p_i} + \log_2 \frac{p_i}{q_i} + \log_2 \frac{1}{z} \right)$
$\quad = \sum_i p_i \log_2 \frac{1}{p_i} + \sum_i p_i \log_2 \frac{p_i}{q_i} + \log_2 \frac{1}{z} \sum_i p_i$
$\quad = H(X) + D_{KL}(p \| q) + \log_2(1/z) \cdot 1$
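A numerical sanity check of this decomposition, as a Python sketch (entropy and kl are illustrative helper names; the length profile {2, 2, 3, 3} is an arbitrary valid example):

import math

def entropy(p):
    # H(X) = sum_i p_i log2(1 / p_i)
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

def kl(p, q):
    # D_KL(p || q) = sum_i p_i log2(p_i / q_i)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [1/2, 1/4, 1/8, 1/8]
lengths = [2, 2, 3, 3]               # a prefix-code length profile
z = sum(2.0 ** -l for l in lengths)  # here z = 3/4 < 1
q = [2.0 ** -l / z for l in lengths]
L = sum(pi * li for pi, li in zip(p, lengths))
# Both sides equal 2.25 bits: L = H + D_KL + log2(1/z)
print(L, entropy(p) + kl(p, q) + math.log2(1 / z))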
Minimising Expected Code Length

So if $q = \{q_1, \dots, q_I\}$ are the probabilities for the code lengths of C, then under ensemble X with probabilities $p = \{p_1, \dots, p_I\}$,
$L(C,X) = H(X) + D_{KL}(p \| q) + \log_2 \frac{1}{z}$

Thus, L(C,X) is minimal (and equal to the entropy H(X)) if we can choose code lengths so that $D_{KL}(p \| q) = 0$ and $\log_2 \frac{1}{z} = 0$.

But the relative entropy satisfies $D_{KL}(p \| q) \ge 0$, with $D_{KL}(p \| q) = 0$ iff $q = p$ (Gibbs' inequality).

For $q = p$, i.e. $l_i = \log_2 \frac{1}{p_i}$, we have $z = \sum_i 2^{-l_i} = \sum_i p_i = 1$, and so $\log_2 \frac{1}{z} = 0$.
Entropy as a Lower Bound on Expected Length
We have shown that for a code C with lengths corresponding to q,
$L(C,X) \ge H(X)$,
with equality only when C has code lengths $l_i = \log_2 \frac{1}{p_i}$.
Once again, the entropy determines a lower bound on how much compression is possible
L(C,X) refers to average compression
Individual message length could be bigger than the entropy
Shannon Coding

If we pick lengths $l_i = \log_2 \frac{1}{p_i}$, we get optimal expected code lengths.
But $\log_2 \frac{1}{p_i}$ is not always an integer, which is a problem for code lengths!

Shannon Coding
Given an ensemble X with $P_X = \{p_1, \dots, p_I\}$, define codelengths $l = \{l_1, \dots, l_I\}$ by
$l_i = \left\lceil \log_2 \frac{1}{p_i} \right\rceil$.
A code C is called a Shannon code if it has codelengths l.

Here $\lceil x \rceil$ is the "smallest integer not smaller than x", e.g., $\lceil 2.1 \rceil = 3$, $\lceil 5 \rceil = 5$.

This gives us code lengths that are "closest" to $\log_2 \frac{1}{p_i}$:
$l_i = \left\lceil \log_2 \frac{1}{p_i} \right\rceil \ge \log_2 \frac{1}{p_i}$.
Shannon Coding: Examples

1 If $P_X = \{\frac{1}{2}, \frac{1}{4}, \frac{1}{4}\}$ then $l = \{1, 2, 2\}$, so $C = \{0, 10, 11\}$ is a Shannon code (in fact, this code has optimal expected length).
2 If $P_X = \{\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\}$ then $l = \{2, 2, 2\}$, with Shannon code $C = \{00, 10, 11\}$ (or $C = \{01, 10, 11\}$, ...).
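A Python sketch of Shannon codelengths, together with one simple greedy way to realise them as a prefix code (shannon_lengths and shannon_code are illustrative names, not from the lecture):

import math

def shannon_lengths(probs):
    # l_i = ceil(log2(1 / p_i))
    return [math.ceil(math.log2(1 / p)) for p in probs]

def shannon_code(probs):
    # Greedy codeword assignment, shortest lengths first; this always
    # succeeds because Shannon lengths satisfy Kraft's inequality.
    # (Codewords are returned in order of increasing length.)
    lengths = sorted(shannon_lengths(probs))
    codewords, value, prev = [], 0, lengths[0]
    for l in lengths:
        value <<= (l - prev)                   # pad to the next length
        codewords.append(format(value, f"0{l}b"))
        value, prev = value + 1, l
    return codewords

print(shannon_lengths([1/2, 1/4, 1/4]))  # [1, 2, 2]
print(shannon_code([1/2, 1/4, 1/4]))     # ['0', '10', '11']
print(shannon_lengths([1/3, 1/3, 1/3]))  # [2, 2, 2]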
Source Coding Theorem for Symbol Codes

Shannon codes let us prove the following:

Source Coding Theorem for Symbol Codes
For any ensemble X, there exists a prefix code C such that
$H(X) \le L(C,X) < H(X) + 1$.

In particular, Shannon codes C (those with lengths $l_i = \lceil \log_2 \frac{1}{p_i} \rceil$) have expected code length within 1 bit of the entropy.

So the entropy also gives a guideline upper bound for compression: on average we never need more than $H(X) + 1$ bits per outcome.
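The theorem is easy to check empirically on random distributions; a Python sketch:

import math, random

def entropy(p):
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(8)]
    p = [wi / sum(w) for wi in w]
    lengths = [math.ceil(math.log2(1 / pi)) for pi in p]  # Shannon lengths
    L = sum(pi * li for pi, li in zip(p, lengths))
    assert entropy(p) <= L < entropy(p) + 1  # never fails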
Since $\lceil x \rceil$ is the smallest integer bigger than or equal to x, it must be the case that $x \le \lceil x \rceil < x + 1$.

Therefore, if we create a Shannon code C for $p = \{p_1, \dots, p_I\}$ with $l_i = \lceil \log_2 \frac{1}{p_i} \rceil$, then $\log_2 \frac{1}{p_i} \le l_i < \log_2 \frac{1}{p_i} + 1$ for every i, and averaging over p gives $H(X) \le L(C,X) < H(X) + 1$.