COMP2610 / 6261 — Information Theory Lecture 13: Symbol Codes for Lossless Compression
U Logo Use Guidelines . Williamson
logo is a contemporary n of our heritage.
presents our name, ld and our motto:
Copyright By PowCoder代写 加微信 powcoder
earn the nature of things.
authenticity of our brand identity, there are n how our logo is used.
go – horizontal logo
go should be used on a white background. ludes black text with the crest in Deep Gold in
rinting is not available, the black logo can hite background.
Research School of Computer Science The Australian National University
e used white reversed out of a black
Preferred logo
Black version
17 September, 2018
Proof of the source coding theorem Foundational theorem, but impractical
Requires potentially very large block sizes
The theorem also only considers uniform coding schemes
Could variable length coding help?
Does entropy turn up for such codes as well?
Variable-length codes Prefix codes
Kraft’s inequality
1 Variable-Length Codes Unique Decodeability
Prefix Codes
1 Variable-Length Codes Unique Decodeability
Prefix Codes
Codes: A Review
If A is a finite set then AN is the set of all strings of length N. A+ = N AN is the set of all finite strings
{0,1}3 ={000,001,010,011,100,101,110,111} {0,1}+ = {0,1,00,01,10,11,000,001,010,…}
Codes: A Review
If A is a finite set then AN is the set of all strings of length N. A+ = N AN is the set of all finite strings
{0,1}3 ={000,001,010,011,100,101,110,111} {0,1}+ = {0,1,00,01,10,11,000,001,010,…}
Binary Symbol Code
Let X be an ensemble with AX = {a1,…,aI}. Afunctionc :AX →{0,1}+ isacodeforX.
The binary string c(x) is the codeword for x ∈ AX
Codes: A Review
If A is a finite set then AN is the set of all strings of length N. A+ = N AN is the set of all finite strings
{0,1}3 ={000,001,010,011,100,101,110,111} {0,1}+ = {0,1,00,01,10,11,000,001,010,…}
Binary Symbol Code
Let X be an ensemble with AX = {a1,…,aI}. Afunctionc :AX →{0,1}+ isacodeforX.
The binary string c(x) is the codeword for x ∈ AX The length of the codeword for for x is denoted l(x). Shorthand: li = l(ai) for i = 1…,I.
Codes: A Review
If A is a finite set then AN is the set of all strings of length N. A+ = N AN is the set of all finite strings
{0,1}3 ={000,001,010,011,100,101,110,111} {0,1}+ = {0,1,00,01,10,11,000,001,010,…}
Binary Symbol Code
Let X be an ensemble with AX = {a1,…,aI}. Afunctionc :AX →{0,1}+ isacodeforX.
The binary string c(x) is the codeword for x ∈ AX
The length of the codeword for for x is denoted l(x).
Shorthand: li = l(ai) for i = 1…,I.
The extension of c assigns codewords to any sequence x1x2 . . . xN from A+ by c(x1 …xN) = c(x1)…c(xN)
Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d} Example 1 (Uniform Code):
Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000}
Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4
Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Example 2 (Variable-Length Code):
Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111 Shorthand: C2 = {0, 10, 110, 111}
Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111 Shorthand: C2 = {0, 10, 110, 111} Inthiscasel1 =1,l2 =2,l3 =l4 =3
Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111 Shorthand: C2 = {0, 10, 110, 111}
Inthiscasel1 =1,l2 =2,l3 =l4 =3
The extension of c maps aba ∈ A3X ⊂ A+X to 0100
Unique Decodeability Recallthatacodeislosslessifforallx,y ∈AX
x ̸= y =⇒ c(x) ̸= c(y)
This ensures that if we work with a single outcome, we can uniquely
decode the outcome
When working with variable-length codes, it will be convenient to also require the following:
Unique Decodeability Recallthatacodeislosslessifforallx,y ∈AX
x ̸= y =⇒ c(x) ̸= c(y)
This ensures that if we work with a single outcome, we can uniquely
decode the outcome
When working with variable-length codes, it will be convenient to also require the following:
Uniquely Decodable
A code c for X is uniquely decodable if no two strings from A+X have the same codeword. That is, for all x, y ∈ A+X
x̸=y =⇒ c(x)̸=c(y)
This ensures that if we work with a sequence of outcomes, we can still uniquely decode the individual elements
Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
Uniform + Lossless ⇒ Uniquely decodable
Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110 The code is of course lossless
Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
The code is of course lossless
Lossless Uniquely decodable
Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
The code is of course lossless
Lossless Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
The code is of course lossless
Lossless Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
We can easily segment a given code string scanning left to right
Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
The code is of course lossless
Lossless Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
We can easily segment a given code string scanning left to right
e.g. 0110010 → 0, 110, 0, 10
“Self-punctuating” property
The code C3 = {0, 10, 110, 111} has a “self-punctuating” property
“Self-punctuating” property
The code C3 = {0, 10, 110, 111} has a “self-punctuating” property Trivial to segment a given code string into individual codewords
Keep scanning until we match a codeword
Once matched, add new segment boundary, and proceed to rest of string
“Self-punctuating” property
The code C3 = {0, 10, 110, 111} has a “self-punctuating” property Trivial to segment a given code string into individual codewords
Keep scanning until we match a codeword
Once matched, add new segment boundary, and proceed to rest of string
Once our current segment matches a codeword, no ambiguity to resolve Why? No codeword is a prefix of any other
Not true for every uniquely decodable code, e.g. C4 = {0, 01, 011} First bit 0 → no certainty what the symbol is
Prefix Codes
a.k.a prefix-free or instantaneous codes
A simple property of codes guarantees unique decodeability Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword c′ ∈ {0,1}+ if there exists a string t ∈ {0,1}+ such that c′ = ct.
Prefix Codes
a.k.a prefix-free or instantaneous codes
A simple property of codes guarantees unique decodeability Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword c′ ∈ {0,1}+ if there exists a string t ∈ {0,1}+ such that c′ = ct.
Can you create c′ by gluing something to the end of c? Example: 01101 has prefixes 0, 01, 011, 0110.
Prefix Codes
a.k.a prefix-free or instantaneous codes
A simple property of codes guarantees unique decodeability Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword c′ ∈ {0,1}+ if there exists a string t ∈ {0,1}+ such that c′ = ct.
Can you create c′ by gluing something to the end of c? Example: 01101 has prefixes 0, 01, 011, 0110.
Prefix Codes
A code C = {c1,…,cI} is a prefix code if for every codeword ci ∈ C there is no prefix of ci in C.
In a stream, no confusing one codeword with another
Prefix Codes: Examples
C1 = {0001, 0010, 0100, 1000} is prefix-free
C2 = {0, 10, 110, 111} is prefix-free
C2′ = {1, 10, 110, 111} is not prefix free since c3 = 110 = c1c2
C′′ = {1, 01, 110, 111} is not prefix free since c3 = 110 = c110 2
Prefix Codes as Trees
C1 ={0001,0010,0100,1000}
The total symbol code budget
Prefix Codes as Trees
C2 ={0,10,110,111}
The total symbol code budget
Prefix Codes as Trees
C2′ ={1,10,110,111}
Each codeword choice eliminates its descendants
The total symbol code budget
Prefix Codes are Uniquely Decodeable
000 0000 00 0001 001 0010 0011 0100 0101 0110 0111 100 1000 1001 101 1010 1011 1100 1101 1110 1111
If l∗ = max{l ,…,l } then symbol is 1I
decodeable after seeing at most l∗ bits
ConsiderC2 ={0,10,110,111}
Ifc(x)=0…thenx =a 1
Ifc(x)=10…thenx1 =b
If c(x) = 1… then x1 ∈ {b,c,d}
If c(x) = 11… then x 1
The total symbol code budget
Uniquely Decodeable Codes are Not Always Prefix Codes
A uniquely decodeable code is not necessarily a prefix code Example: C1 = {0, 01, 011}
00 . . . → first codeword
010 . . . → second codeword 011 . . . → third codeword
Example: C2 = {0, 01, 011, 111} ThisisthereverseoftheprefixcodeC2′ ={0,10,110,111}
Relating various types of codes
All codes Lossless
Uniquely decodable
Note that e.g.
Prefix =⇒ Uniquely Decodable
Not Prefix ≠ ⇒ Not Uniquely Decodable
Why prefix codes?
While prefix codes do not represent all uniquely decodable codes, they are convenient to work with
It will be easy to generate prefix codes (Huffman coding, next lecture)
Further, we can quickly establish if a given code is not prefix Testing for unique decodability is non-trivial in general
1 Variable-Length Codes Unique Decodeability
Prefix Codes
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4,4,4,4}
L2 = {1,2,3,3}
L3 = {2,2,3,4,4} L4 = {1,3,3,3,3,4}
The total symbol code budget
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3}
L3 = {2,2,3,4,4}
L4 = {1,3,3,3,3,4}
The total symbol code budget
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4}
L4 = {1,3,3,3,3,4}
The total symbol code budget
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 ={2,2,3,4,4}—C3 ={00, , , , } L4 = {1,3,3,3,3,4}
The total symbol code budget
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4} — C3 = {00,01, , , } L4 = {1,3,3,3,3,4}
The total symbol code budget
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4} — C3 = {00,01,100, , } L4 = {1,3,3,3,3,4}
The total symbol code budget
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4} — C3 = {00,01,100,1010, } L4 = {1,3,3,3,3,4}
The total symbol code budget
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4} — C3 = {00,01,100,1010,1011} L4 = {1,3,3,3,3,4}
The total symbol code budget
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4} — C3 = {00,01,100,1010,1011} L4 ={1,3,3,3,3,4}—Impossible!
The total symbol code budget
The a.k.a. The Kraft-Mc
For any prefix (binary) code C, its codeword lengths {l1, . . . , lI } satisfy
2−li ≤1 (1)
Conversely, if the set {l1 , . . . , lI } satisfy (1) then there exists a prefix code C with those codeword lengths.
The a.k.a. The Kraft-Mc
For any prefix (binary) code C, its codeword lengths {l1, . . . , lI } satisfy
2−li ≤1 (1)
Conversely, if the set {l1 , . . . , lI } satisfy (1) then there exists a prefix code
C with those codeword lengths.
1 C1 = {0001, 0010, 0100, 1000} is prefix and 4i=1 2−4 = 14 ≤ 1
The a.k.a. The Kraft-Mc
For any prefix (binary) code C, its codeword lengths {l1, . . . , lI } satisfy
2−li ≤1 (1)
Conversely, if the set {l1 , . . . , lI } satisfy (1) then there exists a prefix code
C with those codeword lengths. Examples:
1 C1 = {0001, 0010, 0100, 1000} is prefix and 4i=1 2−4 = 14 ≤ 1
2 C2 ={0,10,110,111}isprefixand4i=12−li =21 +14 +28 =1
The a.k.a. The Kraft-Mc
For any prefix (binary) code C, its codeword lengths {l1, . . . , lI } satisfy
2−li ≤1 (1)
Conversely, if the set {l1 , . . . , lI } satisfy (1) then there exists a prefix code
C with those codeword lengths. Examples:
1 C1 = {0001, 0010, 0100, 1000} is prefix and 4i=1 2−4 = 14 ≤ 1
2 C2 ={0,10,110,111}isprefixand4i=12−li =21 +14 +28 =1
3 Lengths {1,2,2,3} give 4i=1 2−li = 21 + 24 + 18 > 1 so no prefix code
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
000 001 010 011 100 101 110 111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
2 x 2-bit codewords: {00, 01}
The total symbol code budget
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
000 001 010 011 100 101 110 111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
The total symbol code budget
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
000 001 010 011 100 101 110 111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com