CS代写 COMP2610 / 6261 — Information Theory Lecture 13: Symbol Codes for Lossless

COMP2610 / 6261 — Information Theory Lecture 13: Symbol Codes for Lossless Compression
U Logo Use Guidelines . Williamson
logo is a contemporary n of our heritage.
presents our name, ld and our motto:

Copyright By PowCoder代写 加微信 powcoder

earn the nature of things.
authenticity of our brand identity, there are n how our logo is used.
go – horizontal logo
go should be used on a white background. ludes black text with the crest in Deep Gold in
rinting is not available, the black logo can hite background.
Research School of Computer Science The Australian National University
e used white reversed out of a black
Preferred logo
Black version
17 September, 2018

Proof of the source coding theorem Foundational theorem, but impractical
Requires potentially very large block sizes
The theorem also only considers uniform coding schemes
Could variable length coding help?
Does entropy turn up for such codes as well?

Variable-length codes Prefix codes
Kraft’s inequality

1 Variable-Length Codes Unique Decodeability
Prefix Codes

1 Variable-Length Codes Unique Decodeability
Prefix Codes

Codes: A Review
If A is a finite set then AN is the set of all strings of length N. A+ = 􏱩N AN is the set of all finite strings
{0,1}3 ={000,001,010,011,100,101,110,111} {0,1}+ = {0,1,00,01,10,11,000,001,010,…}

Codes: A Review
If A is a finite set then AN is the set of all strings of length N. A+ = 􏱩N AN is the set of all finite strings
{0,1}3 ={000,001,010,011,100,101,110,111} {0,1}+ = {0,1,00,01,10,11,000,001,010,…}
Binary Symbol Code
Let X be an ensemble with AX = {a1,…,aI}. Afunctionc :AX →{0,1}+ isacodeforX.
The binary string c(x) is the codeword for x ∈ AX

Codes: A Review
If A is a finite set then AN is the set of all strings of length N. A+ = 􏱩N AN is the set of all finite strings
{0,1}3 ={000,001,010,011,100,101,110,111} {0,1}+ = {0,1,00,01,10,11,000,001,010,…}
Binary Symbol Code
Let X be an ensemble with AX = {a1,…,aI}. Afunctionc :AX →{0,1}+ isacodeforX.
The binary string c(x) is the codeword for x ∈ AX The length of the codeword for for x is denoted l(x). Shorthand: li = l(ai) for i = 1…,I.

Codes: A Review
If A is a finite set then AN is the set of all strings of length N. A+ = 􏱩N AN is the set of all finite strings
{0,1}3 ={000,001,010,011,100,101,110,111} {0,1}+ = {0,1,00,01,10,11,000,001,010,…}
Binary Symbol Code
Let X be an ensemble with AX = {a1,…,aI}. Afunctionc :AX →{0,1}+ isacodeforX.
The binary string c(x) is the codeword for x ∈ AX
The length of the codeword for for x is denoted l(x).
Shorthand: li = l(ai) for i = 1…,I.
The extension of c assigns codewords to any sequence x1x2 . . . xN from A+ by c(x1 …xN) = c(x1)…c(xN)

Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d} Example 1 (Uniform Code):

Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000

Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000}

Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4

Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001

Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Example 2 (Variable-Length Code):

Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111

Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111 Shorthand: C2 = {0, 10, 110, 111}

Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111 Shorthand: C2 = {0, 10, 110, 111} Inthiscasel1 =1,l2 =2,l3 =l4 =3

Codes: A Review Examples
X is an ensemble with AX = {a,b,c,d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000 Shorthand: C1 = {0001, 0010, 0100, 1000} Allcodewordshavelength4. Thatis,l1 =l2 =l3 =l4 =4 The extension of c maps aba ∈ A3X ⊂ A+X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111 Shorthand: C2 = {0, 10, 110, 111}
Inthiscasel1 =1,l2 =2,l3 =l4 =3
The extension of c maps aba ∈ A3X ⊂ A+X to 0100

Unique Decodeability Recallthatacodeislosslessifforallx,y ∈AX
x ̸= y =⇒ c(x) ̸= c(y)
This ensures that if we work with a single outcome, we can uniquely
decode the outcome
When working with variable-length codes, it will be convenient to also require the following:

Unique Decodeability Recallthatacodeislosslessifforallx,y ∈AX
x ̸= y =⇒ c(x) ̸= c(y)
This ensures that if we work with a single outcome, we can uniquely
decode the outcome
When working with variable-length codes, it will be convenient to also require the following:
Uniquely Decodable
A code c for X is uniquely decodable if no two strings from A+X have the same codeword. That is, for all x, y ∈ A+X
x̸=y =⇒ c(x)̸=c(y)
This ensures that if we work with a sequence of outcomes, we can still uniquely decode the individual elements

Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable

Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
􏰜 Uniform + Lossless ⇒ Uniquely decodable

Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
􏰜 Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110

Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
􏰜 Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110 􏰜 The code is of course lossless

Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
􏰜 Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
􏰜 The code is of course lossless
􏰜 Lossless 􏱪 Uniquely decodable

Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
􏰜 Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
􏰜 The code is of course lossless
􏰜 Lossless 􏱪 Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable

Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
􏰜 Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
􏰜 The code is of course lossless
􏰜 Lossless 􏱪 Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
􏰜 We can easily segment a given code string scanning left to right

Examples of uniquely decodable codes
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
􏰜 Uniform + Lossless ⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
􏰜 The code is of course lossless
􏰜 Lossless 􏱪 Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
􏰜 We can easily segment a given code string scanning left to right
􏰜 e.g. 0110010 → 0, 110, 0, 10

“Self-punctuating” property
The code C3 = {0, 10, 110, 111} has a “self-punctuating” property

“Self-punctuating” property
The code C3 = {0, 10, 110, 111} has a “self-punctuating” property Trivial to segment a given code string into individual codewords
Keep scanning until we match a codeword
Once matched, add new segment boundary, and proceed to rest of string

“Self-punctuating” property
The code C3 = {0, 10, 110, 111} has a “self-punctuating” property Trivial to segment a given code string into individual codewords
Keep scanning until we match a codeword
Once matched, add new segment boundary, and proceed to rest of string
Once our current segment matches a codeword, no ambiguity to resolve Why? No codeword is a prefix of any other
Not true for every uniquely decodable code, e.g. C4 = {0, 01, 011} First bit 0 → no certainty what the symbol is

Prefix Codes
a.k.a prefix-free or instantaneous codes
A simple property of codes guarantees unique decodeability Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword c′ ∈ {0,1}+ if there exists a string t ∈ {0,1}+ such that c′ = ct.

Prefix Codes
a.k.a prefix-free or instantaneous codes
A simple property of codes guarantees unique decodeability Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword c′ ∈ {0,1}+ if there exists a string t ∈ {0,1}+ such that c′ = ct.
Can you create c′ by gluing something to the end of c? Example: 01101 has prefixes 0, 01, 011, 0110.

Prefix Codes
a.k.a prefix-free or instantaneous codes
A simple property of codes guarantees unique decodeability Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword c′ ∈ {0,1}+ if there exists a string t ∈ {0,1}+ such that c′ = ct.
Can you create c′ by gluing something to the end of c? Example: 01101 has prefixes 0, 01, 011, 0110.
Prefix Codes
A code C = {c1,…,cI} is a prefix code if for every codeword ci ∈ C there is no prefix of ci in C.
In a stream, no confusing one codeword with another

Prefix Codes: Examples
C1 = {0001, 0010, 0100, 1000} is prefix-free
C2 = {0, 10, 110, 111} is prefix-free
C2′ = {1, 10, 110, 111} is not prefix free since c3 = 110 = c1c2
C′′ = {1, 01, 110, 111} is not prefix free since c3 = 110 = c110 2

Prefix Codes as Trees
C1 ={0001,0010,0100,1000}
The total symbol code budget

Prefix Codes as Trees
C2 ={0,10,110,111}
The total symbol code budget

Prefix Codes as Trees
C2′ ={1,10,110,111}
Each codeword choice eliminates its descendants
The total symbol code budget

Prefix Codes are Uniquely Decodeable
000 0000 00 0001 001 0010 0011 0100 0101 0110 0111 100 1000 1001 101 1010 1011 1100 1101 1110 1111
If l∗ = max{l ,…,l } then symbol is 1I
decodeable after seeing at most l∗ bits
ConsiderC2 ={0,10,110,111}
􏰜 Ifc(x)=0…thenx =a 1
􏰜 Ifc(x)=10…thenx1 =b
If c(x) = 1… then x1 ∈ {b,c,d}
If c(x) = 11… then x 1
The total symbol code budget

Uniquely Decodeable Codes are Not Always Prefix Codes
A uniquely decodeable code is not necessarily a prefix code Example: C1 = {0, 01, 011}
00 . . . → first codeword
010 . . . → second codeword 011 . . . → third codeword
Example: C2 = {0, 01, 011, 111} ThisisthereverseoftheprefixcodeC2′ ={0,10,110,111}

Relating various types of codes
All codes Lossless
Uniquely decodable
Note that e.g.
Prefix =⇒ Uniquely Decodable
Not Prefix ≠ ⇒ Not Uniquely Decodable

Why prefix codes?
While prefix codes do not represent all uniquely decodable codes, they are convenient to work with
It will be easy to generate prefix codes (Huffman coding, next lecture)
Further, we can quickly establish if a given code is not prefix Testing for unique decodability is non-trivial in general

1 Variable-Length Codes Unique Decodeability
Prefix Codes

Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4,4,4,4}
L2 = {1,2,3,3}
L3 = {2,2,3,4,4} L4 = {1,3,3,3,3,4}
The total symbol code budget

Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3}
L3 = {2,2,3,4,4}
L4 = {1,3,3,3,3,4}
The total symbol code budget

Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4}
L4 = {1,3,3,3,3,4}
The total symbol code budget

Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 ={2,2,3,4,4}—C3 ={00, , , , } L4 = {1,3,3,3,3,4}
The total symbol code budget

Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4} — C3 = {00,01, , , } L4 = {1,3,3,3,3,4}
The total symbol code budget

Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4} — C3 = {00,01,100, , } L4 = {1,3,3,3,3,4}
The total symbol code budget

Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4} — C3 = {00,01,100,1010, } L4 = {1,3,3,3,3,4}
The total symbol code budget

Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4} — C3 = {00,01,100,1010,1011} L4 = {1,3,3,3,3,4}
The total symbol code budget

Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 ={4,4,4,4}—C1 ={0001,0010,0100,1000} L2 = {1,2,3,3} — C2 = {0,10,110,111}
L3 = {2,2,3,4,4} — C3 = {00,01,100,1010,1011} L4 ={1,3,3,3,3,4}—Impossible!
The total symbol code budget

The a.k.a. The Kraft-Mc

For any prefix (binary) code C, its codeword lengths {l1, . . . , lI } satisfy
􏰄2−li ≤1 (1)
Conversely, if the set {l1 , . . . , lI } satisfy (1) then there exists a prefix code C with those codeword lengths.

The a.k.a. The Kraft-Mc

For any prefix (binary) code C, its codeword lengths {l1, . . . , lI } satisfy
􏰄2−li ≤1 (1)
Conversely, if the set {l1 , . . . , lI } satisfy (1) then there exists a prefix code
C with those codeword lengths.
1 C1 = {0001, 0010, 0100, 1000} is prefix and 􏰐4i=1 2−4 = 14 ≤ 1

The a.k.a. The Kraft-Mc

For any prefix (binary) code C, its codeword lengths {l1, . . . , lI } satisfy
􏰄2−li ≤1 (1)
Conversely, if the set {l1 , . . . , lI } satisfy (1) then there exists a prefix code
C with those codeword lengths. Examples:
1 C1 = {0001, 0010, 0100, 1000} is prefix and 􏰐4i=1 2−4 = 14 ≤ 1
2 C2 ={0,10,110,111}isprefixand􏰐4i=12−li =21 +14 +28 =1

The a.k.a. The Kraft-Mc

For any prefix (binary) code C, its codeword lengths {l1, . . . , lI } satisfy
􏰄2−li ≤1 (1)
Conversely, if the set {l1 , . . . , lI } satisfy (1) then there exists a prefix code
C with those codeword lengths. Examples:
1 C1 = {0001, 0010, 0100, 1000} is prefix and 􏰐4i=1 2−4 = 14 ≤ 1
2 C2 ={0,10,110,111}isprefixand􏰐4i=12−li =21 +14 +28 =1
3 Lengths {1,2,2,3} give 􏰐4i=1 2−li = 21 + 24 + 18 > 1 so no prefix code

Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
000 001 010 011 100 101 110 111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
2 x 2-bit codewords: {00, 01}
The total symbol code budget

Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
000 001 010 011 100 101 110 111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
The total symbol code budget

Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
000 001 010 011 100 101 110 111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com