COMP2610 / 6261 — Information Theory – Lecture 13: Symbol Codes for Lossless Compression
COMP2610 / 6261 — Information Theory
Lecture 13: Symbol Codes for Lossless Compression
Robert C. Williamson
Research School of Computer Science
The Australian National University
1 L O G O U S E G U I D E L I N E S T H E A U S T R A L I A N N A T I O N A L U N I V E R S I T Y
ANU Logo Use Guidelines
Deep Gold
C30 M50 Y70 K40
PMS Metallic 8620
PMS 463
Black
C0 M0 Y0 K100
PMS Process Black
Preferred logo Black version
Reverse version
Any application of the ANU logo on a coloured
background is subject to approval by the Marketing
Office, contact
brand@anu.edu.au
The ANU logo is a contemporary
reflection of our heritage.
It clearly presents our name,
our shield and our motto:
First to learn the nature of things.
To preserve the authenticity of our brand identity, there are
rules that govern how our logo is used.
Preferred logo – horizontal logo
The preferred logo should be used on a white background.
This version includes black text with the crest in Deep Gold in
either PMS or CMYK.
Black
Where colour printing is not available, the black logo can
be used on a white background.
Reverse
The logo can be used white reversed out of a black
background, or occasionally a neutral dark background.
17 September, 2018
1 / 25
Last time
Proof of the source coding theorem
Foundational theorem, but impractical
Requires potentially very large block sizes
The theorem also only considers uniform coding schemes
Could variable length coding help?
Does entropy turn up for such codes as well?
2 / 25
This time
Variable-length codes
Prefix codes
Kraft’s inequality
3 / 25
1 Variable-Length Codes
Unique Decodeability
Prefix Codes
2 The Kraft Inequality
3 Summary
4 / 25
1 Variable-Length Codes
Unique Decodeability
Prefix Codes
2 The Kraft Inequality
3 Summary
5 / 25
Codes: A Review
Notation:
If A is a finite set then AN is the set of all strings of length N.
A+ =
⋃
N A
N is the set of all finite strings
Examples:
{0, 1}3 = {000, 001, 010, 011, 100, 101, 110, 111}
{0, 1}+ = {0, 1, 00, 01, 10, 11, 000, 001, 010, . . .}
Binary Symbol Code
Let X be an ensemble with AX = {a1, . . . , aI}.
A function c : AX → {0, 1}+ is a code for X .
The binary string c(x) is the codeword for x ∈ AX
The length of the codeword for for x is denoted `(x).
Shorthand: `i = `(ai) for i = 1 . . . , I.
The extension of c assigns codewords to any sequence x1x2 . . . xN
from A+ by c(x1 . . . xN) = c(x1) . . . c(xN)
6 / 25
Codes: A Review
Notation:
If A is a finite set then AN is the set of all strings of length N.
A+ =
⋃
N A
N is the set of all finite strings
Examples:
{0, 1}3 = {000, 001, 010, 011, 100, 101, 110, 111}
{0, 1}+ = {0, 1, 00, 01, 10, 11, 000, 001, 010, . . .}
Binary Symbol Code
Let X be an ensemble with AX = {a1, . . . , aI}.
A function c : AX → {0, 1}+ is a code for X .
The binary string c(x) is the codeword for x ∈ AX
The length of the codeword for for x is denoted `(x).
Shorthand: `i = `(ai) for i = 1 . . . , I.
The extension of c assigns codewords to any sequence x1x2 . . . xN
from A+ by c(x1 . . . xN) = c(x1) . . . c(xN)
6 / 25
Codes: A Review
Notation:
If A is a finite set then AN is the set of all strings of length N.
A+ =
⋃
N A
N is the set of all finite strings
Examples:
{0, 1}3 = {000, 001, 010, 011, 100, 101, 110, 111}
{0, 1}+ = {0, 1, 00, 01, 10, 11, 000, 001, 010, . . .}
Binary Symbol Code
Let X be an ensemble with AX = {a1, . . . , aI}.
A function c : AX → {0, 1}+ is a code for X .
The binary string c(x) is the codeword for x ∈ AX
The length of the codeword for for x is denoted `(x).
Shorthand: `i = `(ai) for i = 1 . . . , I.
The extension of c assigns codewords to any sequence x1x2 . . . xN
from A+ by c(x1 . . . xN) = c(x1) . . . c(xN)
6 / 25
Codes: A Review
Notation:
If A is a finite set then AN is the set of all strings of length N.
A+ =
⋃
N A
N is the set of all finite strings
Examples:
{0, 1}3 = {000, 001, 010, 011, 100, 101, 110, 111}
{0, 1}+ = {0, 1, 00, 01, 10, 11, 000, 001, 010, . . .}
Binary Symbol Code
Let X be an ensemble with AX = {a1, . . . , aI}.
A function c : AX → {0, 1}+ is a code for X .
The binary string c(x) is the codeword for x ∈ AX
The length of the codeword for for x is denoted `(x).
Shorthand: `i = `(ai) for i = 1 . . . , I.
The extension of c assigns codewords to any sequence x1x2 . . . xN
from A+ by c(x1 . . . xN) = c(x1) . . . c(xN)
6 / 25
Codes: A Review
Examples
X is an ensemble with AX = {a, b, c, d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Shorthand: C1 = {0001, 0010, 0100, 1000}
All codewords have length 4. That is, `1 = `2 = `3 = `4 = 4
The extension of c maps aba ∈ A3X ⊂ A
+
X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Shorthand: C2 = {0, 10, 110, 111}
In this case `1 = 1, `2 = 2, `3 = `4 = 3
The extension of c maps aba ∈ A3X ⊂ A
+
X to 0100
7 / 25
Codes: A Review
Examples
X is an ensemble with AX = {a, b, c, d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Shorthand: C1 = {0001, 0010, 0100, 1000}
All codewords have length 4. That is, `1 = `2 = `3 = `4 = 4
The extension of c maps aba ∈ A3X ⊂ A
+
X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Shorthand: C2 = {0, 10, 110, 111}
In this case `1 = 1, `2 = 2, `3 = `4 = 3
The extension of c maps aba ∈ A3X ⊂ A
+
X to 0100
7 / 25
Codes: A Review
Examples
X is an ensemble with AX = {a, b, c, d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Shorthand: C1 = {0001, 0010, 0100, 1000}
All codewords have length 4. That is, `1 = `2 = `3 = `4 = 4
The extension of c maps aba ∈ A3X ⊂ A
+
X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Shorthand: C2 = {0, 10, 110, 111}
In this case `1 = 1, `2 = 2, `3 = `4 = 3
The extension of c maps aba ∈ A3X ⊂ A
+
X to 0100
7 / 25
Codes: A Review
Examples
X is an ensemble with AX = {a, b, c, d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Shorthand: C1 = {0001, 0010, 0100, 1000}
All codewords have length 4. That is, `1 = `2 = `3 = `4 = 4
The extension of c maps aba ∈ A3X ⊂ A
+
X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Shorthand: C2 = {0, 10, 110, 111}
In this case `1 = 1, `2 = 2, `3 = `4 = 3
The extension of c maps aba ∈ A3X ⊂ A
+
X to 0100
7 / 25
Codes: A Review
Examples
X is an ensemble with AX = {a, b, c, d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Shorthand: C1 = {0001, 0010, 0100, 1000}
All codewords have length 4. That is, `1 = `2 = `3 = `4 = 4
The extension of c maps aba ∈ A3X ⊂ A
+
X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Shorthand: C2 = {0, 10, 110, 111}
In this case `1 = 1, `2 = 2, `3 = `4 = 3
The extension of c maps aba ∈ A3X ⊂ A
+
X to 0100
7 / 25
Codes: A Review
Examples
X is an ensemble with AX = {a, b, c, d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Shorthand: C1 = {0001, 0010, 0100, 1000}
All codewords have length 4. That is, `1 = `2 = `3 = `4 = 4
The extension of c maps aba ∈ A3X ⊂ A
+
X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Shorthand: C2 = {0, 10, 110, 111}
In this case `1 = 1, `2 = 2, `3 = `4 = 3
The extension of c maps aba ∈ A3X ⊂ A
+
X to 0100
7 / 25
Codes: A Review
Examples
X is an ensemble with AX = {a, b, c, d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Shorthand: C1 = {0001, 0010, 0100, 1000}
All codewords have length 4. That is, `1 = `2 = `3 = `4 = 4
The extension of c maps aba ∈ A3X ⊂ A
+
X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Shorthand: C2 = {0, 10, 110, 111}
In this case `1 = 1, `2 = 2, `3 = `4 = 3
The extension of c maps aba ∈ A3X ⊂ A
+
X to 0100
7 / 25
Codes: A Review
Examples
X is an ensemble with AX = {a, b, c, d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Shorthand: C1 = {0001, 0010, 0100, 1000}
All codewords have length 4. That is, `1 = `2 = `3 = `4 = 4
The extension of c maps aba ∈ A3X ⊂ A
+
X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Shorthand: C2 = {0, 10, 110, 111}
In this case `1 = 1, `2 = 2, `3 = `4 = 3
The extension of c maps aba ∈ A3X ⊂ A
+
X to 0100
7 / 25
Codes: A Review
Examples
X is an ensemble with AX = {a, b, c, d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Shorthand: C1 = {0001, 0010, 0100, 1000}
All codewords have length 4. That is, `1 = `2 = `3 = `4 = 4
The extension of c maps aba ∈ A3X ⊂ A
+
X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Shorthand: C2 = {0, 10, 110, 111}
In this case `1 = 1, `2 = 2, `3 = `4 = 3
The extension of c maps aba ∈ A3X ⊂ A
+
X to 0100
7 / 25
Codes: A Review
Examples
X is an ensemble with AX = {a, b, c, d}
Example 1 (Uniform Code):
Let c(a) = 0001, c(b) = 0010, c(c) = 0100, c(d) = 1000
Shorthand: C1 = {0001, 0010, 0100, 1000}
All codewords have length 4. That is, `1 = `2 = `3 = `4 = 4
The extension of c maps aba ∈ A3X ⊂ A
+
X to 000100100001
Example 2 (Variable-Length Code):
Let c(a) = 0, c(b) = 10, c(c) = 110, c(d) = 111
Shorthand: C2 = {0, 10, 110, 111}
In this case `1 = 1, `2 = 2, `3 = `4 = 3
The extension of c maps aba ∈ A3X ⊂ A
+
X to 0100
7 / 25
Unique Decodeability
Recall that a code is lossless if for all x , y ∈ AX
x 6= y =⇒ c(x) 6= c(y)
This ensures that if we work with a single outcome, we can uniquely
decode the outcome
When working with variable-length codes, it will be convenient to also
require the following:
Uniquely Decodable
A code c for X is uniquely decodable if no two strings from A+X have the
same codeword. That is, for all x, y ∈ A+X
x 6= y =⇒ c(x) 6= c(y)
This ensures that if we work with a sequence of outcomes, we can still
uniquely decode the individual elements
8 / 25
Unique Decodeability
Recall that a code is lossless if for all x , y ∈ AX
x 6= y =⇒ c(x) 6= c(y)
This ensures that if we work with a single outcome, we can uniquely
decode the outcome
When working with variable-length codes, it will be convenient to also
require the following:
Uniquely Decodable
A code c for X is uniquely decodable if no two strings from A+X have the
same codeword. That is, for all x, y ∈ A+X
x 6= y =⇒ c(x) 6= c(y)
This ensures that if we work with a sequence of outcomes, we can still
uniquely decode the individual elements
8 / 25
Examples of uniquely decodable codes
Examples:
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
I Uniform + Lossless⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
I The code is of course lossless
I Lossless ; Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
I We can easily segment a given code string scanning left to right
I e.g. 0110010→ 0, 110, 0, 10
9 / 25
Examples of uniquely decodable codes
Examples:
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
I Uniform + Lossless⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
I The code is of course lossless
I Lossless ; Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
I We can easily segment a given code string scanning left to right
I e.g. 0110010→ 0, 110, 0, 10
9 / 25
Examples of uniquely decodable codes
Examples:
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
I Uniform + Lossless⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
I The code is of course lossless
I Lossless ; Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
I We can easily segment a given code string scanning left to right
I e.g. 0110010→ 0, 110, 0, 10
9 / 25
Examples of uniquely decodable codes
Examples:
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
I Uniform + Lossless⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
I The code is of course lossless
I Lossless ; Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
I We can easily segment a given code string scanning left to right
I e.g. 0110010→ 0, 110, 0, 10
9 / 25
Examples of uniquely decodable codes
Examples:
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
I Uniform + Lossless⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
I The code is of course lossless
I Lossless ; Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
I We can easily segment a given code string scanning left to right
I e.g. 0110010→ 0, 110, 0, 10
9 / 25
Examples of uniquely decodable codes
Examples:
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
I Uniform + Lossless⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
I The code is of course lossless
I Lossless ; Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
I We can easily segment a given code string scanning left to right
I e.g. 0110010→ 0, 110, 0, 10
9 / 25
Examples of uniquely decodable codes
Examples:
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
I Uniform + Lossless⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
I The code is of course lossless
I Lossless ; Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
I We can easily segment a given code string scanning left to right
I e.g. 0110010→ 0, 110, 0, 10
9 / 25
Examples of uniquely decodable codes
Examples:
C1 = {0001, 0010, 0100, 1000} is uniquely decodable
I Uniform + Lossless⇒ Uniquely decodable
C2 = {1, 10, 110, 111} is not uniquely decodable because
c(aaa) = c(d) = 111 and c(ab) = c(c) = 110
I The code is of course lossless
I Lossless ; Uniquely decodable
C3 = {0, 10, 110, 111} is uniquely decodable
I We can easily segment a given code string scanning left to right
I e.g. 0110010→ 0, 110, 0, 10
9 / 25
“Self-punctuating” property
The code C3 = {0, 10, 110, 111} has a “self-punctuating” property
Trivial to segment a given code string into individual codewords
Keep scanning until we match a codeword
Once matched, add new segment boundary, and proceed to rest of
string
Once our current segment matches a codeword, no ambiguity to resolve
Why? No codeword is a prefix of any other
Not true for every uniquely decodable code, e.g. C4 = {0, 01, 011}
First bit 0→ no certainty what the symbol is
10 / 25
“Self-punctuating” property
The code C3 = {0, 10, 110, 111} has a “self-punctuating” property
Trivial to segment a given code string into individual codewords
Keep scanning until we match a codeword
Once matched, add new segment boundary, and proceed to rest of
string
Once our current segment matches a codeword, no ambiguity to resolve
Why? No codeword is a prefix of any other
Not true for every uniquely decodable code, e.g. C4 = {0, 01, 011}
First bit 0→ no certainty what the symbol is
10 / 25
“Self-punctuating” property
The code C3 = {0, 10, 110, 111} has a “self-punctuating” property
Trivial to segment a given code string into individual codewords
Keep scanning until we match a codeword
Once matched, add new segment boundary, and proceed to rest of
string
Once our current segment matches a codeword, no ambiguity to resolve
Why? No codeword is a prefix of any other
Not true for every uniquely decodable code, e.g. C4 = {0, 01, 011}
First bit 0→ no certainty what the symbol is
10 / 25
Prefix Codes
a.k.a prefix-free or instantaneous codes
A simple property of codes guarantees unique decodeability
Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword
c′ ∈ {0, 1}+ if there exists a string t ∈ {0, 1}+ such that c′ = ct.
Can you create c′ by gluing something to the end of c?
Example: 01101 has prefixes 0, 01, 011, 0110.
Prefix Codes
A code C = {c1, . . . , cI} is a prefix code if for every codeword ci ∈ C
there is no prefix of ci in C.
In a stream, no confusing one codeword with another
11 / 25
Prefix Codes
a.k.a prefix-free or instantaneous codes
A simple property of codes guarantees unique decodeability
Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword
c′ ∈ {0, 1}+ if there exists a string t ∈ {0, 1}+ such that c′ = ct.
Can you create c′ by gluing something to the end of c?
Example: 01101 has prefixes 0, 01, 011, 0110.
Prefix Codes
A code C = {c1, . . . , cI} is a prefix code if for every codeword ci ∈ C
there is no prefix of ci in C.
In a stream, no confusing one codeword with another
11 / 25
Prefix Codes
a.k.a prefix-free or instantaneous codes
A simple property of codes guarantees unique decodeability
Prefix property
A codeword c ∈ {0, 1}+ is said to be a prefix of another codeword
c′ ∈ {0, 1}+ if there exists a string t ∈ {0, 1}+ such that c′ = ct.
Can you create c′ by gluing something to the end of c?
Example: 01101 has prefixes 0, 01, 011, 0110.
Prefix Codes
A code C = {c1, . . . , cI} is a prefix code if for every codeword ci ∈ C
there is no prefix of ci in C.
In a stream, no confusing one codeword with another
11 / 25
Prefix Codes: Examples
Examples:
C1 = {0001, 0010, 0100, 1000} is prefix-free
C2 = {0, 10, 110, 111} is prefix-free
C′2 = {1, 10, 110, 111} is not prefix free since c3 = 110 = c1c2
C′′2 = {1, 01, 110, 111} is not prefix free since c3 = 110 = c110
12 / 25
Prefix Codes as Trees
C1 = {0001, 0010, 0100, 1000}
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
13 / 25
Prefix Codes as Trees
C2 = {0, 10, 110, 111}
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
13 / 25
Prefix Codes as Trees
C′2 = {1, 10, 110, 111}
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
Each codeword choice eliminates its descendants 13 / 25
Prefix Codes are Uniquely Decodeable
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et If `
∗ = max{`1, . . . , `I} then symbol is
decodeable after seeing at most `∗
bits
Consider C2 = {0, 10, 110, 111}
I If c(x) = 0 . . . then x1 = a
I If c(x) = 1 . . . then x1 ∈ {b, c, d}
I If c(x) = 10 . . . then x1 = b
I If c(x) = 11 . . . then x1 ∈ {c, d}
14 / 25
Uniquely Decodeable Codes are Not Always Prefix
Codes
A uniquely decodeable code is not necessarily a prefix code
Example: C1 = {0, 01, 011}
00 . . .→ first codeword
010 . . .→ second codeword
011 . . .→ third codeword
Example: C2 = {0, 01, 011, 111}
This is the reverse of the prefix code C′2 = {0, 10, 110, 111}
15 / 25
Relating various types of codes
All codes
Lossless
Uniquely
decodable
Prefix
Note that e.g.
Prefix =⇒ Uniquely Decodable
but
Not Prefix 6=⇒ Not Uniquely Decodable
16 / 25
Why prefix codes?
While prefix codes do not represent all uniquely decodable codes, they
are convenient to work with
It will be easy to generate prefix codes (Huffman coding, next lecture)
Further, we can quickly establish if a given code is not prefix
Testing for unique decodability is non-trivial in general
17 / 25
1 Variable-Length Codes
Unique Decodeability
Prefix Codes
2 The Kraft Inequality
3 Summary
18 / 25
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4, 4, 4, 4}
— C1 = {0001, 0010, 0100, 1000}
L2 = {1, 2, 3, 3}
— C2 = {0, 10, 110, 111}
L3 = {2, 2, 3, 4, 4}
— C3 = {
00
,
01
,
100
,
1010
,
1011
}
L4 = {1, 3, 3, 3, 3, 4}
— Impossible!
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
19 / 25
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4, 4, 4, 4}— C1 = {0001, 0010, 0100, 1000}
L2 = {1, 2, 3, 3}
— C2 = {0, 10, 110, 111}
L3 = {2, 2, 3, 4, 4}
— C3 = {
00
,
01
,
100
,
1010
,
1011
}
L4 = {1, 3, 3, 3, 3, 4}
— Impossible!
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
19 / 25
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4, 4, 4, 4}— C1 = {0001, 0010, 0100, 1000}
L2 = {1, 2, 3, 3}— C2 = {0, 10, 110, 111}
L3 = {2, 2, 3, 4, 4}
— C3 = {
00
,
01
,
100
,
1010
,
1011
}
L4 = {1, 3, 3, 3, 3, 4}
— Impossible!
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
19 / 25
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4, 4, 4, 4}— C1 = {0001, 0010, 0100, 1000}
L2 = {1, 2, 3, 3}— C2 = {0, 10, 110, 111}
L3 = {2, 2, 3, 4, 4}— C3 = {00,
01
,
100
,
1010
,
1011
}
L4 = {1, 3, 3, 3, 3, 4}
— Impossible!
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
19 / 25
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4, 4, 4, 4}— C1 = {0001, 0010, 0100, 1000}
L2 = {1, 2, 3, 3}— C2 = {0, 10, 110, 111}
L3 = {2, 2, 3, 4, 4}— C3 = {00, 01,
100
,
1010
,
1011
}
L4 = {1, 3, 3, 3, 3, 4}
— Impossible!
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
19 / 25
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4, 4, 4, 4}— C1 = {0001, 0010, 0100, 1000}
L2 = {1, 2, 3, 3}— C2 = {0, 10, 110, 111}
L3 = {2, 2, 3, 4, 4}— C3 = {00, 01, 100,
1010
,
1011
}
L4 = {1, 3, 3, 3, 3, 4}
— Impossible!
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
19 / 25
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4, 4, 4, 4}— C1 = {0001, 0010, 0100, 1000}
L2 = {1, 2, 3, 3}— C2 = {0, 10, 110, 111}
L3 = {2, 2, 3, 4, 4}— C3 = {00, 01, 100, 1010,
1011
}
L4 = {1, 3, 3, 3, 3, 4}
— Impossible!
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
19 / 25
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4, 4, 4, 4}— C1 = {0001, 0010, 0100, 1000}
L2 = {1, 2, 3, 3}— C2 = {0, 10, 110, 111}
L3 = {2, 2, 3, 4, 4}— C3 = {00, 01, 100, 1010, 1011}
L4 = {1, 3, 3, 3, 3, 4}
— Impossible!
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
19 / 25
Lengths and Trees
Suppose someone said “I want prefix codes with codewords lengths”:
L1 = {4, 4, 4, 4}— C1 = {0001, 0010, 0100, 1000}
L2 = {1, 2, 3, 3}— C2 = {0, 10, 110, 111}
L3 = {2, 2, 3, 4, 4}— C3 = {00, 01, 100, 1010, 1011}
L4 = {1, 3, 3, 3, 3, 4}— Impossible!
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et
19 / 25
The Kraft Inequality
a.k.a. The Kraft-McMillan Inequality
Kraft Inequality
For any prefix (binary) code C, its codeword lengths {`1, . . . , `I} satisfy
I∑
i=1
2−`i ≤ 1 (1)
Conversely, if the set {`1, . . . , `I} satisfy (1) then there exists a prefix code
C with those codeword lengths.
Examples:
1 C1 = {0001, 0010, 0100, 1000} is prefix and
∑4
i=1 2
−4 = 14 ≤ 1
2 C2 = {0, 10, 110, 111} is prefix and
∑4
i=1 2
−`i = 12 +
1
4 +
2
8 = 1
3 Lengths {1, 2, 2, 3} give
∑4
i=1 2
−`i = 12 +
2
4 +
1
8 > 1 so no prefix code
20 / 25
The Kraft Inequality
a.k.a. The Kraft-McMillan Inequality
Kraft Inequality
For any prefix (binary) code C, its codeword lengths {`1, . . . , `I} satisfy
I∑
i=1
2−`i ≤ 1 (1)
Conversely, if the set {`1, . . . , `I} satisfy (1) then there exists a prefix code
C with those codeword lengths.
Examples:
1 C1 = {0001, 0010, 0100, 1000} is prefix and
∑4
i=1 2
−4 = 14 ≤ 1
2 C2 = {0, 10, 110, 111} is prefix and
∑4
i=1 2
−`i = 12 +
1
4 +
2
8 = 1
3 Lengths {1, 2, 2, 3} give
∑4
i=1 2
−`i = 12 +
2
4 +
1
8 > 1 so no prefix code
20 / 25
The Kraft Inequality
a.k.a. The Kraft-McMillan Inequality
Kraft Inequality
For any prefix (binary) code C, its codeword lengths {`1, . . . , `I} satisfy
I∑
i=1
2−`i ≤ 1 (1)
Conversely, if the set {`1, . . . , `I} satisfy (1) then there exists a prefix code
C with those codeword lengths.
Examples:
1 C1 = {0001, 0010, 0100, 1000} is prefix and
∑4
i=1 2
−4 = 14 ≤ 1
2 C2 = {0, 10, 110, 111} is prefix and
∑4
i=1 2
−`i = 12 +
1
4 +
2
8 = 1
3 Lengths {1, 2, 2, 3} give
∑4
i=1 2
−`i = 12 +
2
4 +
1
8 > 1 so no prefix code
20 / 25
The Kraft Inequality
a.k.a. The Kraft-McMillan Inequality
Kraft Inequality
For any prefix (binary) code C, its codeword lengths {`1, . . . , `I} satisfy
I∑
i=1
2−`i ≤ 1 (1)
Conversely, if the set {`1, . . . , `I} satisfy (1) then there exists a prefix code
C with those codeword lengths.
Examples:
1 C1 = {0001, 0010, 0100, 1000} is prefix and
∑4
i=1 2
−4 = 14 ≤ 1
2 C2 = {0, 10, 110, 111} is prefix and
∑4
i=1 2
−`i = 12 +
1
4 +
2
8 = 1
3 Lengths {1, 2, 2, 3} give
∑4
i=1 2
−`i = 12 +
2
4 +
1
8 > 1 so no prefix code
20 / 25
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et 2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}
In general, an `-bit codeword excludes
2k−` x k -bit codewords
For lengths L = {`1, . . . , `I} and `∗ = max{`1, . . . , `I}, there will be
1
2`∗
I∑
i=1
2`
∗−`i
≤ 2`
∗
excluded `∗-bit codewords.
But there are only 2`
∗
possible `∗-bit codewords
21 / 25
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et 2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}
In general, an `-bit codeword excludes
2k−` x k -bit codewords
For lengths L = {`1, . . . , `I} and `∗ = max{`1, . . . , `I}, there will be
1
2`∗
I∑
i=1
2`
∗−`i
≤ 2`
∗
excluded `∗-bit codewords.
But there are only 2`
∗
possible `∗-bit codewords
21 / 25
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et 2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}
In general, an `-bit codeword excludes
2k−` x k -bit codewords
For lengths L = {`1, . . . , `I} and `∗ = max{`1, . . . , `I}, there will be
1
2`∗
I∑
i=1
2`
∗−`i
≤ 2`
∗
excluded `∗-bit codewords.
But there are only 2`
∗
possible `∗-bit codewords
21 / 25
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et 2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}
In general, an `-bit codeword excludes
2k−` x k -bit codewords
For lengths L = {`1, . . . , `I} and `∗ = max{`1, . . . , `I}, there will be
1
2`∗
I∑
i=1
2`
∗−`i
≤ 2`
∗
excluded `∗-bit codewords.
But there are only 2`
∗
possible `∗-bit codewords
21 / 25
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et 2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}
In general, an `-bit codeword excludes
2k−` x k -bit codewords
For lengths L = {`1, . . . , `I} and `∗ = max{`1, . . . , `I}, there will be
1
2`∗
I∑
i=1
2`
∗−`i
≤ 2`
∗
excluded `∗-bit codewords.
But there are only 2`
∗
possible `∗-bit codewords
21 / 25
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et 2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}
In general, an `-bit codeword excludes
2k−` x k -bit codewords
For lengths L = {`1, . . . , `I} and `∗ = max{`1, . . . , `I}, there will be
1
2`∗
I∑
i=1
2`
∗−`i
≤ 2`
∗
excluded `∗-bit codewords.
But there are only 2`
∗
possible `∗-bit codewords
21 / 25
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et 2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}
In general, an `-bit codeword excludes
2k−` x k -bit codewords
For lengths L = {`1, . . . , `I} and `∗ = max{`1, . . . , `I}, there will be
1
2`∗
I∑
i=1
2`
∗−`i ≤ 2`
∗
excluded `∗-bit codewords. But there are only 2`
∗
possible `∗-bit codewords
21 / 25
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et 2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}
In general, an `-bit codeword excludes
2k−` x k -bit codewords
For lengths L = {`1, . . . , `I} and `∗ = max{`1, . . . , `I}, there will be
1
2`∗
I∑
i=1
2`
∗−`i ≤ 1
excluded `∗-bit codewords. But there are only 2`
∗
possible `∗-bit codewords
21 / 25
Prefixes Exclude Codes
We are constrained when constructing prefix codes, as selecting a codeword
eliminates a whole subtree
Choosing a prefix codeword of length 1 — e.g., c(a) = 0 — excludes:
1111
1110
1101
1100
1011
1010
1001
1000
0111
0110
0101
0100
0011
0010
0001
0000
111
110
101
100
011
010
001
000
11
10
01
00
0
1
T
he
to
ta
l s
ym
bo
l c
od
e
bu
dg
et 2 x 2-bit codewords: {00, 01}
4 x 3-bit codewords: {000, 001, 010, 011}
8 x 4-bit codewords: {0000, 0001, . . . , 0111}
In general, an `-bit codeword excludes
2k−` x k -bit codewords
For lengths L = {`1, . . . , `I} and `∗ = max{`1, . . . , `I}, there will be
I∑
i=1
2−`i ≤ 1
excluded `∗-bit codewords. But there are only 2`
∗
possible `∗-bit codewords
21 / 25
Kraft inequality: other direction
Suppose we are given lengths satisfying
I∑
i=1
2−`i ≤ 1
Then, we can construct a code by:
Picking the first (remaining) node at depth `1, and using it as the first
codeword
Removing all descendants of the node (to ensure the prefix condition)
Picking the next (remaining) node at depth `2, and using it as the
second codeword
Removing all descendants of the node (to ensure the prefix condition)
…
22 / 25
Kraft inequality: comments
Kraft’s inequality actually holds more generally for uniquely decodable
codes
Harder to prove
Note that if a given code has lengths that satisfy
I∑
i=1
2−`i ≤ 1
it does not mean the given code necessarily is prefix
Just that we can construct a prefix code with these lengths
23 / 25
Summary
Key ideas from this lecture:
Prefix and Uniquely Decodeable variable-length codes
Prefix codes are tree-like
Every Prefix code is Uniquely Decodeable but not vice versa
The Kraft Inequality:
I Code lengths satisfying
∑
i 2
−`i ≤ 1 implies Prefix/U.D. code exists
I Prefix/U.D. code implies
∑
i 2
−`i ≤ 1
Relevant Reading Material:
MacKay: §5.1 and §5.2
Cover & Thomas: §5.1, §5.2, and §5.5
24 / 25
Next time
Bound on expected length for a prefix code
Shannon codes
Huffman coding
25 / 25
Variable-Length Codes
Unique Decodeability
Prefix Codes
The Kraft Inequality
Summary