Announcements
Announcements
Exam 2 on Friday
Please let me know if you cannot take during class time (and didn’t already tell me for exam 1)
No class on Monday
Last Time
Greedy Algorithms
Find decision Procedure
Repeatedly make best available choice
Repeat until done
Exchange arguments
Optimal Caching
Today
Huffman Codes
Minimum Spanning Trees
Huffman Codes
Want to encode string of letters in binary.
Ex: ABCDACBDAD
A = 00, B = 01, C = 10, D = 11
A B C D A C B D A D
00 01 10 11 00 10 01 11 00 11
Use two bits to encode each letter.
Question: Encoding Length
Using the coding scheme from the last slide, how many bits are needed to encode a string of n As, Bs, Cs and Ds?
2
n
2n
4n
n2
You need two bits per letter.
Non-Fix Length Encodings
Suppose instead we had to decode:
AAABAACBAABADAAA
16 Letters requires 32 bits.
Note that there are a lot of As here. If we could find a way to encode them with fewer bits, we could save a lot.
Unique Decoding
Cannot do any encoding we like.
Suppose we tried:
A = 0, B = 1, C = 10, D = 01
How do you decode 01? Either AB or D.
Problem: The encoding for A is a prefix of the encoding for D. When you see it, you don’t know if it’s an A, or the start of a D.
Prefix Free Encodings
Definition: An encoding is prefix-free if the encoding of no letter is a prefix of the encoding of any other.
Example:
A = 0, B = 10, C = 110, D = 111
Lemma: Any prefix-free encoding can be uniquely decoded.
Example
A = 0, B = 10, C = 110, D = 111
Decode:
00010001101000100111000
A
A
A
B
A
A
C
B
A
A
B
A
D
A
A
A
Only 23 bits instead of 32!
Optimal Encoding
Problem: Given a string, S, find a prefix-free encoding that encodes S using the fewest number of bits.
How Long is the Encoding?
If for each letter x in our string, x appears f(x) times and if we encode x as a string of length ℓ(x), the total encoding length is:
Σ f(x)·ℓ(x).
Our example has:
11 As, 3 Bs, 1 C, 1 D.
These are the frequencies. We need to find the best encoding.
Tree Representation
Can represent prefix-free encoding as a tree.
A
B
C
D
0
0
0
1
1
1
Letters are leaves.
Length of encoding = Depth of leaf.
Question: Tree Decoding
What letter does the string 111 correspond to in this tree?
A
B
C
D
0
0
0
1
1
1
E
0
1
Placement of Leaves
Suppose we know the tree structure. Where do we put the letters?
Objective = Σ freq(x)depth(x)
Want least frequent letters at lowest depth.
C
F
A
B
G
D
E
Letter frequencies
Ax10, Bx15,
Cx4, Dx22,
Ex31, Fx5,
Gx19
Siblings
No matter what the tree structure, two of the deepest leaves are siblings.
Can assume filled by two least frequent elements.
Can assume that two least frequent elements are siblings!
Example
Frequencies:
Ax30, Bx15, Cx25, Dx50, Ex65
B
C
B OR C
Think of as a new node of weight 15+25=40
Example
A
30
B
15
C
25
D
50
E
65
B OR C
40
A OR (B OR C)
70
D OR E
115
(A OR (B OR C)) OR (D OR E)
185
Algorithm
HaffmanTree(L)
While(at least two left)
x, y ← Two least frequent
z new node f(z) ← f(x)+f(y)
x and y children of z
Replace x and y with z in L
Return remaining elt of L
Better with priority queue.
Optimized Algorithm
HuffmanTree(L)
Priority queue Q
Insert all elements of L to Q
While(|Q| ≥ 2)
x ← Q.DeleteMin()
y ← Q.DeleteMin()
Create z, f(z) = f(x) + f(y)
x and y children of z
Q.Insert(z)
Return Q.DeleteMin()
O(n log(n))
O(n) Iterations
O(log(n))
Runtime: O(n log(n))
Minimum Spanning Trees
Suppose that you have a collection of cities that you would like to connect by roads.
There are several potential roads you could build.
Each has a cost.
What is the cheapest way to connect them?
1
1
1
2
2
2
3
3
3
4
1+1+1+2+2=7
Trees
Note: In this problem, you will never want to build more roads than necessary. This means, you will never want to have a cycle.
Definition: A tree is a connected graph, with no cycles.
A spanning tree in a graph G, is a subset of the edges of G that connect all vertices and have no cycles.
If G has weights, a minimum spanning tree is a spanning tree whose total weight is as small as possible.
Question: MST
What is the weight of the minimum spanning tree of the graph below?
5
6
7
8
9
1
2
3
4
5
Basic Facts about Trees
Lemma: For an undirected graph G, any two of the below imply the third:
|E| = |V|-1
G is connected
G has no cycles
Corollary: If G is a tree, then |E| = |V|-1.
Proof Idea
Start with a graph with no edges.
Add edges one at a time.
Count number of connected components.
Extra Edge
An extra edge decreases the number of CCs by 1 unless it creates a cycle.
Combines
CCs
Creates
Cycle
Number of Components
Starts with |V|.
Each edge decreases by 1 unless a cycle.
Final graph is connected if it reduces to 1.
If |E|=|V|-1 and no cycle, then only 1 CC left.
If |E|=|V|-1 and connected, each edge must decrease by 1, so no cycles.
If connected and no cycles, each edge decreases by 1, so must be |V|-1 edges.
Greedy Idea
How do you make an MST?
Try using the cheapest edges.
Proposition: In a graph G, let e be an edge of lightest weight. Then there exists an MST of G containing e. Furthermore, if e is the unique lightest edge, then all MSTs contain e.
Proof Idea
Suppose that we have an MST T that does not contain e.
Modify T to get T’ that does contain e and has wt(T’) ≤ wt(T).
T’ will be a MST as well.
Furthermore if e is the unique lightest edge, wt(T’) < wt(T), so T could not have been minimal.
Proof
Consider tree T not containing e.
With extra edge, no longer a tree, must contain a cycle.
Remove edge e’ from cycle to get T’.
|T’|=|V|-1, and connected, so T’ is a tree.
wt(T’) = wt(T)+wt(e)-wt(e’)
≤ wt(T)
(because wt(e) is minimal).
e
e’