程序代写代做代考 algorithm C COMP90038 – Algorithms and Complexity Lecture 21

COMP90038 – Algorithms and Complexity Lecture 21
COMP90038 Algorithms and Complexity
Lecture 21: Huffman Encoding for Data Compression (with thanks to Harald Søndergaard & Michael Kirley)
Casey Myers Casey.Myers@unimelb.edu.au David Caro Building (Physics) 274

COMP90038 – Algorithms and Complexity
Lecture 21
Review from Lecture 20: Greedy Algorithms
• In general we cannot expect locally best choices to yield globally best outcomes.
• However, there are some well-known algorithms that rely on the greedy approach,
being both correct and fast.
• In other cases, for hard problems, a greedy algorithm can sometimes serve as an acceptable approximation algorithm (we will discuss approximation algorithms in Week 12).
• Here we shall look at
– Prim’s algorithm for finding minimum spanning trees – Dijkstra’s algorithm for single-source shortest paths.

COMP90038 – Algorithms and Complexity
Lecture 21
Review from Lecture 20: Greedy Algorithms
• Prim’s algorithm for finding minimum spanning trees
• Dijkstra’s algorithm for single-source shortest paths.
b
1
2
2 a
Prim’s Algorithm
2b 2b
a1 a1
2c 2c
c
Dijkstra’s Algorithm
Finding the minimum spanning tree
Finding the single-source shortest paths

COMP90038 – Algorithms and Complexity
Lecture 21
Review from Lecture 20: Prim’s Algorithm
• On the first loop, we only create the table.
Tree 𝑇
𝑎
𝑏
𝑐
𝑑
𝑒
𝑓
𝑐𝑜𝑠𝑡






𝑝𝑟𝑒𝑣
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙

COMP90038 – Algorithms and Complexity
Lecture 21
Review from Lecture 20: Prim’s Algorithm
• The resulting tree is 𝑎,𝑑,𝑏,𝑐,𝑓,𝑒 .
Tree 𝑇
𝑎
𝑏
𝑐
𝑑
𝑒
𝑓
𝑐𝑜𝑠𝑡






𝑝𝑟𝑒𝑣
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑐𝑜𝑠𝑡
0





𝑝𝑟𝑒𝑣
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑎
𝑐𝑜𝑠𝑡
5
6
4


𝑝𝑟𝑒𝑣
𝑎
𝑎
𝑎
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑎,𝑑
𝑐𝑜𝑠𝑡
2
2

4
𝑝𝑟𝑒𝑣
𝑑
𝑑
𝑛𝑖𝑙
𝑑
𝑎,𝑑,𝑏
𝑐𝑜𝑠𝑡
1

4
𝑝𝑟𝑒𝑣
𝑏
𝑛𝑖𝑙
𝑑
𝑎,𝑑,𝑏,𝑐
𝑐𝑜𝑠𝑡
5
3
𝑝𝑟𝑒𝑣
𝑐
𝑐
𝑎,𝑑,𝑏,𝑐,𝑓
𝑐𝑜𝑠𝑡
4
𝑝𝑟𝑒𝑣
𝑓
𝑎,𝑑,𝑏,𝑐,𝑓,𝑒
𝑐𝑜𝑠𝑡
𝑝𝑟𝑒𝑣

COMP90038 – Algorithms and Complexity
Lecture 21
Review from Lecture 20: Dijkstra’s Algorithm
• On the first loop, we only create the table.
Tree 𝑇
𝑎
𝑏
𝑐
𝑑
𝑒
𝑓
𝑐𝑜𝑠𝑡






𝑝𝑟𝑒𝑣
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙

COMP90038 – Algorithms and Complexity
Lecture 21
Review from Lecture 20: Dijkstra’s Algorithm
• The complete tree is 𝑎,𝑑,𝑐,𝑏,𝑒,𝑓 .
Tree 𝑇
𝑎
𝑏
𝑐
𝑑
𝑒
𝑓
𝑐𝑜𝑠𝑡






𝑝𝑟𝑒𝑣
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑐𝑜𝑠𝑡
0





𝑝𝑟𝑒𝑣
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑎
𝑐𝑜𝑠𝑡

4
1


𝑝𝑟𝑒𝑣
𝑛𝑖𝑙
𝑎
𝑎
𝑛𝑖𝑙
𝑛𝑖𝑙
𝑎,𝑑
𝑐𝑜𝑠𝑡
3
2
5
6
𝑝𝑟𝑒𝑣
𝑑
𝑑
𝑑
𝑑
𝑎,𝑑,𝑐
𝑐𝑜𝑠𝑡
3
4
5
𝑝𝑟𝑒𝑣
d
𝑐
𝑐
𝑎,𝑑,𝑐,𝑏
𝑐𝑜𝑠𝑡
4
5
𝑝𝑟𝑒𝑣
𝑐
𝑐
𝑎,𝑑,𝑐,𝑏,𝑒
𝑐𝑜𝑠𝑡
5
𝑝𝑟𝑒𝑣
𝑐
𝑎,𝑑,𝑐,𝑏,𝑒,𝑓
𝑐𝑜𝑠𝑡
𝑝𝑟𝑒𝑣
1

COMP90038 – Algorithms and Complexity Lecture 21
Data Compression
• From an information-theoretic point of view, most computer files contain a lot of redundancy.
• Compression is used to store files in less space.
• For text files, savings up to 50% are common.
• For binary files, savings up to 90% are common.
• Savings in space mean savings in time for files transmission.

COMP90038 – Algorithms and Complexity Lecture 21
Run-Length Encoding
• For a text that has long runs of repeated characters, we could compress by counting the runs:
𝐴𝐴𝐴𝐴𝐵𝐵𝐵𝐴𝐴𝐵𝐵𝐵𝐵𝐵𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐷𝐴𝐵𝐶𝐵𝐴𝐴𝐴𝐵𝐵𝐵𝐵𝐶𝐶𝐶𝐷
can then be encoded as
4𝐴3𝐵𝐴𝐴5𝐵8𝐶𝐷𝐴𝐵𝐶𝐵3𝐴4𝐵3𝐶𝐷
• For English text this is not very useful.
• For binary files it can be very good.

COMP90038 – Algorithms and Complexity Lecture 21
Run-Length Encoding

COMP90038 – Algorithms and Complexity
Lecture 21
Variable-Length Encoding
• Instead of using, say, 8 bits for characters (as ASCII does), assign character codes in such a way that common characters have shorter codes.
• For example, 𝐸 could have code 0.
• But then no other character code can start with 0.
• In general, no character’s code should be a prefix of some other character’s code (unless we somehow put separators between characters, which would take up space).

COMP90038 – Algorithms and Complexity
Lecture 21
Variable-Length Encoding
• Suppose we count symbols and find these numbers of occurrences:
• Here are some sensible codes that we may use for symbols:

COMP90038 – Algorithms and Complexity
Lecture 21
Tries for Variable-Length Encoding
• A trie is a binary tree for search applications.
• To search for a key we look at individual bits of a key and descend to the left
whenever a bit is 0, to the right whenever it is 1.
• Using a trie to determine codes means that no code will be the prefix of another!

COMP90038 – Algorithms and Complexity
Lecture 21
Variable-Length Encoding
• With the codes given by the trie, we can represent 𝐹𝐴𝐶𝐸
in just 10 bits:
010 11 011 10
𝐹𝐴𝐶𝐸
• If we were to assign 3 bits per character, would take 12 bits.

COMP90038 – Algorithms and Complexity Lecture 21
Encoding a Message
• We shall shortly see how to design the codes for symbols, taking symbol frequencies into account.
• Once we have a table of codes, encoding is straightforward.
• For example, to encode ‘𝐵𝐴𝐺𝐺𝐸𝐷’ simple concatenate the codes for 𝐵, 𝐴, G, G, E and D:
000011001001100001

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Encoding: Choosing the Codes
• Sometimes (for example for common English text) we may know the frequencies of letters fairly well.
• If we don’t know about frequencies then we can still count all characters in the given text as a first step.
• But how do we assign codes to the characters once we know their frequencies?
• By repeatedly selecting the two smallest weights and fusing them.
• This is Huffman’s algorithm—another example of a greedy method.
• The resulting tree is a Huffman tree.

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
4 5 10 12 14 27 28 𝐵𝐷𝐺𝐹𝐶𝐸𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
4
9
5 10 12 14 27 28 𝐵𝐷𝐺𝐹𝐶𝐸𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
19
9
5 10 12 14 27 28
4 𝐵𝐷𝐺𝐹𝐶𝐸𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
19 9
26
4 𝐵𝐷𝐺𝐹𝐶𝐸𝐴
5 10 12 14 27 28

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
45
19
9
4 𝐵𝐷𝐺𝐹𝐶𝐸𝐴
26
5 10 12 14 27 28

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
19 9
26 55
45
4 𝐵𝐷𝐺𝐹𝐶𝐸𝐴
5 10 12 14 27 28

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
45
19 9
26 55
4 𝐵𝐷𝐺𝐹𝐶𝐸𝐴
5 10 12 14 27 28

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
• We end up with the trie from before!
• One can show that this gives the best encoding.
19 0
9
1
0 45
0
1
1
1
01
55 01
26
0
4 𝐵𝐷𝐺𝐹𝐶𝐸𝐴
5
10 12 14 27 28

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15
0.1 0.15 0.2 0.2 0.35 𝐵_𝐶𝐷𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15
0.25
0.1 0.15 0.2 0.2 0.35 𝐵_𝐶𝐷𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15
0.25
0.2 0.2 0.1 0.15 0.35 C𝐷𝐵_𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15
0.4 0.25
0.2 0.2 0.1 0.15 0.35 C𝐷𝐵_𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15
0.25 0.4
0.1 0.15 0.35 0.2 0.2 𝐵_𝐴𝐶𝐷

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15
0.6
0.25 0.4
0.1 0.15 0.35 0.2 0.2 𝐵_𝐴𝐶𝐷

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15
0.6
0.4 0.25
0.2 0.2 0.1 0.15 0.35 𝐶𝐷𝐵_𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15
0.6 0.4 0.25
0.2 0.2 0.1 0.15 0.35 𝐶𝐷𝐵_𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15
1
0
0.6
0
1 1
0.2 0.2 0.1 0.15 0.35 𝐶𝐷𝐵_𝐴
0.4 0.25 010

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.35
0.1
0.2
0.2
0.15
Codeword
11
100
00
01
101
1
0
0.6
0
1 1
0.2 0.2 0.1 0.15 0.35 𝐶𝐷𝐵_𝐴
0.4 0.25 010

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
Decode 100010111001010

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
Decode 100010111001010
0.1 0.15 0.15 0.2 0.4 𝐵𝐷_𝐶𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
Decode 100010111001010
0.25
0.1 0.15 0.15 0.2 0.4 𝐵𝐷_𝐶𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
Decode 100010111001010
0.25
0.15 0.2 0.1 0.15 0.4 _𝐶𝐵𝐷𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
Decode 100010111001010
0.35 0.25
0.15 0.2 0.1 0.15 0.4 _𝐶𝐵𝐷𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
Decode 100010111001010
0.25 0.35
0.1 0.15 0.15 0.2 0.4 𝐵𝐷_𝐶𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
Decode 100010111001010
0.6
0.25
0.35
0.1 0.15 0.15 0.2 0.4 𝐵𝐷_𝐶𝐴

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
Decode 100010111001010
0.6
0.25
0.35
0.4 0.1 0.15 0.15 0.2 𝐴𝐵𝐷_𝐶

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
Decode 100010111001010
0.6
0.25
0.35
0.4 0.1 0.15 0.15 0.2 𝐴𝐵𝐷_𝐶

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
1 0.6
1
Decode 100010111001010
0
0
0.25 010
0.35
1
0.4 0.1 0.15 0.15 0.2 𝐴𝐵𝐷_𝐶

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Codeword
0
100
111
101
110
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷
1 0.6
1
Decode 100010111001010
0
0
0.25 010
0.35
1
0.4 0.1 0.15 0.15 0.2 𝐴𝐵𝐷_𝐶

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Codeword
0
100
111
101
110
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷: 0100011101000101
0
1 0.6
1
Decode 100010111001010
0
0.25 010
0.35
1
0.4 0.1 0.15 0.15 0.2 𝐴𝐵𝐷_𝐶

COMP90038 – Algorithms and Complexity
Lecture 21
Huffman Trees: Example
Symbol
𝐴
𝐵
𝐶
𝐷
_
Frequency
0.4
0.1
0.2
0.15
0.15
Codeword
0
100
111
101
110
Encode 𝐴𝐵𝐴𝐶𝐴𝐵𝐴𝐷: 0100011101000101
0
1 0.6
1
Decode 100010111001010:
100
0
101
110
0
101
0
𝐵
𝐴
𝐷
_
𝐴
𝐷
𝐴
0
0.25 010
0.35
1
0.4 0.1 0.15 0.15 0.2 𝐴𝐵𝐷_𝐶

COMP90038 – Algorithms and Complexity
Lecture 21
Compressed Transmission
• If the compressed file is being sent from one party to another, the parties must agree about the codes used.
• Alternatively, the trie can be sent along with the message.
• For long files this extra cost is negligible.
• Modern variant of Huffman encoding, like Lempel-Ziv compression, assign codes not to individual symbols, but to sequences of symbols.

COMP90038 – Algorithms and Complexity Lecture 21
Coming Up Next
• NP-completeness.