
Multimedia Software Systems CS4551
Lossless Compression – Part I
Why Compression?
• Multimedia information has to be stored and transported efficiently.


• Multimedia information is bulky
– HDTV: 1920 × 1080 pixels × 30 frames/s × 12 bits/pixel ≈ 746 Mb/s!
• Use technologies with more bandwidth
– Expensive; requires hardware research
• Find ways to reduce the number of bits to transmit without compromising on the “information content” => Compression
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 2

Compression Scheme
Input data → Encoder (compression) → Storage or network → Decoder (decompression) → Output data
• Compression ratio = R/C
– R = the total number of bits required to represent data before compression
– C = the total number of bits required to represent data after compression
Types of Compression
• Lossless
– Does not lose information – the original signal is perfectly reconstructed after decompression
– Produces a variable bit-rate
– Not guaranteed to actually reduce the data size
• Lossy
– Loses some information – the original signal is not perfectly reconstructed after decompression
– Produces any desired constant bit-rate

Lossless Encoding Methods
• Run-length Encoding
• Repetition Suppression
• Variable Length Coding (Entropy Coding)
– Shannon-Fano Algorithm
– Huffman Coding
– Adaptive Huffman Coding
• Pattern Substitution: a pattern which occurs frequently is replaced by a specific symbol
– Dictionary-based Coding (LZW)
• Arithmetic Coding

Run-Length Encoding (RLE)
• A sequence of elements 𝑐1, 𝑐2, …, 𝑐𝑖, … is mapped to pairs (𝑐𝑖, 𝑙𝑖),
where 𝑐𝑖 = symbol and 𝑙𝑖 = length of the symbol 𝑐𝑖's run
• For example, given the sequence of symbols
{ 1, 1, 1, 3, 3, 6, 6, 6, 2, 2, 2, 2, 3, 3, 1, 4, 4 } the run-length encoding is
(1, 3), (3, 2), (6, 3), (2, 4), (3, 2), (1, 1), (4, 2).
• We can apply this run-length encoding for a bi-level image, which has two symbols, 0 and 1, by simply coding the length of each run.
• Two dimensional run-length encoding is also available.
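As a sketch, the mapping above can be written in a few lines of Python (a minimal illustration of the pair encoding, not a bit-level codec):

```python
# Run-length encode a sequence into (symbol, run_length) pairs,
# and decode the pairs back. A real codec would pack these pairs
# into a bit stream; this sketch only illustrates the mapping.
def rle_encode(seq):
    runs = []
    for s in seq:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)   # extend current run
        else:
            runs.append((s, 1))               # start a new run
    return runs

def rle_decode(runs):
    return [s for s, n in runs for _ in range(n)]

data = [1, 1, 1, 3, 3, 6, 6, 6, 2, 2, 2, 2, 3, 3, 1, 4, 4]
encoded = rle_encode(data)
# encoded == [(1, 3), (3, 2), (6, 3), (2, 4), (3, 2), (1, 1), (4, 2)]
assert rle_decode(encoded) == data
```

Note that a run of length 1 still costs a full pair, which is why RLE only pays off when runs are long.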
Run-Length Encoding (RLE)
• Performs best in the presence of repetitions or redundancy of symbols.
• Not practical to encode runs of length 1.
• Used as a part of compression standards such as TIFF, BMP, PCX, and JPEG.

Repetition Suppression
• Repetition Suppression
– Repetitive occurrences of a specific character are replaced by a special flag
– E.g. ab000000000→abΨ

Variable Length Coding
• Idea for variable length coding
– “Translate” each symbol represented by a fixed number of bits into a “codeword”. Codewords may have different binary lengths.
– Example: You have 4 symbols (a, b, c, d). Each in binary may be represented using 2 bits each, but coded using a different number of bits.
• a (00) → 000
• b (01) → 001
• c (10) → 01
• d (11) → 1

Average Symbol Length
• Symbol: an individual element of information
• Symbol length 𝑙(𝑖) = binary length of the ith symbol
• M = total number of symbols that the source emits, where the ith symbol has been emitted 𝑚(𝑖) times (over a certain time T):
𝑀 = Σ_{𝑖=1}^{𝑁} 𝑚(𝑖)
• Number of bits emitted:
𝐿 = Σ_{𝑖=1}^{𝑁} 𝑚(𝑖) 𝑙(𝑖)
• Average length per symbol:
𝐿/𝑀 = Σ_{𝑖=1}^{𝑁} (𝑚(𝑖)/𝑀) 𝑙(𝑖) = Σ_{𝑖=1}^{𝑁} 𝑝(𝑖) 𝑙(𝑖)

Example: consider the data "ABBACDAA"
• Symbols: A, B, C, D
• Symbol length: 2 bits per symbol
• Total number of symbols: 𝑀 = 8 (4 As + 2 Bs + 1 C + 1 D)
• Number of bits emitted: 𝐿 = 16 = 8 × 2 = 4×2 + 2×2 + 1×2 + 1×2
• Average length per symbol: 2 = 2×(4/8) + 2×(2/8) + 2×(1/8) + 2×(1/8)
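The ABBACDAA computation can be checked directly; the snippet below mirrors the two equivalent forms of the average (total bits over total symbols, and the probability-weighted sum):

```python
from collections import Counter

# Average symbol length for "ABBACDAA" with a fixed 2-bit code.
data = "ABBACDAA"
counts = Counter(data)                 # A:4, B:2, C:1, D:1
M = sum(counts.values())               # total symbols emitted: 8
l = {sym: 2 for sym in counts}         # fixed-length code: 2 bits each

L = sum(counts[s] * l[s] for s in counts)            # total bits: 16
avg = sum((counts[s] / M) * l[s] for s in counts)    # Σ p(i) l(i) = 2.0
assert (M, L, avg) == (8, 16, 2.0)
```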

Minimum Average Symbol Length
• Main goal is to minimize the average symbol length.
• Basic idea for reducing the average symbol length:
– Assign Shorter Codewords to symbols that appear more frequently, Longer Codewords to symbols that appear less frequently.
Shannon’s Information Theorem
• Theorem:
– “The Average Binary Symbol Length of the encoded symbols is always greater than or equal to the entropy H of the source” (under the First-order Model or memory-less model)
• Memoryless source model : an information source that is independently distributed. Namely, the value of the current symbol does not depend on the values of the previously appeared symbols.
• What is the entropy of a source of symbols and how is it computed?

Symbol Probability
• Probability 𝑝(𝑖) of a symbol: the relative frequency with which it occurs in the transmission, defined as
𝑝(𝑖) = 𝑚(𝑖)/𝑀
• From probability theory we know
0 ≤ 𝑝(𝑖) ≤ 1 and Σ_{𝑖=1}^{𝑁} 𝑝(𝑖) = 1
• Average symbol length is defined as
Σ_{𝑖=1}^{𝑁} (𝑚(𝑖)/𝑀) 𝑙(𝑖) = Σ_{𝑖=1}^{𝑁} 𝑝(𝑖) 𝑙(𝑖)
Entropy Definition
Entropy is defined as
𝐻 = Σ_{𝑖=1}^{𝑁} 𝑃(𝑖) log2 (1/𝑃(𝑖)) = −Σ_{𝑖=1}^{𝑁} 𝑃(𝑖) log2 𝑃(𝑖)
where log2 (1/𝑃(𝑖)) is the number of bits used to send the 𝑖th message.
https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/information-entropy
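A small Python helper makes the definition concrete; applied to the string "HELLO" it gives the 1.92 bits/symbol figure used in the Shannon-Fano example later:

```python
from math import log2
from collections import Counter

# Estimate entropy H = -Σ p(i) log2 p(i) from symbol frequencies.
def entropy(data):
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(round(entropy("HELLO"), 2))   # 1.92
```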

Entropy Examples (1)
Entropy Examples (2)

Entropy Examples (3)
• 𝑃1 = 1 and 𝑃2 = 0: 𝐻 = 0
• 𝑃1 ≈ 0.5 and 𝑃2 ≈ 0.5: 𝐻 = 1,
the minimum number of bits needed to represent the two symbols
• Entropy 𝐻 quantifies the amount of information (uncertainty, surprises, or disorder) contained in the message or the minimum number of bits to encode the symbols.
– It is a measure of the disorder of a system – the more the entropy, the more the disorder.
• Entropy 𝐻 depends on the probabilities of symbols.
– Given 𝑝𝑖, probability that symbol 𝑠𝑖 occurs in 𝑆, log2 1/𝑝𝑖 indicates the amount of information contained in 𝑠𝑖, which corresponds to the number of bits needed to encode 𝑠𝑖.
– 𝐻 is the weighted sum of the information carried by each symbol. Hence, it represents the average amount of information contained per symbol in the source 𝑆.

Entropy
• Entropy 𝐻 is always ≥ 0.
• Entropy 𝐻 is highest (equal to log2 𝑁) if all symbols are equally probable.
• Entropy 𝐻 is small when some symbols are much more likely to appear than others.
• For a memory-less source 𝑆, the entropy represents the minimum average number of bits required to represent each symbol in 𝑆. In other words,
– It specifies the lower bound for the average bits to code each symbol in 𝑆.
– It provides an absolute limit on the shortest possible average length of a lossless compression encoding of the data produced by a source.
(VLC) Encoder Efficiency
• Encoder efficiency = 𝐻 / 𝐿𝑒𝑛𝑐𝑜𝑑𝑒𝑟
• 𝐿𝑒𝑛𝑐𝑜𝑑𝑒𝑟: the average symbol length generated by the encoder for the data
– For example, given data with 𝑁 symbols, if the entropy 𝐻 of the data is 2 and the average symbol length produced by a compression method C is 2.5, then the efficiency of the encoder C is 2/2.5 = 0.8 → 80%.

Variable Length Coding Methods
• VLC generates variable-length codewords from fixed-length symbols.
• VLC is one of the best known “Entropy Coding” methods.
• Methods of VLC
– Shannon-Fano Algorithm (top-down approach)
– Huffman Coding (bottom-up approach)
– Adaptive Huffman Coding

Shannon-Fano Algorithm
• Developed by Shannon at Bell Labs and Fano at MIT.
• Algorithm
1. Compute the frequency count of the symbols.
2. Sort the symbols according to the frequency count of their occurrences.
3. Recursively divide the symbols into two parts, each with approximately the same total count, until each part contains only one symbol.
• The encoding proceeds in a top-down manner.
• A natural way of implementing the algorithm is to build a binary tree, assigning 0 to its left branch and 1 to its right branch.
Shannon-Fano Algorithm
• Example: Consider “HELLO”
– The frequency count: H – 1, E – 1, L – 2, O – 1
• A coding tree for HELLO by the Shannon-Fano algorithm gives:
L 0, H 10, E 110, O 111

Shannon-Fano Algorithm
• Symbol probabilities:
𝑃(𝐻) = 1/5, 𝑃(𝐸) = 1/5, 𝑃(𝐿) = 2/5, 𝑃(𝑂) = 1/5
• Entropy:
𝐻 = −Σ_{𝑖=1}^{4} 𝑃(𝑖) log2 𝑃(𝑖) ≈ 1.92
• Average length per symbol using fixed-length coding:
𝐿 = Σ 2·𝑃(𝑖) = (2/5)×2 + 3×(1/5)×2 = 2 bits/symbol
• Symbol lengths by Shannon-Fano:
𝑙(𝐿) = 1, 𝑙(𝐻) = 2, 𝑙(𝐸) = 3, 𝑙(𝑂) = 3
• Average symbol length with Shannon-Fano:
𝐿_Shannon-Fano = (2/5)×1 + (1/5)×2 + 2×(1/5)×3 = 2 bits/symbol
Encoder Efficiency
• Encoder efficiency = 𝐻 / 𝐿𝑒𝑛𝑐𝑜𝑑𝑒𝑟
• Efficiency with fixed-length encoding:
𝐻 / 𝐿_fixed-length = 1.92 / 2 → 96%
• Efficiency of the Shannon-Fano encoder:
𝐻 / 𝐿_Shannon-Fano = 1.92 / 2 → 96%
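These figures can be recomputed directly:

```python
from math import log2

# Entropy of "HELLO" and the two encoders' average symbol lengths.
p = {"L": 2 / 5, "H": 1 / 5, "E": 1 / 5, "O": 1 / 5}
H = -sum(pi * log2(pi) for pi in p.values())   # ≈ 1.92 bits/symbol

L_fixed = 2.0                                  # fixed-length: 2 bits/symbol
l_sf = {"L": 1, "H": 2, "E": 3, "O": 3}        # Shannon-Fano lengths
L_sf = sum(p[s] * l_sf[s] for s in p)          # 2.0 bits/symbol

print(round(H / L_fixed, 2), round(H / L_sf, 2))   # 0.96 0.96
```

Both encoders happen to average 2 bits/symbol here, so both reach 96% efficiency.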

Shannon-Fano Algorithm
• Another coding tree for HELLO by Shannon-Fano algorithm
• Shannon-Fano delivers satisfactory coding results, but it has been outperformed and superseded by the Huffman coding method.
– E.g. consider {A, B, C, D, E} with the frequency counts 15, 7, 6, 6, and 5. The Shannon-Fano algorithm needs a total of 89 bits to encode, whereas Huffman coding needs only 87 bits.

Huffman Coding
• The Huffman code assignment procedure is based on the frequency of occurrence of a symbol. It uses a binary tree structure, and the algorithm is as follows.
• Algorithm
1. The leaves of the binary tree are associated with the list of probabilities.
2. Take the two nodes with the smallest probabilities in the list and make the corresponding nodes siblings. Generate an intermediate node as their parent, and label the branch from the parent to one child 0 and the branch to the other child 1.
3. Replace the two probabilities and associated nodes in the list by a single new intermediate node with the sum of the two probabilities. If the list contains only one element, quit. Otherwise, go to step 2.
4. Codeword formation: follow the path from the root of the tree to the symbol, accumulating the labels of all the branches.
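Steps 2–3 are commonly implemented with a min-heap; this sketch (the 0/1 branch labeling is one arbitrary but valid choice) reproduces the 87-bit total quoted earlier for the frequencies 15, 7, 6, 6, and 5:

```python
import heapq
from itertools import count

# Huffman coding: repeatedly merge the two lowest-frequency nodes,
# then read codewords off the root-to-leaf paths.
def huffman_codes(freqs):
    tick = count()  # tie-breaker so heapq never compares tree nodes
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)   # smallest
        f2, _, n2 = heapq.heappop(heap)   # second smallest
        heapq.heappush(heap, (f1 + f2, next(tick), (n1, n2)))

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: branch 0 / 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(c) for s, c in codes.items())
print(total_bits)  # 87
```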
Huffman Coding – Example 1
• Let’s encode “go go gopher” using Huffman coding.
• 7 symbols = {g, o, p, h, e, r, space}
– 3 bits per symbol using the fixed-length coding method
– 𝑃(𝑔) = 3/12, 𝑃(𝑜) = 3/12, 𝑃(𝑝) = 1/12, 𝑃(h) = 1/12, 𝑃(𝑒) = 1/12, 𝑃(𝑟) = 1/12, 𝑃(𝑠𝑝𝑎𝑐𝑒) = 2/12
– Average length per symbol using fixed-length coding:
𝐿 = Σ 3·𝑃(𝑖) = 3 bits/symbol
– Entropy:
𝐻 = −Σ_{𝑖=1}^{7} 𝑃(𝑖) log2 𝑃(𝑖) ≈ 2.62
– Compute the average symbol length and the coding efficiency.

Huffman Coding – Example 1
• Huffman code (one consistent codeword assignment for the tree; g, o, p, and h complete the given e, r, and space codewords):
– g 00
– o 01
– space 101
– p 110
– h 111
– e 1000
– r 1001
Average Symbol Length = 32/12 ≈ 2.67
Entropy ≈ 2.62
Coding Efficiency = 2.62/2.67 ≈ 0.98
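As a numeric check on Example 1 (the 2- and 3-bit lengths for g, o, p, and h are inferred so as to be consistent with the given codewords e 1000, r 1001, and space 101):

```python
from math import log2

# Frequencies for "go go gopher" (12 symbols including spaces).
freqs = {"g": 3, "o": 3, " ": 2, "p": 1, "h": 1, "e": 1, "r": 1}
M = sum(freqs.values())
H = -sum((f / M) * log2(f / M) for f in freqs.values())   # ≈ 2.626

# Codeword lengths: e and r are 4 bits, space is 3 bits (given);
# g, o at 2 bits and p, h at 3 bits complete a valid Huffman tree.
lengths = {"g": 2, "o": 2, " ": 3, "p": 3, "h": 3, "e": 4, "r": 4}
avg = sum((freqs[s] / M) * lengths[s] for s in freqs)     # 32/12 ≈ 2.667

print(round(H, 2), round(avg, 2), round(H / avg, 2))      # 2.63 2.67 0.98
```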
Huffman Coding – Example 2
• Fixed-length coding vs. variable-length (Huffman) coding, shown as coding trees on the slides.
Huffman Coding – Decoding
• To decode the bit stream, the decoder needs the Huffman table.
• Decoding the bit stream:
– With fixed-length coding:
001010001110101001 = bcbgfb
– With the Huffman code from the previous slide:
1111011111110011 = ghed (variable-length)
– Both the fixed-length code and the Huffman code are uniquely decodable.
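Prefix-code decoding can be sketched with the hypothetical four-symbol code from the earlier VLC slide (a→000, b→001, c→01, d→1); the decoder emits a symbol as soon as its bit buffer matches a codeword:

```python
# Hypothetical 4-symbol prefix code from the VLC example:
# a→000, b→001, c→01, d→1 (no codeword is a prefix of another).
code = {"a": "000", "b": "001", "c": "01", "d": "1"}
table = {v: k for k, v in code.items()}

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in table:          # prefix property: first match is final
            out.append(table[buf])
            buf = ""
    assert buf == "", "bit stream ended in the middle of a codeword"
    return "".join(out)

encoded = "".join(code[s] for s in "cabd")
assert decode(encoded) == "cabd"
```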

Uniquely Decodable Code
• Decoding the bit stream using the Huffman code table/tree:
– We can decode the bit stream because it is a prefix code (no codeword is the prefix of another codeword)
– The unique prefix property makes decoding efficient because it precludes any ambiguity: the decoder can immediately output a symbol without waiting for any more bits to be transmitted.
* Both fixed length code and prefix code are uniquely decodable.
Shannon-Fano vs Huffman Code
• Symbol frequencies: A 24, B 12, C 10, D 8, E 8
• Shannon-Fano codes: A 00, B 01, C 10, D 110, E 111

Shannon-Fano vs Huffman Code
• At best, Shannon-Fano coding matches the result of Huffman coding; it never produces a shorter average code length than Huffman coding.
• Symbol frequencies: A 24, B 12, C 10, D 8, E 8
• One possible Huffman code: A 0, B 100, C 101, D 110, E 111
– Total bits: Shannon-Fano 2×(24+12+10) + 3×(8+8) = 140; Huffman 1×24 + 3×(12+10+8+8) = 138
Application CCITT Group 3 1-D FAX Encoding
• CCITT (Comité Consultatif International Téléphonique et Télégraphique), an organization that sets international communications standards – now known as ITU (International Telecommunication Union)
• Group 3 and Group 4 encodings are compression algorithms that are specifically designed for encoding 1-bit image data. Many document and FAX file formats support Group 3 compression, and several, including TIFF, also support Group 4.
• Group 3 encoding was designed specifically for bi-level, black- and-white image data telecommunications. All modern FAX machines and FAX modems support Group 3 facsimile transmissions.

Application CCITT Group 3 1-D FAX Encoding
• Facsimile: a bi-level image; an 8.5″ × 11″ page scanned (left to right) at 200 dpi, with 1728 pixels on each line, amounts to about 3.74 Mb.
• Encoding Process
– Count each run of white/black pixels
– Encode the run-lengths with Huffman coding
Application CCITT Group 3 1-D FAX Encoding
• Decompose each run-length 𝑛 as 𝑛 = 𝑘 × 64 + 𝑚 (with 0 ≤ 𝑘 ≤ 27 and 0 ≤ 𝑚 < 64), and create Huffman tables for both 𝑘 and 𝑚.
– The standard Huffman table is available at both the encoding and the decoding side.
– 92 different codes: 28 makeup codes for groups of 𝑘 × 64 pixels, and 64 terminating codes for short runs of 0–63 pixels.
– Note: probabilities of run-lengths are not the same for black and white pixels.
• Special codewords for end-of-line, end-of-page, and synchronization.
– Each scan line ends with a special EOL (end of line) codeword consisting of eleven zeros and a 1 (000000000001).
• Average compression ratio on text documents: 20-to-1

Application CCITT Group 3 1-D FAX Encoding
Example line: a 500-pixel white run, an 800-pixel black run, a 428-pixel white run
White: 500 = 7 × 64 + 52 => k = 7, m = 52
Black: 800 = 12 × 64 + 32 => k = 12, m = 32
White: 428 = 6 × 64 + 44 => k = 6, m = 44
The line will be encoded using the Huffman codes for ks, ms and end-of-line code.
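The k/m decomposition is a single division with remainder; a minimal sketch:

```python
# Split a run length n into n = k*64 + m; the encoder then emits the
# makeup code for k*64 followed by the terminating code for m (both
# taken from the standard Group 3 Huffman tables).
def decompose(n):
    k, m = divmod(n, 64)
    return k, m

assert decompose(500) == (7, 52)   # 500-pixel white run
assert decompose(800) == (12, 32)  # 800-pixel black run
assert decompose(428) == (6, 44)   # 428-pixel white run
```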


Application CCITT Group 3 1-D FAX Encoding
e.g. a 500-pixel white run:
500 = 448 + 52 = 7 × 64 + 52 => k = 7, m = 52
01100100 01010101 000000000001 => 28 bits (makeup code + terminating code + EOL)
500 bits reduced to 28 bits
Huffman Coding and its Extensions
• The Huffman coding method has been adopted in many applications such as fax machines, JPEG, and MPEG.
• Extended Huffman Coding
– Assign a single codeword to a group of symbols instead of assigning a codeword to each symbol.
• Adaptive Huffman Coding
– The Huffman algorithm requires prior statistical knowledge about the information source, and such information is often not available (e.g. live/streaming audio and video).
– Adaptive Huffman coding uses probabilities that are no longer based on prior knowledge but on the actual data received so far.

Adaptive Huffman Coding
• The Huffman algorithm requires prior statistical knowledge about the information source, and such information is often not available.
• The adaptive Huffman algorithm uses statistics gathered and updated based on occurrences of preceding symbols. In this approach, as the probabilities of the received symbols change, symbols are given new (longer or shorter) codes.

Adaptive Huffman Coding
• Procedure for adaptive Huffman coding (c is a symbol):
1. Initial_code();
2. while not end_of_stream
encode(c);
update_tree(c);
• Initial_code(): assigns codes to symbols using some initially agreed-upon scheme, for example ASCII codes.
• encode(c): encodes the symbol c using the current Huffman tree or the initial codes.
• update_tree(c): constructs and updates the adaptive Huffman tree. It increments the frequency count for the symbol c and updates the configuration of the tree.

• update_tree(c):
– The adaptive Huffman tree must maintain the sibling property: all nodes (internal and leaf), when listed from left to right and bottom to top, appear in order of non-decreasing frequency counts.
