CS代考 CS4551

Multimedia Software Systems CS4551
Lossless Compression – Part II
Lossless Encoding Methods
• Run-length Encoding

Copyright By PowCoder代写 加微信 powcoder

• Repetition Suppression
• Variable Length Coding (Entropy Coding) – Shannon-Fano Algorithm
– Adaptive
• Pattern Substitution : A pattern, which occurs frequently, is replaced by a specific symbol
– Dictionary based Coding LZW
• Arithmetic Coding
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 2

Lossless Encoding Methods
• Run-length Encoding
• Repetition Suppression
• Variable Length Coding (Entropy Coding) – Shannon-Fano Algorithm
– Adaptive
• Pattern Substitution : A pattern, which occurs frequently, is replaced by a specific symbol
– Dictionary based Coding LZW
• Arithmetic Coding
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 3
LZW – Pattern Substitution Encoding
• Invented by Lempel and Ziv in 1978 (LZ78) and improvements given by Welch in 1984.
• Encoding Process:
1. Initialize the dictionary to contain all single/initial symbols.
• The vocabulary (collection of symbols) forms the initial dictionary entries.
• Every entry in the dictionary is given an index.
2. While scanning the message to compress, search for the longest sequence of symbols that has appeared as an entry in the dictionary. Call this entry E.
3. Encode E in the message by its index in the dictionary.
4. Add a new entry to the dictionary, which is E followed by the next symbol in the scan.
5. Repeat the process (step 2 ~ step 4) until you reach the end of the message.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 4

Pattern Substitution – Example 1
Data: a b b a a b b a a b a b b a a a a b a a b b a 0110242 6 55 7 30
Number of bits to represent indices of dictionary with 14 entries = 4 bits
Number of indices to encode the input = 13
Data Size after Compression = 13 × 4 bits = 52 bits (Ignore the dictionary size with single-length entries)
Dictionary
0 1 2 3 4 5 6
b ab bb ba aa abb
7 8 9 10 11 12 13
baa aba abba aaa aab baab bba
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 5
Pattern Substitution – Example 2
xxyyxyxyxxyyyxyxxyxxyyx
001136347580
Dictionary
0 1 2 3 4 5 6
y xx xy yy yx xyx
7 8 9 10 11 12
xyxx xyy yyx xyxxy yxx xyyx
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 6

Pattern Substitution – Example 3
• Data:abababab
• Encode the data using LZW. Show the dictionary.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 7
LZW – Pattern Substitution Decoding
• Decoding Process (with the exception handling)
1. Start with the dictionary containing blocks of length one
2. while not (the end of encoded data)
• 𝐾the next index
• Output Entry 𝐸 at the index 𝐾 from the dictionary
• Update the Dictionary
o If the entry 𝐸′ at the next index exists from the dictionary, then add to the dictionary 𝐸 + 𝐹𝑖𝑟𝑠𝑡_𝑆𝑦𝑚𝑏𝑜𝑙 𝐸′ .
o Otherwise, add 𝐸 + 𝐹𝑖𝑟𝑠𝑡_𝑆𝑦𝑚𝑏𝑜𝑙 𝐸 to the dictionary.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang

LZW – Pattern Substitution Decoding
• Handling exceptions (cases when an entry for the index does not exist in the dictionary for decoding)
Otherwise, add E + First_Symbol(E) to the dictionary.
• Analysis:
– Anexceptionalcaseisproducedintheencodingsidewhentheentry E+Next_Sysmbol that has just entered into the dictionary was used immediately for encoding the next entry E’.
– ThenextentryE’mustbetheformofE+First_Symbol(E).That means E+Next_Sysmbol is same as E+First_Symbol(E).
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 9
Pattern Substitution – Example 1
• Given the compressed data below, decode it using LZW. Show the dictionary.
0110242655730
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 10

Pattern Substitution – Example 2
• Given the compressed data below, decode it using LZW. Show the dictionary.
001136347580
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 11
Pattern Substitution – Example 3
• Consider the previous encoding example with abababab. Decode the data using LZW. Show the dictionary.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 12

LZW – Pattern Substitution
• Variable length pattern to fixed length index
• Size of the dictionary
– Theoretically, it can grow infinitely
– Practically, recommended size is 4096 (12 bits). Once the limit is reached, no more entries are added.
• Works well when the input data is sufficiently large and there are sufficient redundancy in data.
• E.g. – UNIX compress utility, gzip, gunzip, Windows WinZip, GIF and TIFF (optional) compression
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 13
Lossless Encoding Methods
• Run-length Encoding
• Repetition Suppression
• Variable Length Coding (Entropy Coding) – Shannon-Fano Algorithm
– Adaptive
• Pattern Substitution : A pattern, which occurs frequently, is replaced by a specific symbol
– Dictionary based Coding LZW
• Arithmetic Coding
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 14

Arithmetic Coding
• Arithmetic coding is a more modern coding method that usually outperforms Huffman coding in practice.
• Initial idea was introduced by Shannon in 1948.
• It was fully developed in the late 1970s and 1980s.
• Arithmetic coding treats the whole message as one unit. A message is represented by a half-open interval [a,b) where a and b are real numbers between 0 and 1 and encoded as a number that falls with the range [a,b).
• In practice, the input data is usually broken up into chunks. In this class, we take a short and simple message and encode it.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 15
Arithmetic Coding
• Suppose that we have 7 symbols {A,B,C,D,E,F,$} and their probabilities are 0.2, 0.1, 0.2, 0.05, 0.3, 0.05, 0.1.
Probability
Symbol Range
A B C D E F $
0.2 0.1 0.2 0.05 0.3 0.05 0.1
[0, 0.2) [0.2, 0.3) [0.3,0.5) [0.5,0.55) [0.55,0.85) [0.85,0.9) [0.9,1)
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 16

Arithmetic Coding
• Suppose that we have a text string “CAEE$” to encode. Arithmetic coding finds
a unique range [a,b) to represent this string.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 17
Arithmetic Coding
• Suppose that we have a text string “CAEE$” to encode. Arithmetic coding finds a unique range [a,b) to represent this string.
• Then, generate a number (a binary fraction) that falls in within a range.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 18

Arithmetic Coding
• Algorithm: Arithmetic Coding Encoder – BEGIN
• low = 0.0; high = 1.0; range = 1.0 • while (not end of the message)
– get (symbol);
– Update low and high by
low = tmp + range * range_low(symbol); high = tmp + range * range_high(symbol);
– range = high-low;
• output a binary code that falls within [low,
high) –> algorithm in the next slide – END
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 19
Binary Fractions
https://www.electronics- tutorials.ws/binary/binary-fractions.html
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 20

Arithmetic Coding
• Algorithm: Generating Codeword for Encoder – BEGIN
• code = 0;
• while (value(code) < low) – assign 1 to the k-th binary fraction bit – if (value(code) > high)
» replace the k-th bit by 0 –k = k+1;
• For CAEE$, we have [0.33184, 0.3322). Using the algorithm, we acquire a binary fraction number 0.01010101, which is 0.33203125 in decimal fraction. Therefore, to represent the sequence of symbol “CAEE$”, we need 8 bits.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 21
Arithmetic Coding
• Algorithm: Arithmetic Coding Decoder – BEGIN
• value <- Get binary code and covert to decimal value – Find a symbol s so that value is within [Range_low(s), Range_high(s)) – Output s – low = Range_low(s) – high = Range_high(s) – range = high - low – value = [value – low]/range; // rescaling the value • Until end of the message – END CSULA CS451 Multimedia Software Systems by Eun-Young Kang 22 Arithmetic Coding • Decoding 0.01010101 (=0.33203125). Output symbol 0.33203125 0.16015625 0.80078125 CSULA CS451 Multimedia Software Systems by Eun-Young Kang 23 Decoding Steps 0.5 (high) 0.33203125 Rescale the value 0.16015625 0.33203125 CSULA CS451 Multimedia Software Systems by Eun-Young Kang 24 Arithmetic Coding • The presented Arithmetic coding algorithm requires to use very high-precision numbers to do encoding when the interval shrinks. It is possible to rescale the intervals and use only integer arithmetic for a practical implementation. • Arithmetic coding is best accomplished using standard 16-bit or 32-bit integer math. • Known to be better than Huffman coding. – E.g. Let’s assume we have symbols {A,B,C} and their probabilities 𝑃(𝐴)=0.5, 𝑃(𝐵)=0.4, 𝑃(𝐶)=0.1 (assign half-open intervals in this order for the Arithmetic coding). For “BBB”, Huffman coding will require 6 bits, whereas Arithmetic coding will require only 4 bits. CSULA CS451 Multimedia Software Systems by Eun-Young Kang 25 Lossless Encoding Methods • Lossless Image Compression – Differential Coding of Image Differential Operation 𝑑𝑥,𝑦 =𝐼𝑥,𝑦 −𝑃𝑥,𝑦 where 𝐼 𝑥, 𝑦 is the current pixel at 𝑥, 𝑦 and 𝑃 𝑥, 𝑦 is the prediction for the pixel at CSULA CS451 Multimedia Software Systems by Eun-Young Kang Lossless Image Compression – Differential Coding of Image • Differential coding of image – Use the idea that neighboring image pixels are similar. • Given the original image, 𝐼 𝑥, 𝑦 means a pixel value in 𝑥, 𝑦 location and we can define a difference image using the following differential operation 𝑑 𝑥,𝑦 =𝐼 𝑥,𝑦 −𝐼 𝑥−1,𝑦 ,whichisasimpleapproximationofapartial differential operator 𝜕/𝜕𝑥 applied to an image. • An example The original image Differential image CSULA CS451 Multimedia Software Systems by Eun-Young Kang Lossless Image Compression – Differential Coding of Image • Usually the entropy of the original image of the original image is larger than the entropy of the differential image. So we can apply VLC (such as Huffman coding) to the differential image and achieve better compression ratio. CSULA CS451 Multimedia Software Systems by Eun-Young Kang 28 Lossless Image Compression – Differential Coding of Image • Fordecoding,use𝐼 𝑥,𝑦 =𝐼 𝑥−1,𝑦 +𝑑 𝑥,𝑦 CSULA CS451 Multimedia Software Systems by Eun-Young Kang 29 Lossless Image Compression – Prediction Based • Predictors that use only 1 pixels are called first-order. Differential Coding method presented in the previous slides use a first-order predictor. • Predictors that use only 2, 3 or more pixels are called second-order, third order or high-order. This prediction based image compression is used for JPEG lossless mode compression. • Let’s assume that we want to predict X pixel and compute the difference. There are many ways to predict. 𝑃 = 1/2 𝐴 + 1/2 𝐶 or 𝑃 = 𝐴 − 𝐵 + 𝐶 Then𝑑 𝑥,𝑦 =𝑋−𝑃 𝑥,𝑦 =𝐼 𝑥,𝑦 −𝑃 e.g.𝑑 𝑥,𝑦 =𝐼 𝑥,𝑦 −1/2 𝐼 𝑥−1,𝑦 +𝐼 𝑥,𝑦−1 CSULA CS451 Multimedia Software Systems by Eun-Young Kang 30 Lossless Image Compression – Prediction Based Entropy = 7.4451 Entropy = 7.1914 CSULA CS451 Multimedia Software Systems by Eun-Young Kang Lossless Image Compression – Prediction Based Entropy = 5.0461 Entropy = 5.5751 CSULA CS451 Multimedia Software Systems by Eun-Young Kang Lossless Image Compression – Prediction Based Entropy = 4.6063 Entropy = 5.2033 CSULA CS451 Multimedia Software Systems by Eun-Young Kang 33 Lossless Image Compression – Prediction Based Entropy = 4.7965 Entropy = 5.4177 CSULA CS451 Multimedia Software Systems by Eun-Young Kang Lossless Image Compression – Prediction Based CSULA CS451 Multimedia Software Systems by Eun-Young Kang min𝐴,𝐶 ,if𝐵≥max(𝐴,𝐶) 𝑋=ቐmax 𝐴,𝐶 ,if𝐵≤min(𝐴,𝐶) 𝐴 − 𝐵 + 𝐶, otherwise Entropy = 4.5418 min𝐴,𝐶 ,if𝐵≥max(𝐴,𝐶) 𝑋=ቐmax 𝐴,𝐶 ,if𝐵≤min(𝐴,𝐶) 𝐴 − 𝐵 + 𝐶, otherwise Entropy = 5.0986 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com