Multimedia Software Systems CS4551
Lossless Compression – Part II
Lossless Encoding Methods
• Run-length Encoding
Copyright By PowCoder代写 加微信 powcoder
• Repetition Suppression
• Variable Length Coding (Entropy Coding) – Shannon-Fano Algorithm
– Adaptive
• Pattern Substitution : A pattern, which occurs frequently, is replaced by a specific symbol
– Dictionary based Coding LZW
• Arithmetic Coding
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 2
Lossless Encoding Methods
• Run-length Encoding
• Repetition Suppression
• Variable Length Coding (Entropy Coding) – Shannon-Fano Algorithm
– Adaptive
• Pattern Substitution : A pattern, which occurs frequently, is replaced by a specific symbol
– Dictionary based Coding LZW
• Arithmetic Coding
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 3
LZW – Pattern Substitution Encoding
• Invented by Lempel and Ziv in 1978 (LZ78) and improvements given by Welch in 1984.
• Encoding Process:
1. Initialize the dictionary to contain all single/initial symbols.
• The vocabulary (collection of symbols) forms the initial dictionary entries.
• Every entry in the dictionary is given an index.
2. While scanning the message to compress, search for the longest sequence of symbols that has appeared as an entry in the dictionary. Call this entry E.
3. Encode E in the message by its index in the dictionary.
4. Add a new entry to the dictionary, which is E followed by the next symbol in the scan.
5. Repeat the process (step 2 ~ step 4) until you reach the end of the message.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 4
Pattern Substitution – Example 1
Data: a b b a a b b a a b a b b a a a a b a a b b a 0110242 6 55 7 30
Number of bits to represent indices of dictionary with 14 entries = 4 bits
Number of indices to encode the input = 13
Data Size after Compression = 13 × 4 bits = 52 bits (Ignore the dictionary size with single-length entries)
Dictionary
0 1 2 3 4 5 6
b ab bb ba aa abb
7 8 9 10 11 12 13
baa aba abba aaa aab baab bba
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 5
Pattern Substitution – Example 2
xxyyxyxyxxyyyxyxxyxxyyx
001136347580
Dictionary
0 1 2 3 4 5 6
y xx xy yy yx xyx
7 8 9 10 11 12
xyxx xyy yyx xyxxy yxx xyyx
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 6
Pattern Substitution – Example 3
• Data:abababab
• Encode the data using LZW. Show the dictionary.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 7
LZW – Pattern Substitution Decoding
• Decoding Process (with the exception handling)
1. Start with the dictionary containing blocks of length one
2. while not (the end of encoded data)
• 𝐾the next index
• Output Entry 𝐸 at the index 𝐾 from the dictionary
• Update the Dictionary
o If the entry 𝐸′ at the next index exists from the dictionary, then add to the dictionary 𝐸 + 𝐹𝑖𝑟𝑠𝑡_𝑆𝑦𝑚𝑏𝑜𝑙 𝐸′ .
o Otherwise, add 𝐸 + 𝐹𝑖𝑟𝑠𝑡_𝑆𝑦𝑚𝑏𝑜𝑙 𝐸 to the dictionary.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang
LZW – Pattern Substitution Decoding
• Handling exceptions (cases when an entry for the index does not exist in the dictionary for decoding)
Otherwise, add E + First_Symbol(E) to the dictionary.
• Analysis:
– Anexceptionalcaseisproducedintheencodingsidewhentheentry E+Next_Sysmbol that has just entered into the dictionary was used immediately for encoding the next entry E’.
– ThenextentryE’mustbetheformofE+First_Symbol(E).That means E+Next_Sysmbol is same as E+First_Symbol(E).
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 9
Pattern Substitution – Example 1
• Given the compressed data below, decode it using LZW. Show the dictionary.
0110242655730
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 10
Pattern Substitution – Example 2
• Given the compressed data below, decode it using LZW. Show the dictionary.
001136347580
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 11
Pattern Substitution – Example 3
• Consider the previous encoding example with abababab. Decode the data using LZW. Show the dictionary.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 12
LZW – Pattern Substitution
• Variable length pattern to fixed length index
• Size of the dictionary
– Theoretically, it can grow infinitely
– Practically, recommended size is 4096 (12 bits). Once the limit is reached, no more entries are added.
• Works well when the input data is sufficiently large and there are sufficient redundancy in data.
• E.g. – UNIX compress utility, gzip, gunzip, Windows WinZip, GIF and TIFF (optional) compression
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 13
Lossless Encoding Methods
• Run-length Encoding
• Repetition Suppression
• Variable Length Coding (Entropy Coding) – Shannon-Fano Algorithm
– Adaptive
• Pattern Substitution : A pattern, which occurs frequently, is replaced by a specific symbol
– Dictionary based Coding LZW
• Arithmetic Coding
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 14
Arithmetic Coding
• Arithmetic coding is a more modern coding method that usually outperforms Huffman coding in practice.
• Initial idea was introduced by Shannon in 1948.
• It was fully developed in the late 1970s and 1980s.
• Arithmetic coding treats the whole message as one unit. A message is represented by a half-open interval [a,b) where a and b are real numbers between 0 and 1 and encoded as a number that falls with the range [a,b).
• In practice, the input data is usually broken up into chunks. In this class, we take a short and simple message and encode it.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 15
Arithmetic Coding
• Suppose that we have 7 symbols {A,B,C,D,E,F,$} and their probabilities are 0.2, 0.1, 0.2, 0.05, 0.3, 0.05, 0.1.
Probability
Symbol Range
A B C D E F $
0.2 0.1 0.2 0.05 0.3 0.05 0.1
[0, 0.2) [0.2, 0.3) [0.3,0.5) [0.5,0.55) [0.55,0.85) [0.85,0.9) [0.9,1)
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 16
Arithmetic Coding
• Suppose that we have a text string “CAEE$” to encode. Arithmetic coding finds
a unique range [a,b) to represent this string.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 17
Arithmetic Coding
• Suppose that we have a text string “CAEE$” to encode. Arithmetic coding finds a unique range [a,b) to represent this string.
• Then, generate a number (a binary fraction) that falls in within a range.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 18
Arithmetic Coding
• Algorithm: Arithmetic Coding Encoder – BEGIN
• low = 0.0; high = 1.0; range = 1.0 • while (not end of the message)
– get (symbol);
– Update low and high by
low = tmp + range * range_low(symbol); high = tmp + range * range_high(symbol);
– range = high-low;
• output a binary code that falls within [low,
high) –> algorithm in the next slide – END
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 19
Binary Fractions
https://www.electronics- tutorials.ws/binary/binary-fractions.html
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 20
Arithmetic Coding
• Algorithm: Generating Codeword for Encoder – BEGIN
• code = 0;
• while (value(code) < low)
– assign 1 to the k-th binary fraction bit – if (value(code) > high)
» replace the k-th bit by 0 –k = k+1;
• For CAEE$, we have [0.33184, 0.3322). Using the algorithm, we acquire a binary fraction number 0.01010101, which is 0.33203125 in decimal fraction. Therefore, to represent the sequence of symbol “CAEE$”, we need 8 bits.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 21
Arithmetic Coding
• Algorithm: Arithmetic Coding Decoder – BEGIN
• value <- Get binary code and covert to decimal value
– Find a symbol s so that value is within
[Range_low(s), Range_high(s))
– Output s
– low = Range_low(s)
– high = Range_high(s)
– range = high - low
– value = [value – low]/range; // rescaling the value
• Until end of the message – END
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 22
Arithmetic Coding • Decoding 0.01010101 (=0.33203125).
Output symbol
0.33203125
0.16015625
0.80078125
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 23
Decoding Steps
0.5 (high)
0.33203125
Rescale the value
0.16015625
0.33203125
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 24
Arithmetic Coding
• The presented Arithmetic coding algorithm requires to use very high-precision numbers to do encoding when the interval shrinks. It is possible to rescale the intervals and use only integer arithmetic for a practical implementation.
• Arithmetic coding is best accomplished using standard 16-bit or 32-bit integer math.
• Known to be better than Huffman coding.
– E.g. Let’s assume we have symbols {A,B,C} and their probabilities 𝑃(𝐴)=0.5, 𝑃(𝐵)=0.4, 𝑃(𝐶)=0.1 (assign half-open intervals in this order for the Arithmetic coding). For “BBB”, Huffman coding will require 6 bits, whereas Arithmetic coding will require only 4 bits.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 25
Lossless Encoding Methods
• Lossless Image Compression – Differential Coding of Image
Differential Operation
𝑑𝑥,𝑦 =𝐼𝑥,𝑦 −𝑃𝑥,𝑦
where 𝐼 𝑥, 𝑦 is the current pixel at 𝑥, 𝑦
and 𝑃 𝑥, 𝑦 is the prediction for the pixel at CSULA CS451 Multimedia Software Systems by Eun-Young Kang
Lossless Image Compression –
Differential Coding of Image
• Differential coding of image
– Use the idea that neighboring image pixels are similar.
• Given the original image, 𝐼 𝑥, 𝑦 means a pixel value in 𝑥, 𝑦 location and we can define a difference image using the following differential operation
𝑑 𝑥,𝑦 =𝐼 𝑥,𝑦 −𝐼 𝑥−1,𝑦 ,whichisasimpleapproximationofapartial differential operator 𝜕/𝜕𝑥 applied to an image.
• An example
The original image Differential image
CSULA CS451 Multimedia Software Systems by Eun-Young Kang
Lossless Image Compression –
Differential Coding of Image
• Usually the entropy of the original image of the original image is larger than the entropy of the differential image. So we can apply VLC (such as Huffman coding) to the differential image and achieve better compression ratio.
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 28
Lossless Image Compression – Differential Coding of Image
• Fordecoding,use𝐼 𝑥,𝑦 =𝐼 𝑥−1,𝑦 +𝑑 𝑥,𝑦
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 29
Lossless Image Compression –
Prediction Based
• Predictors that use only 1 pixels are called first-order. Differential Coding method presented in the previous slides use a first-order predictor.
• Predictors that use only 2, 3 or more pixels are called second-order, third order or high-order. This prediction based image compression is used for JPEG lossless mode compression.
• Let’s assume that we want to predict X pixel and compute the difference. There are many ways to predict.
𝑃 = 1/2 𝐴 + 1/2 𝐶
or 𝑃 = 𝐴 − 𝐵 + 𝐶
Then𝑑 𝑥,𝑦 =𝑋−𝑃 𝑥,𝑦 =𝐼 𝑥,𝑦 −𝑃
e.g.𝑑 𝑥,𝑦 =𝐼 𝑥,𝑦 −1/2 𝐼 𝑥−1,𝑦 +𝐼 𝑥,𝑦−1
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 30
Lossless Image Compression – Prediction Based
Entropy = 7.4451
Entropy = 7.1914
CSULA CS451 Multimedia Software Systems by Eun-Young Kang
Lossless Image Compression – Prediction Based
Entropy = 5.0461
Entropy = 5.5751
CSULA CS451 Multimedia Software Systems by Eun-Young Kang
Lossless Image Compression – Prediction Based
Entropy = 4.6063
Entropy = 5.2033
CSULA CS451 Multimedia Software Systems by Eun-Young Kang 33
Lossless Image Compression – Prediction Based
Entropy = 4.7965
Entropy = 5.4177
CSULA CS451 Multimedia Software Systems by Eun-Young Kang
Lossless Image Compression – Prediction Based
CSULA CS451 Multimedia Software Systems by Eun-Young Kang
min𝐴,𝐶 ,if𝐵≥max(𝐴,𝐶) 𝑋=ቐmax 𝐴,𝐶 ,if𝐵≤min(𝐴,𝐶)
𝐴 − 𝐵 + 𝐶, otherwise
Entropy = 4.5418
min𝐴,𝐶 ,if𝐵≥max(𝐴,𝐶) 𝑋=ቐmax 𝐴,𝐶 ,if𝐵≤min(𝐴,𝐶)
𝐴 − 𝐵 + 𝐶, otherwise
Entropy = 5.0986
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com