INTRODUCTION
INFORMATION THEORY & GENERIC COMPRESSION TECHNIQUES
TOPICS TO BE COVERED
Need for Compression
Data Sources and Information Representation
Overview of Lossless Compression Techniques
Overview of Lossy Compression Techniques
NEED FOR COMPRESSION
Multimedia information has to be stored and transported efficiently.
Multimedia information is bulky –
Interlaced HDTV: 1080 × 1920 × 30 × 12 ≈ 746 Mb/s!
Use technologies with more bandwidth
• Expensive
• Research
Find ways to reduce the number of bits to transmit without compromising the “information content”
• Compression Technology
• Reliable Transmission
INFORMATION THEORY
Pioneered primarily by Claude Shannon.
Information theory deals with two subtopics:
Source Coding – How can a communication system
efficiently transmit the information that a source
produces? Relates to Compression
Channel Coding – How can a communication system
achieve reliable communication over a noisy channel?
Relates to minimizing Transmission Error
TYPES OF COMPRESSION
Lossless (entropy coding)
• Does not lose information – the signal is perfectly
reconstructed after decompression
• Produces a variable bit-rate
• It is not guaranteed to actually reduce the data
size
Lossy
• Loses some information – the signal is not
perfectly reconstructed after decompression
• Produces any desired constant bit-rate
MODEL INFORMATION AT THE SOURCE
Model data at the source as a stream of symbols – this defines the “vocabulary” of the source.
Each symbol in the vocabulary is represented by bits.
If your vocabulary has N symbols, each symbol is represented with $\log_2 N$ bits.
• Speech – 16 bits/sample: $N = 2^{16} = 65{,}536$ symbols
• Color image – 3 × 8 bits/sample: $N = 2^{24} \approx 17 \times 10^{6}$ symbols
• 8 × 8 image blocks – 8 × 64 bits/block: $N = 2^{512} \approx 10^{154}$ symbols
LOSSLESS COMPRESSION
Lossless compression techniques ensure no loss of
data after compression/decompression.
Coding: “Translate” each symbol in the vocabulary into
a “binary codeword”. Codewords may have different
binary lengths.
Example: a vocabulary of 4 symbols (a, b, c, d). Each symbol could be represented with a fixed 2 bits, but is instead coded with a variable number of bits:
a (00) 000
b (01) 001
c (10) 01
d (11) 1
Goal of Coding is to minimize the average symbol length
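As a minimal illustration (Python sketch; the helper names encode and decode_prefix are ours, not from the slides), the variable-length code above is a prefix code, so a stream of codewords can be decoded without separators:

    # Variable-length prefix code from the example above.
    code = {'a': '000', 'b': '001', 'c': '01', 'd': '1'}

    def encode(symbols, code):
        """Concatenate the codeword of each symbol."""
        return ''.join(code[s] for s in symbols)

    def decode_prefix(bits, code):
        """Match codewords left to right; unambiguous because no
        codeword is a prefix of another."""
        inverse = {w: s for s, w in code.items()}
        out, current = [], ''
        for bit in bits:
            current += bit
            if current in inverse:
                out.append(inverse[current])
                current = ''
        return ''.join(out)

    bits = encode('dcab', code)               # '1' + '01' + '000' + '001'
    assert decode_prefix(bits, code) == 'dcab'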
AVERAGE SYMBOL LENGTH
Symbol length l(i) = binary length of the i-th symbol
Source emits M symbols (one every T seconds)
Symbol i has been emitted m(i) times: $M = \sum_{i=1}^{N} m(i)$
Number of bits emitted: $L = \sum_{i=1}^{N} m(i)\, l(i)$
Average length per symbol: $\bar{L} = \frac{1}{M} \sum_{i=1}^{N} m(i)\, l(i)$
Average bit rate: $\bar{L} / T$
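A small Python sketch of these definitions, with made-up counts m(i), codeword lengths l(i), and symbol period T (all values are illustrative, not from the slides):

    m = [50, 30, 15, 5]      # m(i): how many times symbol i was emitted
    l = [1, 2, 3, 3]         # l(i): binary length of symbol i's codeword
    T = 0.001                # seconds between emitted symbols (assumed)

    M = sum(m)                                         # total symbols emitted
    total_bits = sum(mi * li for mi, li in zip(m, l))  # L = sum of m(i) * l(i)
    L_bar = total_bits / M                             # average length per symbol
    bit_rate = L_bar / T                               # average bits per second
    print(M, total_bits, L_bar, bit_rate)              # 100 170 1.7 1700.0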
MINIMUM AVERAGE SYMBOL LENGTH
The main goal is to minimize the number of bits being transmitted, i.e., minimize the average symbol length.
Basic idea for reducing the average symbol length:
assign Shorter Codewords to symbols that appear more
frequently, Longer Codewords to symbols that appear
less frequently
Theorem: “the Average Binary Length of the encoded
symbols is always greater than or equal to the entropy H
of the source”
What is the entropy of a source of symbols and how is it
computed?
SYMBOL PROBABILITIES
Probability P(i) of a symbol: the number of times it occurs in the transmission relative to the total (its relative frequency), defined as $P(i) = m(i)/M$
From probability theory we know: $0 \le P(i) \le 1$ and $\sum_{i=1}^{N} P(i) = 1$
The average symbol length can then be written as
$\bar{L} = \frac{1}{M}\sum_{i=1}^{N} m(i)\, l(i) = \sum_{i=1}^{N} \frac{m(i)}{M}\, l(i) = \sum_{i=1}^{N} P(i)\, l(i)$
ENTROPY
Entropy is defined as
$H = -\sum_{i=1}^{N} P(i) \log_2 P(i)$
The entropy is a characteristic of a given source of symbols.
H is highest (equal to $\log_2 N$) when all symbols are equally probable.
Entropy is small (but always $\ge 0$) when some symbols are much more likely to appear than others.
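A short Python sketch of the entropy formula (the probability lists are illustrative):

    import math

    def entropy(P):
        """H = -sum of P(i) * log2 P(i); terms with P(i) == 0 contribute 0."""
        return -sum(p * math.log2(p) for p in P if p > 0)

    # Uniform source with N = 4 symbols: H = log2(4) = 2 bits/symbol.
    print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
    # Skewed source: entropy falls well below log2(N).
    print(entropy([0.90, 0.05, 0.03, 0.02]))   # about 0.62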
ENTROPY EXAMPLE
TAXONOMY OF COMPRESSION TECHNIQUES
Compression techniques are broadly classified into
• Lossless Compression Techniques, also known
as Entropy Coding
• Lossy Compression Techniques
Compression Techniques
• Lossless
  – Repetitive Sequence Compression
  – Statistical Compression
  – Dictionary Based
• Lossy
  – Prediction Based
  – Frequency Based
  – Signal Importance
  – Hybrid
EXAMPLES OF ENTROPY ENCODING
The techniques vary in how much they statistically analyze the signal.
Run-length Encoding
A sequence of elements is mapped to runs $(c_i, l_i)$, where $c_i$ = code (the repeated symbol) and $l_i$ = length of the i-th run (see the sketch after this list)
Repetition Suppression
Repetitive occurrences of a specific character are replaced by a special flag and a count, e.g., ab000000000 → ab followed by the flag and the count 9
Pattern Substitution
A pattern that occurs frequently is replaced by a specific symbol, e.g., LZW
Arithmetic Coding
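A minimal run-length encoding sketch in Python (the function name and sample string are ours):

    def run_length_encode(seq):
        """Map a sequence to (symbol, run-length) pairs."""
        runs = []
        for s in seq:
            if runs and runs[-1][0] == s:
                runs[-1] = (s, runs[-1][1] + 1)   # extend the current run
            else:
                runs.append((s, 1))               # start a new run
        return runs

    print(run_length_encode("aaaabbbcca"))
    # [('a', 4), ('b', 3), ('c', 2), ('a', 1)]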
LZW – PATTERN SUBSTITUTION
Algorithm :
1. Initialize the dictionary to contain all blocks of
length one ( D={a,b} in the example below ).
2. Search for the longest block W which has
appeared in the dictionary.
3. Encode W by its index in the dictionary.
4. Add W followed by the first symbol of the next
block to the dictionary.
5. Go to Step 2.
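A compact Python sketch of the encoder described above, using the two-symbol dictionary D = {a, b}; the emitted indices depend on the initial dictionary order, which is an assumption here:

    def lzw_encode(message, alphabet=('a', 'b')):
        """LZW: emit the dictionary index of the longest known block W,
        then add W + next symbol to the dictionary (steps 1-5 above)."""
        dictionary = {s: i for i, s in enumerate(alphabet)}       # step 1
        W, output = '', []
        for symbol in message:
            if W + symbol in dictionary:                          # step 2
                W += symbol
            else:
                output.append(dictionary[W])                      # step 3
                dictionary[W + symbol] = len(dictionary)          # step 4
                W = symbol
        if W:
            output.append(dictionary[W])
        return output

    print(lzw_encode('abababab'))   # [0, 1, 2, 4, 1] with a=0, b=1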
PATTERN SUBSTITUTION – EXAMPLE
HUFFMAN CODING
The Huffman code assignment procedure is based on the frequency of occurrence of each symbol. It uses a binary tree structure and the algorithm is as follows:
1. Associate the leaves of the binary tree with the list of probabilities.
2. Take the two smallest probabilities in the list and make the corresponding nodes siblings. Generate an intermediate node as their parent and label the branch from the parent to one of the child nodes 1 and the branch from the parent to the other child 0.
3. Replace the two probabilities and their associated nodes in the list by the single new intermediate node with the sum of the two probabilities. If the list contains only one element, quit. Otherwise, go to step 2.
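A Python sketch of this procedure that only tracks codeword lengths (enough to compute the average length); it uses the eight-symbol probabilities from the example on the next slide:

    import heapq

    def huffman_code_lengths(probs):
        """Repeatedly merge the two smallest probabilities; every symbol
        under the new parent node gets one bit longer."""
        heap = [(p, [sym]) for sym, p in probs.items()]
        heapq.heapify(heap)
        lengths = {sym: 0 for sym in probs}
        while len(heap) > 1:
            p1, syms1 = heapq.heappop(heap)
            p2, syms2 = heapq.heappop(heap)
            for sym in syms1 + syms2:
                lengths[sym] += 1
            heapq.heappush(heap, (p1 + p2, syms1 + syms2))
        return lengths

    P = {'a': 0.01, 'b': 0.02, 'c': 0.05, 'd': 0.09,
         'e': 0.18, 'f': 0.20, 'g': 0.20, 'h': 0.25}
    lengths = huffman_code_lengths(P)
    L_huff = sum(P[s] * lengths[s] for s in P)
    print(lengths)   # lengths 6, 6, 5, 4, 3, 2, 2, 2 for a..h
    print(L_huff)    # about 2.63 bits/symbol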
HUFFMAN CODING (2)
Codeword formation:
Follow the path from the root of the tree to the symbol, and accumulate the labels of all the branches.
Example
N = 8 symbols: {a, b, c, d, e, f, g, h}
3 bits per symbol ($N = 2^3 = 8$)
P(a) = 0.01, P(b) = 0.02, P(c) = 0.05, P(d) = 0.09, P(e) = 0.18, P(f) = 0.2, P(g) = 0.2, P(h) = 0.25
Average length per symbol: $\bar{L} = \sum_{i=1}^{8} 3\, P(i) = 3$
Entropy: $H = -\sum_{i=1}^{8} P(i) \log_2 P(i) = 2.5828 \approx 0.86\,\bar{L}$
HUFFMAN CODING (2)
HUFFMAN CODING (3)
Average length per symbol (with Huffman coding):
$\bar{L}_{Huff} = \sum_{i=1}^{8} P(i)\, l(i) = 6 \times 0.01 + 6 \times 0.02 + 5 \times 0.05 + 4 \times 0.09 + 3 \times 0.18 + 2 \times 0.2 + 2 \times 0.2 + 2 \times 0.25 = 2.63$
Efficiency of the encoder = $H / \bar{L}_{Huff} \approx 98\%$
CCITT GROUP 3 1-D FAX ENCODING
Facsimile: bilevel image, an 8.5" × 11" page scanned (left to right) at 200 dpi: about 3.74 Mb (1728 pixels on each line)
Encoding process
• Count each run of white/black pixels
• Encode the run lengths with Huffman coding
Probabilities of run lengths are not the same for black and white pixels
Decompose each run length n as n = k × 64 + m (with m < 64), and create Huffman tables for both k and m
Special codewords for end-of-line, end-of-page, synchronization
Average compression ratio on text documents: 20-to-1

ARITHMETIC CODING
Map a message of symbols to a real number interval. If you have a vocabulary of M symbols:
1. Divide the interval [0,1) into segments corresponding to the M symbols; the segment of each symbol has a length proportional to its probability.
2. Choose the segment of the first symbol in the message.
3. Divide the segment of this symbol again into M new segments with lengths proportional to the symbol probabilities.
4. From these new segments, choose the one corresponding to the next symbol in the message.
5. Continue steps 3 and 4 until the whole message is coded.
6. Represent the segment's value by a binary fraction.

ARITHMETIC CODING (2)
Suppose you have a vocabulary consisting of two symbols X, Y with prob(X) = 2/3 and prob(Y) = 1/3.
If we are only concerned with encoding messages of length 2, then we can map all possible messages to intervals in the range [0,1).
To encode a message, just send enough bits of a binary fraction to uniquely specify the interval.

ARITHMETIC CODING (2)
Here is how codewords are formed for the above example with messages of length 2.

ARITHMETIC CODING (3)
Here is how codewords are formed for the above example with messages of length 3.

LOSSY COMPRESSION
The decompressed signal is not identical to the original signal – there is data loss
Objective: minimize the distortion for a given compression ratio
• Ideally, we would optimize the system based on perceptual distortion (difficult to compute)
• We'll need a few more concepts from statistics…

STATISTICAL DEFINITIONS
y = the original value of a sample
ŷ = the sample value after compression/decompression
e = y - ŷ = error (or noise, or distortion)
$\sigma_y^2$ = power (variance) of the signal
$\sigma_e^2$ = power (variance) of the error
Signal to Noise Ratio: SNR = $10 \log_{10}(\sigma_y^2 / \sigma_e^2)$

LOSSY COMPRESSION: SIMPLE EXAMPLES
Subsampling: retain only samples on a subsampling grid (spatial/temporal). See examples in the previous lecture
• Compression achieved by reducing the number of samples
• Has fundamental limitations: see the sampling theorem
Quantization: quantize with fewer bits
• Compression achieved by reducing the number of bits per sample
• As the quantization interval size increases, compression increases (so does the error!)
• Scalar vs. Vector Quantization
Can we do better than simple quantization and subsampling?

DIFFERENTIAL PCM
If the (n-1)-th sample has value y(n-1), what is the "most probable" value for y(n)?
The simplest case: the predicted value for y(n) is y(n-1)
Differential PCM (DPCM) encoding:
• Don't quantize and transmit y(n)
• Quantize and transmit the residual d(n) = y(n) - y(n-1)
Decoding: ŷ(n) = y(n-1) + d̂(n), where d̂(n) is the quantized version of d(n)
We can show that SNR_DPCM > SNR_PCM
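A Python/NumPy sketch of this open-loop scheme (the quantizer step size, helper names, and sample values are ours; sending the first sample as-is is also an assumption). Note that a real decoder only has the reconstructed samples to add the residuals to, which is what motivates the closed-loop variant two slides ahead:

    import numpy as np

    def quantize(d, step=4.0):
        """Uniform scalar quantizer for the residual (assumed step size)."""
        return step * np.round(np.asarray(d, dtype=float) / step)

    def dpcm_encode(y, step=4.0):
        """Send the first sample as-is, then the quantized differences
        d(n) = y(n) - y(n-1)."""
        y = np.asarray(y, dtype=float)
        return y[0], quantize(np.diff(y), step)

    def dpcm_decode(y0, d_hat):
        """y_hat(n) = y_hat(n-1) + d_hat(n); quantization errors accumulate."""
        return np.concatenate(([y0], y0 + np.cumsum(d_hat)))

    y = [100, 104, 109, 113, 118, 126]
    y0, d_hat = dpcm_encode(y)
    print(dpcm_decode(y0, d_hat))   # reconstruction drifts slightly from y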
DPCM – EXAMPLE
CLOSED-LOOP DPCM
In DPCM, the decoder reconstructs ŷ(n) = y(n-1) + d̂(n)
The decoder must know y(n-1)!
In closed-loop DPCM, instead of computing the difference d(n) = y(n) - y(n-1), we compute
d(n) = y(n) - ŷ(n-1)
Quantize d(n) to get d̂(n) and transmit d̂(n)
At the decoder, this implies ŷ(n) = ŷ(n-1) + d̂(n)
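A matching Python/NumPy sketch of the closed-loop encoder: the prediction uses the reconstructed previous sample, so encoder and decoder stay in sync (the step size, initial state, and sample values are again illustrative):

    import numpy as np

    def quantize(d, step=4.0):
        return step * np.round(d / step)

    def closed_loop_dpcm_encode(y, step=4.0):
        """d(n) = y(n) - y_hat(n-1); the encoder mirrors the decoder's update."""
        y = np.asarray(y, dtype=float)
        d_hat = np.empty_like(y)
        y_hat_prev = 0.0                     # decoder's assumed initial state
        for n in range(len(y)):
            d_hat[n] = quantize(y[n] - y_hat_prev, step)
            y_hat_prev += d_hat[n]           # exactly what the decoder computes
        return d_hat

    def closed_loop_dpcm_decode(d_hat):
        """y_hat(n) = y_hat(n-1) + d_hat(n)."""
        return np.cumsum(d_hat)

    y = [100, 104, 109, 113, 118, 126]
    print(closed_loop_dpcm_decode(closed_loop_dpcm_encode(y)))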
CLOSED-LOOP DPCM
TRANSFORM CODING
In Transform Coding a segment of information
undergoes an invertible mathematical transformation.
Segments of information can be samples (assume M
samples) such as –
• image frame
• 8×8 pixel block (M =64)
• a segment of speech
Transform Coding works as follows –
• Apply a suitable invertible transformation
(typically a matrix multiplication)
x(1), x(2), …, x(M) → y(1), y(2), …, y(M) (channels)
• Quantize and transmit y(1),…,y(M)
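A toy Python/NumPy sketch of one block of transform coding with a 2-point orthonormal transform (the matrix, step size, and sample values are illustrative, not the DCT discussed later):

    import numpy as np

    def transform_code_block(x, T, step=1.0):
        """y = T x, quantize each channel, invert with T transpose
        (valid because T is orthonormal)."""
        y = T @ x                              # forward transform (channels)
        y_hat = step * np.round(y / step)      # quantize the channels
        return T.T @ y_hat                     # inverse transform

    # 2-point orthonormal sum/difference transform, purely illustrative.
    T = np.array([[1.0,  1.0],
                  [1.0, -1.0]]) / np.sqrt(2.0)
    x = np.array([10.0, 9.0])
    print(transform_code_block(x, T, step=2.0))   # close to the original block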
TRANSFORM CODING (2)
ISSUES WITH TRANSFORM CODING
Quantizers in different channels may have different numbers of levels, so each channel ultimately might yield a different number of bits.
Bit budget B: if we quantize y(1),…,y(M) using B(1),…,B(M) bits respectively, then
$B = \sum_{i=1}^{M} B(i)$
Optimal bit allocation: allocate more bits to those
channels that have the highest variance
It can be shown that the transformation increases the
quantization SNR
Transformation also reduces the signal entropy!
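A sketch of variance-based bit allocation under the budget constraint above; the half-log2-of-relative-variance rule is one common textbook heuristic (not necessarily the one intended on this slide), the channel variances are made up, and rounding can make the total differ slightly from B:

    import numpy as np

    def allocate_bits(variances, B_total):
        """Equal share per channel, adjusted by half the log2 of each
        channel's variance relative to the geometric mean, then rounded
        and clipped to non-negative integers."""
        var = np.asarray(variances, dtype=float)
        geo_mean = np.exp(np.mean(np.log(var)))
        b = B_total / len(var) + 0.5 * np.log2(var / geo_mean)
        return np.maximum(np.round(b), 0).astype(int)

    variances = [90.0, 6.0, 3.0, 1.0]        # hypothetical channel variances
    B = allocate_bits(variances, B_total=16)
    print(B, B.sum())                        # high-variance channels get more bits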
DISCRETE COSINE TRANSFORM (DCT)
The DCT is a kind of transform coding whose matrix T is built from sampled cosine basis functions (one standard form is sketched below).
It has the property that different channels represent the
signal power along different (spatial or temporal)
frequencies similarly to the (discrete) Fourier transform
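One standard orthonormal DCT-II matrix, shown as a Python/NumPy sketch (the exact normalization used on the slide may differ):

    import numpy as np

    def dct_matrix(M):
        """T[k, n] = c(k) * cos(pi * (2n + 1) * k / (2M)),
        with c(0) = sqrt(1/M) and c(k) = sqrt(2/M) for k > 0."""
        idx = np.arange(M)
        T = np.cos(np.pi * (2 * idx[None, :] + 1) * idx[:, None] / (2 * M))
        T *= np.sqrt(2.0 / M)
        T[0, :] = np.sqrt(1.0 / M)
        return T

    T = dct_matrix(8)
    print(np.allclose(T @ T.T, np.eye(8)))   # True: orthonormal, hence invertible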
SUBBAND CODING
It is a kind of transform coding (although operating on
the whole signal)
It is not implemented as a matrix multiplication but as a
bank of filters followed by decimation (i.e., subsampling)
Again, each channel represents the signal energy
content along different frequencies. Bits can then be
allocated to the coding of an individual subband signal
depending upon the energy within the band
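A minimal two-band example in Python/NumPy, using the Haar filter pair followed by decimation by 2 as a stand-in for a general filter bank (the filters and signal values are illustrative):

    import numpy as np

    def two_band_analysis(x):
        """Lowpass (averages) and highpass (differences), each decimated by 2."""
        x = np.asarray(x, dtype=float)
        low  = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        high = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        return low, high

    def two_band_synthesis(low, high):
        """Perfect reconstruction for the Haar pair above."""
        x = np.empty(2 * len(low))
        x[0::2] = (low + high) / np.sqrt(2.0)
        x[1::2] = (low - high) / np.sqrt(2.0)
        return x

    x = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0]
    low, high = two_band_analysis(x)
    print(np.allclose(two_band_synthesis(low, high), x))   # True

For a smooth signal most of the energy lands in the low band, which is where more bits would then be allocated.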
A SIMPLE SCHEME OF SUBBAND CODING