
COMP2610/6261 – Information Theory Lecture 16: Arithmetic Coding
Research School of Computer Science
25 September, 2018

1 From SFE to Arithmetic Coding
2 Arithmetic Coding: Encoder
  - Intervals for Sequences
  - Codeword Generation
  - Putting it all together
3 Arithmetic Coding: Decoder
4 Adapting Distributions On-The-Fly

Interval Codes (Recap)
Shannon-Fano-Elias method:
- Order the alphabet A
- Represent the distribution p by the cumulative distribution F
- Construct the code by finding intervals of width pi/2 that lie in each symbol interval [F(ai−1), F(ai))
[Figure: symbol intervals for a1, a2, a3, a4, marking F(ai−1), F(ai) − ½pi, F(ai)]

Intervals and Prefix Codes (Recap)
The set of numbers in [0, 1) that start with a given sequence of bits b = b1…bn forms the interval
[0.b1…bn, 0.b1…bn + 2^−n) = [0.b1…bn, 0.b1…bn + 0.0…01)
This interval contains all binary strings for which b1b2…bn is a prefix.
Prefix property (interval form): once you pick a codeword b1b2…bn, you cannot pick any other codeword in the codeword interval
[0.b1b2…bn, 0.b1b2…bn + 2^−n)


Interval Coding Blocks
What if we apply SFE coding to blocks of an ensemble X?
Example: Let A = {aa, ab, ba, bb} with p = (0.2, 0.6, 0.1, 0.1).

x    p     F̄(x)   F̄(x) in binary   l(x)   Code
aa   0.2   0.1    0.00011₂         4      0001
ab   0.6   0.5    0.1₂             2      10
ba   0.1   0.85   0.1101100₂       5      11011
bb   0.1   0.95   0.111100₂        5      11110

Extend to longer sequences. This works, but:
- Need P(x) for all x
- Total |A|^N values for length N
- Huffman has similar complexity but shorter codes
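As a concrete check, the table above can be reproduced with a short script. This is a minimal sketch, not from the slides: `sfe_code` is an illustrative name, and it assumes the block probabilities are listed in alphabet order.

```python
import math

# Minimal sketch of SFE coding over blocks (illustrative names).
# For each block x: F-bar(x) = F(preceding blocks) + p(x)/2, and the
# codeword is the first ceil(log2(1/p(x))) + 1 bits of F-bar(x).
def sfe_code(blocks, probs):
    F, table = 0.0, {}
    for x, p in zip(blocks, probs):
        fbar = F + p / 2
        length = math.ceil(math.log2(1 / p)) + 1
        bits, m = "", fbar
        for _ in range(length):      # binary expansion of F-bar(x)
            m *= 2
            bits += str(int(m))
            m -= int(m)
        table[x] = bits
        F += p
    return table

print(sfe_code(['aa', 'ab', 'ba', 'bb'], [0.2, 0.6, 0.1, 0.1]))
# {'aa': '0001', 'ab': '10', 'ba': '11011', 'bb': '11110'}
```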

Arithmetic Coding: A Bird’s Eye View
Basic idea of arithmetic coding follows SFE coding.

SFE coding:
- Single outcome xi
- Find symbol interval for xi: use [F(xi−1), F(xi))
- Binary string corresponding to chosen interval
- Output first l(xi) bits of midpoint of interval

Arithmetic coding:
- Sequence of outcomes x1x2…xN
- Find interval for x1x2…xN
- Binary string corresponding to chosen interval
- Output first l(x1x2…xN) bits of midpoint of interval

Arithmetic Coding: Summary
Arithmetic coding has some important properties:
- We do not compute a symbol coding for X and then concatenate
  → directly work with blocks of size N
- We do not explicitly code all length-N sequences at once
  → highly efficient
- We do not assume that each of the xi's is independent
  → not restricted to extended ensembles
  → adapts to data distribution


Computing an Interval for Sequences
Say N = 2 and we want to code x1x2.
Ideally, we'd like to compute p(x) for all possible x of length 2, and then find the interval for p(x1x2).
Key ideas:
- We can write p(x1x2) = p(x1)p(x2|x1)
  → decompose the joint into conditional probabilities
- p(·|x1) is just another probability distribution
  → so we can compute intervals as per SFE
- We can find an interval for p(x2|x1) within the interval for x1
  → normal SFE computes the interval within [0, 1) by default

Computing an Interval for Sequences
Example: Suppose A = {a, b, c} and p(a) = 0.25, p(b) = 0.5, p(c) = 0.25. Like with SFE coding, we'd begin by slicing up [0, 1) into three subintervals: [0, 0.25) for a, [0.25, 0.75) for b, and [0.75, 1) for c.
So e.g. we treat [0.25, 0.75) as the interval for b.

Computing an Interval for Sequences
Suppose the first symbol is b, and p(a|b) = 0.25, p(b|b) = 0.5, p(c|b) = 0.25.
To code ba, bb, bc, we now slice up [0.25, 0.75), the interval for b itself.
For ba we choose the interval of length p(a|b) = 0.25 times the length of the enclosing interval (0.75 − 0.25 = 0.5), i.e. (0.25)(0.5) = 0.125: the interval [0.25, 0.375).
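The slicing step above can be sketched in a few lines. `refine` is an illustrative helper, assuming each conditional distribution is given as (symbol, probability) pairs in alphabet order.

```python
# Slice an interval [u, v) according to a distribution and return the
# sub-interval belonging to `symbol` (illustrative sketch).
def refine(interval, dist, symbol):
    u, v = interval
    width, lo = v - u, u
    for s, p in dist:
        if s == symbol:
            return (lo, lo + width * p)
        lo += width * p
    raise ValueError(f"unknown symbol {symbol!r}")

p = [('a', 0.25), ('b', 0.5), ('c', 0.25)]
iv = refine((0.0, 1.0), p, 'b')   # interval for b: (0.25, 0.75)
iv = refine(iv, p, 'a')           # here p(.|b) happens to equal p(.)
print(iv)                         # (0.25, 0.375), width 0.125 as above
```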

Arithmetic Coding: End of Stream Symbol
It is convenient to explicitly have a special "end of stream" symbol, P.
We add this symbol to our ensemble, with some suitable probability,
e.g. p(P) = probability of seeing the empty string, p(P|a) = probability of seeing just the string a, etc.
Implicitly we think of ab as actually being abP.
End of stream is by definition reached when we choose the sub-interval for this special symbol.

Arithmetic Coding: End of Stream Example
Example: Suppose A = {a, b, c, P} and
p(a) = 0.25, p(b) = 0.5, p(c) = 0.15, p(P) = 0.1
Like with SFE coding, we'd begin by slicing up [0, 1) into four subintervals:

Arithmetic Coding: End of Stream Example
Now suppose that p(·|b) stays the same as p(·)
If the first symbol is b, we carve the interval for b into four pieces:
Exact same idea as before, just with special symbol P

Arithmetic Coding for Arbitrary Sequences
These ideas generalise to arbitrary length sequences
We don’t even need to know the sequence length beforehand!
As we see more symbols, we slice the appropriate sub-interval of [0, 1) based on the probabilities
Terminate whenever we see P


Arithmetic Coding: Codeword Generation
Once we've seen the entire sequence, we end up with an interval [u, v). How do we output a codeword?
As per SFE coding, we can use the first l(x1 x2 . . . xN ) bits of (u + v )/2
Here, l(x1x2 . . . xN ) = ⌈log 1/p(x1x2 . . . xN )⌉ + 1
As before, this guarantees all strings starting with codeword are contained in the codeword interval
Generally, we can output some bits on the fly, rather than wait till we process the entire sequence

Arithmetic Coding: Codeword Generation Example
In previous example with input b, we’d stop in the interval for bP, i.e. [0.7, 0.75)
Midpoint is 0.725 = 0.101110011…₂, and p(bP) = (1/2) · (0.1) = 0.05.
Output the first ⌈log₂ 1/0.05⌉ + 1 = 6 bits, i.e. 101110.
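This extraction step can be checked mechanically. A small sketch (illustrative name `codeword`) takes the final interval and its probability and emits the first ⌈log₂ 1/p⌉ + 1 bits of the midpoint:

```python
import math

# First ceil(log2(1/p)) + 1 bits of the midpoint of [u, v) (sketch).
def codeword(u, v, p):
    length = math.ceil(math.log2(1 / p)) + 1
    mid, bits = (u + v) / 2, ""
    for _ in range(length):      # peel off binary digits of the midpoint
        mid *= 2
        bits += str(int(mid))
        mid -= int(mid)
    return bits

print(codeword(0.7, 0.75, 0.05))  # '101110', matching the example
```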


Arithmetic Coding: Formal Encoder
Formally, we compute the interval [u, v) for a generic sequence as follows:

Arithmetic coding of stream x1x2…
  u ← 0.0
  v ← 1.0
  p ← v − u
  for n = 1, 2, …:
    Compute Ln(ai | x1, …, xn−1) = Σ_{i′=1}^{i−1} p(xn = ai′ | x1, …, xn−1)
    Compute Un(ai | x1, …, xn−1) = Σ_{i′=1}^{i} p(xn = ai′ | x1, …, xn−1)
    v ← u + p · Un(xn | x1, …, xn−1)
    u ← u + p · Ln(xn | x1, …, xn−1)
    p ← v − u
    if xn = P, terminate
  Output first l(x1x2…xN) = ⌈log 1/p⌉ + 1 bits of (u + v)/2

Here Ln, Un just compute the appropriate lower and upper bounds, as per SFE coding. We rescale these based on the current interval length.
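The pseudocode above might look as follows in Python. This is a hedged sketch: `arithmetic_encode`, the symbol `'EOS'`, and the fixed probability dict are all illustrative; the slides' version conditions each distribution on the history, which would replace the fixed `prob` lookup.

```python
import math

# Sketch of the encoder above for a fixed distribution (dict that
# includes an end-of-stream symbol 'EOS'); names are illustrative.
def arithmetic_encode(stream, prob):
    symbols = list(prob)          # fixed symbol order defines the slicing
    u, v = 0.0, 1.0
    for x in stream:
        p = v - u
        i = symbols.index(x)
        L = sum(prob[s] for s in symbols[:i])   # lower bound L_n
        U = L + prob[x]                         # upper bound U_n
        u, v = u + p * L, u + p * U
        if x == 'EOS':
            break
    p = v - u
    length = math.ceil(math.log2(1 / p)) + 1
    mid, bits = (u + v) / 2, ""
    for _ in range(length):       # first `length` bits of the midpoint
        mid *= 2
        bits += str(int(mid))
        mid -= int(mid)
    return bits

# End-of-stream example from earlier: coding just 'b' gives 101110
print(arithmetic_encode(['b', 'EOS'],
                        {'a': 0.25, 'b': 0.5, 'c': 0.15, 'EOS': 0.1}))
```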


How do we decode a sequence of bits?
Rough Sketch:
Carve out [0, 1) based on initial distribution
Keep reading bits until the current code interval lies within a symbol interval, then output that symbol
Carve out appropriate interval based on probabilities
We can stop once we have containment in interval for P
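The rough sketch above can be written out as follows, again under the simplifying assumption of a fixed distribution dict with an `'EOS'` symbol (illustrative names; the demo matches the worked decoding example below).

```python
# Decoder sketch: read bits to narrow a code interval until it lies
# inside one symbol's sub-interval, emit that symbol, and re-slice.
def arithmetic_decode(bits, prob):
    symbols = list(prob)
    u, v = 0.0, 1.0       # current symbol interval
    cu, cv = 0.0, 1.0     # interval pinned down by the bits read so far
    out, i = [], 0
    while True:
        match = None
        while match is None:
            lo = u
            for s in symbols:             # slice [u, v) by prob
                hi = lo + (v - u) * prob[s]
                if lo <= cu and cv <= hi:
                    match = (s, lo, hi)   # code interval fits inside
                    break
                lo = hi
            if match is None:             # need another bit to decide
                mid = (cu + cv) / 2
                cu, cv = (mid, cv) if bits[i] == '1' else (cu, mid)
                i += 1
        s, u, v = match
        if s == 'EOS':
            return out
        out.append(s)

# Decoding example below: 0110111 with p = (0.5, 0.125, 0.25, 0.125)
print(arithmetic_decode('0110111',
                        {'a': 0.5, 'b': 0.125, 'c': 0.25, 'EOS': 0.125}))
```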

Decoding: Example
Suppose p(a) = 0.5, p(b) = 0.125, p(c) = 0.25, p(P) = 0.125 for every outcome in the sequence.
Decode 0110111:

Sequence   Interval (Binary)        Interval (Decimal)      Comment
0          [0.0, 0.1)₂              [0, 0.5)₁₀              First symbol a
01         [0.01, 0.10)₂            [0.25, 0.5)₁₀
011        [0.011, 0.100)₂          [0.375, 0.5)₁₀
0110       [0.0110, 0.0111)₂        [0.375, 0.4375)₁₀       Next symbol c
01101      [0.01101, 0.01110)₂      [0.40625, 0.4375)₁₀
011011     [0.011011, 0.011100)₂    [0.421875, 0.4375)₁₀    Next symbol P

The last bit here is actually redundant (inherited from the +1 bit in the midpoint representation)


Adaptive Probabilities
So far we have assumed the sequence of probabilities is given in advance.
In Lecture 5, you saw the Bernoulli distribution for two outcomes, and the Beta distribution Beta(θ|mh, mt) as a prior for Bern(x|θ).
The posterior after observing nh heads and nt tails is just Beta(θ|mh + nh, mt + nt).
The expected value of θ under the posterior is
p(x = h | nh, nt, mh, mt) = (mh + nh) / (mh + nh + mt + nt)

Dirichlet Model
A Dirichlet distribution is a generalisation of the Beta distribution to more than two outcomes. Its parameter is a vector m = (m1, …, mK) that can be viewed as "virtual counts" for each symbol a1, …, aK:

P(x = ai | x1 … xn) = (♯(ai) + mi) / Σ_{k=1}^{K} (♯(ak) + mk),

where ♯(ak) counts the occurrences of ak in x1 … xn.
We can implement an adaptive guesser by just counting symbol occurrences.
Flexible:
- e.g., choose m to be the frequency of English letters
- Σ_k mk large = stable; small = responsive
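A count-based adaptive guesser as described above fits in a few lines. This is an illustrative sketch (`predictive` is not a name from the slides), using virtual counts of 1 over two outcomes, i.e. Laplace's rule:

```python
from collections import Counter

# Dirichlet/Laplace predictive probability: (count + virtual count)
# divided by (total count + total virtual count).
def predictive(counts, m):
    total = sum(counts.values()) + sum(m.values())
    return {a: (counts[a] + m[a]) / total for a in m}

m = {'h': 1, 't': 1}          # virtual counts mh = mt = 1
counts = Counter()
for x in 'hht':               # observe the sequence h, h, t
    counts[x] += 1
print(predictive(counts, m))  # {'h': 0.6, 't': 0.4}, i.e. (3/5, 2/5)
```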

Example: Start with mh = mt = 1 and observe the sequence hht.
p(·|ε) = (1/2, 1/2), p(·|h) = (2/3, 1/3), p(·|hh) = (3/4, 1/4), p(·|hht) = (3/5, 2/5)
viz. Laplace's Rule, where ε means the empty string. Why? Because e.g.
p(h|h) = (1 + 1)/(1 + 0 + 1 + 1) = 2/3
p(t|h) = (0 + 1)/(1 + 0 + 1 + 1) = 1/3
We'll assume this learning is only for non-P symbols; assume P occurs with fixed probability each time.

Adaptive Probabilities: Example
Possible outcomes a, b, P
Sequence: ε
Probabilities: p|ε = (0.425, 0.425, 0.15)
Encoder output: ε
We start off with virtual counts ma = mb = 1.
[Figure: [0, 1) split into intervals for a, b, P at 0.011011…₂ and 0.110110…₂]

Adaptive Probabilities: Example
Possible outcomes a, b, P
Observations: b
Probabilities: p|b ≈ (0.28, 0.57, 0.15)
Encoder output: ε
Seeing b makes us update p(a|b) = (0.85) · (1/3) ≈ 0.28 and p(b|b) = (0.85) · (2/3) ≈ 0.57. We keep p(P|b) = p(P).
[Figure: interval for b subdivided; boundaries 0.011011…₂, 0.100010…₂, 0.110010…₂, 0.110110…₂]

Adaptive Probabilities: Example
Possible outcomes a, b, P
Observations: bb
Probabilities: p|bb ≈ (0.21, 0.64, 0.15)
Encoder output: 1
Seeing bb makes us update p(a|bb) = (0.85) · (1/4) ≈ 0.21 and p(b|bb) = (0.85) · (3/4) ≈ 0.64.
Now the first bit is unambiguously 1.
[Figure: intervals for b, bb, bba, bbb; boundaries 0.011011…₂, 0.100010…₂, 0.100110…₂, 0.101111…₂, 0.110010…₂, 0.110110…₂]

Adaptive Probabilities: Example
Possible outcomes a, b, P
Observations: bbb
Probabilities: p|bbb ≈ (0.17, 0.68, 0.15)
Encoder output: 1
[Figure: boundaries 0.011011…₂, 0.100010…₂, 0.100110…₂, 0.100111…₂, 0.101111…₂, 0.110010…₂, 0.110110…₂]

Adaptive Probabilities: Example
Possible outcomes a, b, P
Observations: bbba
Probabilities: p|bbba ≈ (0.28, 0.57, 0.15)
Encoder output: 1001
On seeing a, we can fill in three further bits unambiguously.
[Figure: intervals for bbba and bbbab; boundaries 0.100110000…₂, 0.100110100…₂, 0.100111100…₂, 0.100111110…₂]

Adaptive Probabilities: Example
Possible outcomes a, b, P
Observations: bbbaP
Probabilities: N/A
Encoder output: 100111101
To terminate, we find the midpoint of 0.100111100…₂ and 0.100111110…₂
[Figure: intervals for bbba and bbbab; boundaries 0.100110000…₂, 0.100110100…₂, 0.100111100…₂, 0.100111110…₂]

Arithmetic Coding: Example (MacKay, Figure 6.4)
[Figure: MacKay Fig. 6.4 — the tree of binary intervals alongside the source intervals for b, bb, bbb, bbbb, bbbP, bbbaa, bbbab, …; the final codeword is 100111101]

Summary and Reading
Main Points
Arithmetic Coding:
- Uses interval coding and conditional probability
- Separates coding and prediction
- No need to compute every codeword
Predictive distributions:
- Update distribution after each symbol
- Beta and Dirichlet priors = virtual counts
Reading
Section 6.2 of MacKay
