COMP2610/6261 – Information Theory – Lecture 16: Arithmetic Coding
Robert C. Williamson
Research School of Computer Science
25 September, 2018
1 / 37
1 From SFE to Arithmetic Coding
2 Arithmetic Coding: Encoder
Intervals for Sequences
Codeword Generation
Putting it all together
3 Arithmetic Coding: Decoder
4 Adapting Distributions On-The-Fly
2 / 37
Interval Codes (Recap)
Shannon-Fano-Elias Coding method:
Order the alphabet A.
Represent distribution p by cumulative distribution F
Construct the code by finding, inside each symbol interval [F(ai−1), F(ai)), a binary interval of width pi/2
[Figure: the unit interval [0, 1) carved into symbol intervals for a1, …, a4; each codeword interval sits around the midpoint F(ai) − ½pi, giving codewords such as 0001, 110, 100, 01000]
3 / 37
Intervals and Prefix Codes (Recap)
The set of numbers in [0, 1) whose binary expansion starts with a given sequence of bits b = b1 … bn forms the interval

  [0.b1 … bn, 0.b1 … bn + 1/2^n) = [0.b1 … bn, 0.b1 … bn + 0.0…1)

This interval contains all binary strings for which b1b2 … bn is a prefix

Prefix property (interval form): once you pick a codeword b1b2 … bn, you cannot pick any other codeword in the codeword interval

  [0.b1b2 … bn, 0.b1b2 … bn + 1/2^n)
4 / 37
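The correspondence above is easy to check numerically. Below is a small Python sketch (illustrative, not from the lecture) mapping a bit string b1 … bn to its interval [0.b1…bn, 0.b1…bn + 2^−n):

```python
def bits_to_interval(bits):
    """Interval of all numbers in [0, 1) whose binary expansion starts with `bits`."""
    lo = sum(int(b) / 2 ** (k + 1) for k, b in enumerate(bits))
    return lo, lo + 2 ** -len(bits)
```

For example, bits_to_interval("01") gives (0.25, 0.5): exactly the numbers whose expansions have 01 as a prefix.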
1 From SFE to Arithmetic Coding
2 Arithmetic Coding: Encoder
Intervals for Sequences
Codeword Generation
Putting it all together
3 Arithmetic Coding: Decoder
4 Adapting Distributions On-The-Fly
5 / 37
Interval Coding Blocks
What if we apply SFE coding to blocks of an ensemble X?
Example: Let A = {aa, ab, ba, bb} with p = (0.2, 0.6, 0.1, 0.1).
x     p     F̄      F̄ (binary)     ℓ   Code
aa    0.2   0.1    0.00011…₂      4   0001
ab    0.6   0.5    0.1₂           2   10
ba    0.1   0.85   0.1101100…₂    5   11011
bb    0.1   0.95   0.111100…₂     5   11110
Extend to longer sequences
This works, but:
Need P(x) for all x: |A|^N values in total for length-N blocks
Huffman has similar complexity but shorter codes.
[Figure: [0, 1) split at 0.2, 0.8 and 0.9 into intervals for aa, ab, ba, bb, labelled with codewords 0001, 10, 11011, 11110]
6 / 37
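The table above can be reproduced mechanically. Here is a sketch of SFE codeword generation in Python (the function name and dict-based interface are my own); it assumes the alphabet order is the iteration order of the dict:

```python
import math

def sfe_code(p):
    """Shannon-Fano-Elias codewords: first ceil(log2 1/p)+1 bits of the
    midpoint F-bar(x) of each symbol interval."""
    codes, F = {}, 0.0
    for x, px in p.items():          # alphabet order fixes the intervals
        mid = F + px / 2             # F-bar(x): midpoint of [F, F + px)
        length = math.ceil(math.log2(1 / px)) + 1
        bits = ""
        for _ in range(length):      # binary expansion of the midpoint
            mid *= 2
            bits += str(int(mid))
            mid -= int(mid)
        codes[x] = bits
        F += px
    return codes
```

Running it on the block distribution above reproduces the Code column of the table.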
Arithmetic Coding: A Bird’s Eye View
Basic idea of arithmetic coding follows SFE coding
           SFE Coding                        Arithmetic coding
Input      Single outcome xi                 Sequence of outcomes x1x2 … xN
Key step   Find symbol interval for xi       Find symbol interval for x1x2 … xN
           Use [F(xi−1), F(xi))              ?
Output     First ℓ(xi) bits of the           First ℓ(x1x2 … xN) bits of the
           midpoint of that interval         midpoint of that interval
7 / 37
Arithmetic Coding: Summary
Arithmetic coding has some important properties:
We do not compute a symbol coding for X and then concatenate
– Directly work with blocks of size N
We do not explicitly code all length-N sequences at once
– Highly efficient
We do not assume that each of the xi's is independent
– Not restricted to extended ensembles
– Adapts to the data distribution
9 / 37
Computing an Interval for Sequences
Say N = 2 and we want to code x1x2
Ideally, we’d like to compute p(x) for all possible x of length 2, and then
find the interval for p(x1x2)
Key ideas:
We can write p(x1x2) = p(x1)p(x2|x1)
– decompose the joint into conditional probabilities
p(·|x1) is just another probability distribution
– so we can compute intervals as per SFE
We can find the interval for p(x2|x1) within the interval for x1
– normal SFE computes the interval within [0, 1) by default
11 / 37
Computing an Interval for Sequences
Example: Suppose A = {a, b, c} and p(a) = 0.25, p(b) = 0.5, p(c) = 0.25
Like with SFE coding, we’d begin by slicing up [0, 1) into three subintervals:
[Figure: [0, 1) split at 0.25 and 0.75 into intervals for a, b, c]
So e.g. we treat [0.25, 0.75) as the interval for b
12 / 37
Computing an Interval for Sequences
Suppose the first symbol is b, and p(a|b) = 0.25, p(b|b) = 0.5, p(c|b) = 0.25
To code ba, bb, bc, now slice up [0.25, 0.75), the interval for b itself:
[Figure: the interval [0.25, 0.75) for b subdivided at 0.375 and 0.625 into intervals for ba, bb, bc]
For ba we choose the interval of length p(a|b) = 0.25 times the length of the
enclosing interval (0.75− 0.25 = 0.5), i.e. (0.25)(0.5) = 0.125
13 / 37
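The slicing step can be written as a tiny helper, sketched below (names are illustrative, not from the lecture). Each observed symbol replaces the current interval by the proportional sub-interval assigned to that symbol:

```python
def refine(interval, cond_p, symbol):
    """Narrow `interval` to the sub-interval of `symbol`, where cond_p maps
    each symbol (in fixed alphabet order) to its conditional probability."""
    lo, hi = interval
    width = hi - lo
    for s, ps in cond_p.items():
        if s == symbol:
            return lo, lo + ps * width   # sub-interval of length ps * width
        lo += ps * width                  # skip past earlier symbols
    raise KeyError(symbol)
```

With p = {a: 0.25, b: 0.5, c: 0.25}, refining [0, 1) by b gives [0.25, 0.75), and refining that by a gives [0.25, 0.375), matching the figure.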
Arithmetic Coding: End of Stream Symbol
It is convenient to explicitly have a special "end of stream" symbol, □
We add this symbol to our ensemble, with some suitable probability
e.g. p(□) = probability of seeing the empty string, p(□|a) = probability of seeing just the string a, etc
Implicitly we think of ab as actually being ab□
End of stream is by definition reached when we choose the sub-interval for this special symbol
14 / 37
Arithmetic Coding: End of Stream Example
Example: Suppose A = {a, b, c, □} and
p(a) = 0.25, p(b) = 0.5, p(c) = 0.15, p(□) = 0.1
Like with SFE coding, we'd begin by slicing up [0, 1) into four subintervals:
[Figure: [0, 1) split at 0.25, 0.75 and 0.9 into intervals for a, b, c, □]
15 / 37
Arithmetic Coding: End of Stream Example
Now suppose that p(·|b) stays the same as p(·)
If the first symbol is b, we carve the interval for b into four pieces:
[Figure: the interval for b subdivided at 0.375, 0.625 and 0.7 into intervals for ba, bb, bc, b□]
Exact same idea as before, just with the special symbol □
16 / 37
Arithmetic Coding for Arbitrary Sequences
These ideas generalise to arbitrary length sequences
We don’t even need to know the sequence length beforehand!
As we see more symbols, we slice the appropriate sub-interval of [0, 1)
based on the probabilities
Terminate whenever we see □
17 / 37
1 From SFE to Arithmetic Coding
2 Arithmetic Coding: Encoder
Intervals for Sequences
Codeword Generation
Putting it all together
3 Arithmetic Coding: Decoder
4 Adapting Distributions On-The-Fly
18 / 37
Arithmetic Coding: Codeword Generation
Once we’ve seen the entire sequence, we end up with interval [u, v)
How to output a codeword?
As per SFE coding, we can use the first ℓ(x1x2 … xN) bits of (u + v)/2
Here, ℓ(x1x2 … xN) = ⌈log2 1/p(x1x2 … xN)⌉ + 1
As before, this guarantees that all strings starting with the codeword are contained in the codeword interval
Generally, we can output some bits on the fly, rather than wait till we
process the entire sequence
19 / 37
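A sketch of this step in Python (my own, assuming exact arithmetic is not an issue for short examples): take the midpoint of [u, v) and emit its first ⌈log2 1/(v − u)⌉ + 1 binary digits, since v − u = p(x1 … xN):

```python
import math

def codeword(u, v):
    """First ceil(log2 1/(v-u)) + 1 bits of the midpoint of [u, v).

    Note v - u equals p(x1 ... xN), the probability of the encoded sequence.
    """
    length = math.ceil(math.log2(1 / (v - u))) + 1
    mid, bits = (u + v) / 2, ""
    for _ in range(length):          # binary expansion of the midpoint
        mid *= 2
        bits += str(int(mid))
        mid -= int(mid)
    return bits
```

On the interval [0.7, 0.75) from the next slide this yields the six bits 101110.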
Arithmetic Coding: Codeword Generation Example
In the previous example with input b, we'd stop in the interval for b□, i.e. [0.7, 0.75)
[Figure: as before, with the sub-interval [0.7, 0.75) for b□ highlighted]
Midpoint is 0.725 = 0.101110011…₂, and p(b□) = (1/2) · (0.1) = 0.05
Output the first ⌈log2 1/0.05⌉ + 1 = 6 bits, i.e. 101110
20 / 37
Arithmetic Coding: Formal Encoder
Formally, we compute the interval [u, v) for a generic sequence as follows:
Arithmetic coding of stream x1x2 …
  u ← 0.0
  v ← 1.0
  p ← v − u
  for n = 1, 2, …
    compute Ln(ai | x1, …, xn−1) = Σ_{i′=1}^{i−1} p(xn = ai′ | x1, …, xn−1)
    compute Un(ai | x1, …, xn−1) = Σ_{i′=1}^{i} p(xn = ai′ | x1, …, xn−1)
    v ← u + p · Un(xn | x1, …, xn−1)
    u ← u + p · Ln(xn | x1, …, xn−1)
    p ← v − u
    if xn = □, terminate
  Output first ℓ(x1x2 … xN) = ⌈log2 1/p⌉ + 1 bits of (u + v)/2
Here, Ln,Un just compute the appropriate lower and upper bounds, as per SFE coding
We rescale these based on the current interval length
22 / 37
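The pseudocode above can be transcribed directly into Python (my own sketch; "#" stands in for the end-of-stream symbol □, and cond_p is assumed to return the conditional distribution p(· | prefix) as an ordered dict):

```python
import math

def arith_encode(symbols, cond_p):
    """Arithmetic encoder following the pseudocode above.

    symbols: the sequence x1 x2 ... ending with the end-of-stream symbol "#".
    cond_p(prefix): ordered dict giving p(. | prefix) over the alphabet.
    """
    u, v = 0.0, 1.0
    for n, x in enumerate(symbols):
        p = v - u
        dist = cond_p(symbols[:n])
        lo = 0.0                       # L_n: cumulative mass below x
        for s, ps in dist.items():
            if s == x:
                v = u + p * (lo + ps)  # U_n = L_n + p(x_n)
                u = u + p * lo
                break
            lo += ps
        if x == "#":
            break
    p = v - u
    length = math.ceil(math.log2(1 / p)) + 1
    mid, bits = (u + v) / 2, ""
    for _ in range(length):            # binary expansion of the midpoint
        mid *= 2
        bits += str(int(mid))
        mid -= int(mid)
    return bits
```

On the earlier example, encoding b□ under p(a) = 0.25, p(b) = 0.5, p(c) = 0.15, p(□) = 0.1 gives 101110, as computed by hand.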
Decoding
How do we decode a sequence of bits?
Rough sketch:
Carve up [0, 1) based on the initial distribution
Keep reading bits until the current code interval lies inside a symbol interval
Output that symbol
Carve up the appropriate interval based on the (conditional) probabilities
…
We can stop once we have containment in the interval for □
24 / 37
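The sketch above, in Python (my own illustration, paired with the encoder interface used earlier: cond_p returns an ordered dict and "#" plays the role of □):

```python
def arith_decode(bits, cond_p):
    """Decode a bit string produced by the matching arithmetic encoder.

    cond_p(prefix): ordered dict giving p(. | prefix); "#" is end of stream.
    """
    out = []
    lo, hi = 0.0, 1.0        # code interval, narrowed one bit at a time
    u, v = 0.0, 1.0          # current symbol interval
    for b in bits:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if b == "1" else (lo, mid)
        while True:
            # is the code interval inside one symbol's sub-interval?
            p, su = v - u, u
            for s, ps in cond_p(out).items():
                if su <= lo and hi <= su + p * ps:
                    out.append(s)                 # symbol determined
                    u, v = su, su + p * ps
                    break
                su += p * ps
            else:
                break            # no symbol contains it: read more bits
            if out[-1] == "#":
                return out       # end-of-stream symbol reached
    return out
```

On the example below it decodes 0110111 to a, c, □ without needing the final redundant bit.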
Decoding: Example
Suppose p(a) = 0.5, p(b) = 0.125, p(c) = 0.25, p(□) = 0.125 for every outcome in the sequence
Decode 0110111:

Sequence   Interval (binary)         Interval (decimal)        Comment
0          [0.0, 0.1)₂               [0, 0.5)₁₀                First symbol a
01         [0.01, 0.10)₂             [0.25, 0.5)₁₀
011        [0.011, 0.100)₂           [0.375, 0.5)₁₀            Next symbol c
0110       [0.0110, 0.0111)₂         [0.375, 0.4375)₁₀
01101      [0.01101, 0.01110)₂       [0.40625, 0.4375)₁₀
011011     [0.011011, 0.011100)₂     [0.421875, 0.4375)₁₀      Next symbol □

The last bit here is actually redundant (inherited from the +1 bit in the midpoint representation)
25 / 37
Adaptive Probabilities
So far we have assumed the sequence of probabilities is given in advance
In Lecture 5, you saw the Bernoulli distribution for two outcomes
The Beta distribution Beta(θ|mh, mt) as a prior for Bern(x|θ)
The posterior after observing nh heads and nt tails is just Beta(θ|mh + nh, mt + nt)
The expected value of θ under the posterior is

  p(x = h | nh, nt, mh, mt) = (mh + nh) / (mh + nh + mt + nt)
27 / 37
Dirichlet Model
A Dirichlet distribution is a generalisation of the Beta distribution to more than two outcomes. Its parameter, a vector m = (m1, …, mK), can be viewed as "virtual counts" for each symbol a1, …, aK:

  P(x = ai | x1 … xn) = (#(ai) + mi) / Σ_{k=1}^{K} (#(ak) + mk)

where #(a) counts the occurrences of a in x1 … xn.
Can implement an adaptive predictor by just counting symbol occurrences.
Flexible: e.g., choose m to be the letter frequencies of English
Σk mk large = stable; small = responsive
28 / 37
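A minimal sketch of such an adaptive model (illustrative names; counts are recomputed from scratch for clarity rather than updated incrementally):

```python
def dirichlet_predictor(m):
    """Adaptive distribution with virtual counts m = {symbol: m_k}."""
    def predict(observed):
        # posterior predictive: (count + virtual count) / total
        counts = {s: mk + sum(1 for x in observed if x == s)
                  for s, mk in m.items()}
        total = sum(counts.values())
        return {s: c / total for s, c in counts.items()}
    return predict
```

With m = (1, 1) over {h, t} this reproduces Laplace's Rule on the next slide: after observing hh the predicted probability of h is 3/4.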
Adaptive Probabilities: Laplace's Rule
Example: Start with mh = mt = 1 and observe the sequence hht.

  p(·|ε) = (1/2, 1/2),  p(·|h) = (2/3, 1/3),  p(·|hh) = (3/4, 1/4),  p(·|hht) = (3/5, 2/5)

viz. Laplace's Rule, where ε means the empty string
Why? Because e.g.

  p(h|h) = (1 + 1)/(1 + 0 + 1 + 1) = 2/3
  p(t|h) = (0 + 1)/(1 + 0 + 1 + 1) = 1/3

We'll assume this learning is only for non-□ symbols
assume □ occurs with fixed probability each time
29 / 37
Adaptive Probabilities: Example
Possible outcomes a, b, □
Sequence: ε
Probabilities: p|ε = (0.425, 0.425, 0.15)
Encoder output: ε (no bits yet)
[Figure: [0, 1) split into intervals for a, b, □, with boundaries at 0.011011…₂ and 0.110110…₂]
We start off with virtual counts ma = mb = 1
30 / 37
Adaptive Probabilities: Example
Possible outcomes a, b, □
Observations: b
Probabilities: p|b ≈ (0.28, 0.57, 0.15)
Encoder output: ε (no bits yet)
[Figure: the interval for b, [0.011011…₂, 0.110110…₂), subdivided into ba, bb, b□ at 0.100010…₂ and 0.110010…₂]
Seeing b makes us update p(a|b) = (0.85) · (1/3) ≈ 0.28, and p(b|b) = (0.85) · (2/3) ≈ 0.57. We keep p(□|b) = p(□).
31 / 37
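These numbers can be checked with a few lines (a sketch under the slide's assumptions: p(□) fixed at 0.15, Laplace counts ma = mb = 1 on a and b):

```python
def adaptive_p(observed, p_eos=0.15):
    """(p(a|..), p(b|..), p(eos|..)): Laplace counts on a, b; fixed p(eos)."""
    na = 1 + observed.count("a")      # virtual count + occurrences
    nb = 1 + observed.count("b")
    scale = (1 - p_eos) / (na + nb)   # remaining 0.85 split by the counts
    return na * scale, nb * scale, p_eos
```

After seeing b this gives (0.85/3, 1.7/3, 0.15) ≈ (0.28, 0.57, 0.15), matching the slide.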
Adaptive Probabilities: Example
Possible outcomes a, b, □
Observations: bb
Probabilities: p|bb ≈ (0.21, 0.64, 0.15)
Encoder output: 1
[Figure: the interval for bb subdivided into bba, bbb, bb□ at 0.100110…₂ and 0.101111…₂]
Seeing bb makes us update p(a|bb) = (0.85) · (1/4) ≈ 0.21, and p(b|bb) = (0.85) · (3/4) ≈ 0.64
Now the first bit is unambiguously 1
32 / 37
Adaptive Probabilities: Example
Possible outcomes a, b, □
Observations: bbb
Probabilities: p|bbb ≈ (0.17, 0.68, 0.15)
Encoder output: 1
[Figure: the interval for bbb subdivided further; the sub-interval for bbba begins at 0.100110…₂]
33 / 37
Adaptive Probabilities: Example
Possible outcomes a, b, □
Observations: bbba
Probabilities: p|bbba ≈ (0.28, 0.57, 0.15)
Encoder output: 1001
[Figure: the interval for bbba, [0.100110000…₂, 0.100111110…₂), subdivided into bbbaa, bbbab, bbba□ at 0.100110100…₂ and 0.100111100…₂]
On seeing a, we can fill in three further bits unambiguously
34 / 37
Adaptive Probabilities: Example
Possible outcomes a, b, □
Observations: bbba□
Probabilities: N/A
Encoder output: 100111101
[Figure: the interval for bbba□, [0.100111100…₂, 0.100111110…₂)]
To terminate, we find the midpoint of 0.100111100… and 0.100111110…
35 / 37
Arithmetic Coding: Example (MacKay, Figure 6.4)
[Figure: the unit interval shown twice — once carved into binary codeword intervals (0, 1, 00, 01, …, 11111) and once carved into source string intervals (a, b, ba, bb, …, bbba□); the interval for bbba□ contains the interval of the codeword 100111101]
36 / 37
Summary and Reading
Main Points
Arithmetic Coding:
– Uses interval coding and conditional probability
– Separates coding and prediction
– No need to compute every code word
Predictive distributions:
– Update distribution after each symbol
– Beta and Dirichlet priors = virtual counts
Reading
Section 6.2 of MacKay
37 / 37