程序代写代做代考 information theory AI COMP2610 / COMP6261 - Information Theory - Lecture 10: Typicality and Asymptotic Equipartition Property

COMP2610 / COMP6261 – Information Theory – Lecture 10: Typicality and Asymptotic Equipartition Property

COMP2610 / COMP6261 – Information Theory
Lecture 10: Typicality and Asymptotic Equipartition Property

Robert C. Williamson

Research School of Computer Science

1 L O G O U S E G U I D E L I N E S T H E A U S T R A L I A N N A T I O N A L U N I V E R S I T Y

ANU Logo Use Guidelines

Deep Gold
C30 M50 Y70 K40

PMS Metallic 8620

PMS 463

Black
C0 M0 Y0 K100

PMS Process Black

Preferred logo Black version

Reverse version
Any application of the ANU logo on a coloured
background is subject to approval by the Marketing
Office, contact

brand@anu.edu.au

The ANU logo is a contemporary
reflection of our heritage.
It clearly presents our name,
our shield and our motto:

First to learn the nature of things.
To preserve the authenticity of our brand identity, there are
rules that govern how our logo is used.

Preferred logo – horizontal logo
The preferred logo should be used on a white background.
This version includes black text with the crest in Deep Gold in
either PMS or CMYK.

Black
Where colour printing is not available, the black logo can
be used on a white background.

Reverse
The logo can be used white reversed out of a black
background, or occasionally a neutral dark background.

21 August, 2018

1 / 31

Last time

Markov’s inequality

Chebyshev’s inequality

Law of large numbers

2 / 31

Law of Large Numbers

Theorem
Let X1, . . . ,Xn be a sequence of iid random variables, with

E[Xi ] = µ

and V[Xi ] <∞. Define X̄n = X1 + . . .+ Xn n . Then, for any β > 0,

lim
n→∞

p(|X̄n − µ| < β) = 1. This is also called X̄n → µ in probability. Definition: For random variables v1, v2, . . ., we say vn → v in probability if for all β > 0 limn→∞ P(|vn − v | > β) = 0.

β is fixed (not shrinking like 1n ). Not max/min. Reduction in variability.
3 / 31

This time

Ensembles and sequences

Typical sets

Asymptotic Equipartition Property (AEP)

4 / 31

1 Ensembles and sequences
Counting Types of Sequences

2 Typical sets

3 Asymptotic Equipartition Property (AEP)

4 Wrapping Up

5 / 31

Ensembles

Ensemble
An ensemble X is a triple (x ,AX ,PX ); x is a random variable taking
values in AX = {a1, a2, . . . , aI} with probabilities PX = {p1, p2, . . . , pI}.

We will call AX the alphabet of the ensemble

6 / 31

Ensembles
Example: Bent Coin

Let X be an ensemble with outcomes h for
heads with probability 0.9 and t for tails
with probability 0.1.

The outcome set is AX = {h, t}
The probabilities are
PX = {ph = 0.9, pt = 0.1}

7 / 31

Extended Ensembles

We can also consider blocks of outcomes, which will be useful to describe
sequences:

Example (Coin Flips):

hhhhthhththh→ hh hh th ht ht hh (6 × 2 outcome blocks)
→ hhh hth hth thh (4 × 3 outcome blocks)
→ hhhh thht hthh (3 × 4 outcome blocks)

Extended Ensemble
Let X be a single ensemble. The extended ensemble of blocks of size N
is denoted X N . Outcomes from X N are denoted x = (x1, x2, . . . , xN). The
probability of x is defined to be P(x) = P(x1)P(x2) . . .P(xN).

8 / 31

Extended Ensembles

We can also consider blocks of outcomes, which will be useful to describe
sequences:

Example (Coin Flips):

hhhhthhththh→ hh hh th ht ht hh (6 × 2 outcome blocks)
→ hhh hth hth thh (4 × 3 outcome blocks)
→ hhhh thht hthh (3 × 4 outcome blocks)

8 / 31

Extended Ensembles
Example: Bent Coin

Let X be an ensemble with outcomes
AX = {h, t} with ph = 0.9 and pt = 0.1.

Consider X 4 – i.e., 4 flips of the coin.

AX 4 = {hhhh, hhht, hhth, . . . , tttt}

P(hhhh) = (0.9)4 ≈ 0.6561
P(tttt) = (0.1)4 = 0.0001

P(hthh) = 0.9 · 0.1 · 0.9 · 0.9 = (0.9)3(0.1) ≈ 0.0729
P(htht) = 0.9 · 0.1 · 0.9 · 0.1 = (0.9)2(0.1)2 ≈ 0.0081.

9 / 31

Extended Ensembles
Example: Bent Coin

Let X be an ensemble with outcomes
AX = {h, t} with ph = 0.9 and pt = 0.1.

Consider X 4 – i.e., 4 flips of the coin.

AX 4 = {hhhh, hhht, hhth, . . . , tttt}

P(hhhh) = (0.9)4 ≈ 0.6561
P(tttt) = (0.1)4 = 0.0001

P(hthh) = 0.9 · 0.1 · 0.9 · 0.9 = (0.9)3(0.1) ≈ 0.0729
P(htht) = 0.9 · 0.1 · 0.9 · 0.1 = (0.9)2(0.1)2 ≈ 0.0081.

9 / 31

Extended Ensembles
Example: Bent Coin

Entropy of extended ensembles

We can view X 4 as comprising 4 independent random variables, based on
the ensemble X

Entropy is additive for independent random variables

Thus,

H(X 4) = 4H(X ) = 4. (−0.9 log2 0.9− 0.1 log2 0.1) = 1.88bits.

More generally,
H(X N) = NH(X ).

10 / 31

Counting Types of Sequences

Criteria for dividing 2N sequences into types

In the bent coin example,

(0.9)2(0.1)2 = P(hhtt)

= P(htht)

= P(htth)

= P(thht)

= P(thth)

= P(tthh).

The order of outcomes in the sequence is irrelevant

11 / 31

Counting Types of Sequences

Let X be an ensemble with alphabet AX = {a1, . . . , aI}.

Let p(X = ai) = pi .

For a sequence x = x1, x2, . . . , xN , how to compute p(x)?

let ni = # of times symbol ai appears in x (symbol count)

Given the ni ’s, we can compute the probability of seeing x:

P(x) = P(x1) · P(x2) · . . . · P(xN)
= P(a1)

n1 · P(a2)n2 · . . . · P(aI)nI
= pn11 · p

n2
2 . . . p

nI
I

Sufficient statistics: {n1, n2, . . . , nI}. Use it as a criteria of partitioning.

12 / 31

Counting Types of Sequences
Sequence Types

Each unique choice of (n1, n2, . . . , nI) gives a different type of sequence
4 heads, (3 heads, 1 tail), (2 heads, 2 tails), …
Sequences in each type are equiprobable.

For a given type of sequence how many sequences are there with these
symbol counts?

# of sequences with ni copies of ai =
N!

n1!n2! . . . nI!

(
N
n1

)(
N − n1

)(
N − n1 − n2

)
. . .

=
N!

n1!(N − n1)!
· (N − n1)!

n2!(N − n1 − n2)!
· (N − n1 − n2)!

n3!(N − n1 − n2 − n3)!
. . .

13 / 31

Counting Types of Sequences
Example

Probability of types

Let A = {a, b, c} with P(a) = 0.2, P(b) = 0.3, P(c) = 0.5.

Each sequence of type (na, nb, nc) = (2, 1, 3) has length 6 and probability
(0.2)2(0.3)1(0.5)3 = 0.0015.

There are 6!2!1!3! = 60 such sequences.

The probability x is of type (2, 1, 3) is (0.0015) · 60 = 0.09.

Study probabilities at the level of types (most likely, average/typical)

14 / 31

Counting Types of Sequences
Example

Probability of types

Let A = {a, b, c} with P(a) = 0.2, P(b) = 0.3, P(c) = 0.5.

Each sequence of type (na, nb, nc) = (2, 1, 3) has length 6 and probability
(0.2)2(0.3)1(0.5)3 = 0.0015.

There are 6!2!1!3! = 60 such sequences.

The probability x is of type (2, 1, 3) is (0.0015) · 60 = 0.09.

Study probabilities at the level of types (most likely, average/typical)

14 / 31

Counting Types of Sequences
Example

Probability of types

Let A = {a, b, c} with P(a) = 0.2, P(b) = 0.3, P(c) = 0.5.

Each sequence of type (na, nb, nc) = (2, 1, 3) has length 6 and probability
(0.2)2(0.3)1(0.5)3 = 0.0015.

There are 6!2!1!3! = 60 such sequences.

The probability x is of type (2, 1, 3) is (0.0015) · 60 = 0.09.

Study probabilities at the level of types (most likely, average/typical)

14 / 31

Counting Types of Sequences
Example

Probability of types

Let A = {a, b, c} with P(a) = 0.2, P(b) = 0.3, P(c) = 0.5.

Each sequence of type (na, nb, nc) = (2, 1, 3) has length 6 and probability
(0.2)2(0.3)1(0.5)3 = 0.0015.

There are 6!2!1!3! = 60 such sequences.

The probability x is of type (2, 1, 3) is (0.0015) · 60 = 0.09.

Study probabilities at the level of types (most likely, average/typical)

14 / 31

1 Ensembles and sequences
Counting Types of Sequences

2 Typical sets

3 Asymptotic Equipartition Property (AEP)

4 Wrapping Up

15 / 31

Extended Ensembles
Example

With ph = 0.75, what are the probabilities for X N?

N = 2

x P(x)
hh 0.5625
ht 0.1875
th 0.1875
tt 0.0625

N = 3

x P(x)
hhh 0.4219
hht 0.1406
hth 0.1406
thh 0.1406
htt 0.0469
tht 0.0469
tth 0.0469
ttt 0.0156

N = 4

x P(x) x P(x)
hhhh 0.3164 thht 0.0352
hhht 0.1055 thth 0.0352
hhth 0.1055 tthh 0.0352
hthh 0.1055 httt 0.0117
thhh 0.1055 thtt 0.0117
htht 0.0352 ttht 0.0117
htth 0.0352 ttth 0.0117
hhtt 0.0352 tttt 0.0039

16 / 31

Extended Ensembles
Example

With ph = 0.75, what are the probabilities for X N?

N = 2

x P(x)
hh 0.5625
ht 0.1875
th 0.1875
tt 0.0625

N = 3

x P(x)
hhh 0.4219
hht 0.1406
hth 0.1406
thh 0.1406
htt 0.0469
tht 0.0469
tth 0.0469
ttt 0.0156

N = 4

16 / 31

Extended Ensembles
Example

With ph = 0.75, what are the probabilities for X N?

N = 2

x P(x)
hh 0.5625
ht 0.1875
th 0.1875
tt 0.0625

N = 3

x P(x)
hhh 0.4219
hht 0.1406
hth 0.1406
thh 0.1406
htt 0.0469
tht 0.0469
tth 0.0469
ttt 0.0156

N = 4

16 / 31

Observations

As N increases, there is an increasing spread of probabilities

The most likely single sequence will always be the all h’s

However, for N = 4, the most likely sequence type is 3 h’s and 1 t

Not surprising because 3 = N · ph, pretty much average case.

17 / 31

Symbol Frequency in Long Sequences

To judge if a sequence is typical/average, a natural question to ask is:

How often does each symbol appear in a sequence x from X N?

Intuitively, in a sequence of length N, let ai appear for ni times.
Then in expectation

ni ≈ N · P(ai)

Note pi = P(ai), and

P(x) = P(a1)
n1P(a2)

n2 . . .P(aI)
nI ≈ pNp11 p

Np2
2 . . . p

NpI
I

So the information content − log2 P(x) of that sequence is approximately

−p1N log2 p1 − . . .− pIN log2 pI = −N
I∑

i=1

pi log2 pi = NH(X )

18 / 31

Symbol Frequency in Long Sequences

To judge if a sequence is typical/average, a natural question to ask is:

How often does each symbol appear in a sequence x from X N?

Intuitively, in a sequence of length N, let ai appear for ni times.
Then in expectation

ni ≈ N · P(ai)

Note pi = P(ai), and

P(x) = P(a1)
n1P(a2)

n2 . . .P(aI)
nI ≈ pNp11 p

Np2
2 . . . p

NpI
I

So the information content − log2 P(x) of that sequence is approximately

−p1N log2 p1 − . . .− pIN log2 pI = −N
I∑

i=1

pi log2 pi = NH(X )

18 / 31

Typical Sets

We want to consider elements x that have − log2 P(x) “close” to NH(X )

Typical Set

For “closeness” β > 0 the typical set TNβ for X N is

TNβ
def
= {x : |− log2 P(x)− NH(X )| < Nβ} = { x : ∣∣∣∣− 1N log2 P(x)− H(X ) ∣∣∣∣ < β } Union of types 19 / 31 Typical Sets We want to consider elements x that have − log2 P(x) “close” to NH(X ) Typical Set For “closeness” β > 0 the typical set TNβ for X N is

TNβ
def
= {x : |− log2 P(x)− NH(X )| < Nβ} = { x : ∣∣∣∣− 1N log2 P(x)− H(X ) ∣∣∣∣ < β } Union of types ✦ ❧�❣✷ P✭①✮ ✁◆❍✭❳✮ ❚✂✄ What when β = 0 (and replace < by ≤)? Criterion based on information content. Other criterion (KL divergence)? 19 / 31 Typical Sets The name “typical” is used since x ∈ TNβ will have roughly p1N occurrences of symbol a1, p2N of a2, . . ., pK N of aK . x log2(P (x)) ...1...................1.....1....1.1.......1........1...........1.....................1.......11... !50.1 ......................1.....1.....1.......1....1.........1.....................................1.... !37.3 ........1....1..1...1....11..1.1.........11.........................1...1.1..1...1................1. !65.9 1.1...1................1.......................11.1..1............................1.....1..1.11..... !56.4 ...11...........1...1.....1.1......1..........1....1...1.....1............1......................... !53.2 ..............1......1.........1.1.......1..........1............1...1......................1....... !43.7 .....1........1.......1...1............1............1...........1......1..11........................ !46.8 .....1..1..1...............111...................1...............1.........1.1...1...1.............1 !56.4 .........1..........1.....1......1..........1....1..............................................1... !37.3 ......1........................1..............1.....1..1.1.1..1...................................1. !43.7 1.......................1..........1...1...................1....1....1........1..11..1.1...1........ !56.4 ...........11.1.........1................1......1.....................1............................. !37.3 .1..........1...1.1.............1.......11...........1.1...1..............1.............11.......... !56.4 ......1...1..1.....1..11.1.1.1...1.....................1............1.............1..1.............. !59.5 ............11.1......1....1..1............................1.......1..............1.......1......... !46.8 .................................................................................................... !15.2 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 !332.1 Randomly drawn sequences for P(1) = 0.1. Note: H(X ) ≈ 0.47 20 / 31 Typical Sets Properties Typical sequences are nearly equiprobable: Every x ∈ TNβ has 2−N(H(X)+β) ≤ P(x) ≤ 2−N(H(X)−β). Variation is small when β is small Number of sequences in the typical set: For any N, β, |TNβ| ≤ 2N(H(X)+β). 21 / 31 Typical Sets Proof of Cardinality Bound For every x ∈ TNβ , p(x) ≥ 2−N(H(X)−β). Thus, 1 = ∑ x p(x) ≥ ∑ x∈TNβ p(x) ≥ ∑ x∈TNβ 2−N(H(X)−β) = 2−N(H(X)−β) · |TNβ|. Thus |TNβ| ≤ 2N(H(X)+β) 22 / 31 Typical Sets Most Likely Sequence The most likely sequence may not belong to the typical set e.g. with ph = 0.75, we have −1 4 log2 P(hhhh) = 0.4150 whereas H(X ) = 0.8113 The most likely single sequence→ hhhh The most likely single sequence type→ {hhht, hthh, . . .} 23 / 31 Typical Sets Most Likely Sequence Probability of most likely sequence decays like (ph)N (ph = 0.75) Sequences with N · ph heads contain much more total probability mass 0 50 100 150 200 250 300 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 Block Size (N) T o ta l P ro b a b il it y Type (3N/4, N/4) sequences Type (N, 0) sequences Blue curve corresponds to typical set with β = 0. What if β > 0?
24 / 31

1 Ensembles and sequences
Counting Types of Sequences

2 Typical sets

3 Asymptotic Equipartition Property (AEP)

4 Wrapping Up

25 / 31

Asymptotic︸︷︷︸
Eventually

Equipartition︸︷︷︸
Equally Divided

Property

Informally

Asymptotic Equipartition Property (Informal)
As N →∞, log2 P(x1, . . . , xN) is close to −NH(X ) with high probability.

For large block sizes “almost all sequences are typical” (i.e., in TNβ)

Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

4.4: Typicality 79

Figure 4.11. Anatomy of the typical set T . For p1 = 0.1 and N = 100 and N = 1000, these graphs
show n(r), the number of strings containing r 1s; the probability P (x) of a single string
that contains r 1s; the same probability on a log scale; and the total probability n(r)P (x) of
all strings that contain r 1s. The number r is on the horizontal axis. The plot of log2 P (x)
also shows by a dotted line the mean value of log2 P (x) = !NH2(p1), which equals !46.9
when N = 100 and !469 when N = 1000. The typical set includes only the strings that
have log2 P (x) close to this value. The range marked T shows the set TN! (as defined in
section 4.4) for N = 100 and ! = 0.29 (left) and N = 1000, ! = 0.09 (right).

N = 100 N = 1000

n(r) =
!
N
r

”

2e+28

4e+28

6e+28

8e+28

1e+29

1.2e+29

0 10 20 30 40 50 60 70 80 90 100
0

5e+298

1e+299

1.5e+299

2e+299

2.5e+299

3e+299

0 100 200 300 400 500 600 700 800 9001000

P (x) = pr1(1! p1)N!r

1e-05

2e-05

0 10 20 30 40 50 60 70 80 90 100

1e-05

2e-05

0 1 2 3 4 5

log2 P (x)

-350
-300
-250
-200
-150
-100
-50

0 10 20 30 40 50 60 70 80 90 100

-3500
-3000
-2500
-2000
-1500
-1000
-500

0 100 200 300 400 500 600 700 800 9001000

n(r)P (x) =
!
N
r

”
pr1(1! p1)N!r

0
0.02
0.04
0.06
0.08
0.1

0.12
0.14

0 10 20 30 40 50 60 70 80 90 100
0

0.005
0.01

0.015
0.02

0.025
0.03

0.035
0.04

0.045

0 100 200 300 400 500 600 700 800 9001000

r r

Probability sequence x has r heads for N = 100 (left) and N = 1000 (right). Here
P(X = head) = 0.1.

26 / 31

Asymptotic Equipartition Property
Formally

Asymptotic Equipartition Property
If x1, x2, . . . are i.i.d. with distribution P then, in probability,

− 1
N

log2 P(x1, . . . , xN)→ H(X ).

In precise language:

(∀β > 0) lim
N→∞

p
(∣∣∣∣− 1N log2 P(x1, . . . , xN)− H(X )

∣∣∣∣ < β ) = 1. Exactly the probability of x ∈ TNβ . Recall definition: for random variables v1, v2, . . ., we say vN → v in probability if for all β > 0 limN→∞ P(|vN − v | > β) = 0

Here vN corresponds to − 1N log2 P(x1, . . . , xN).

27 / 31

Asymptotic Equipartition Property
Formally

Asymptotic Equipartition Property
If x1, x2, . . . are i.i.d. with distribution P then, in probability,

− 1
N

log2 P(x1, . . . , xN)→ H(X ).

In precise language:

(∀β > 0) lim
N→∞

p
(∣∣∣∣− 1N log2 P(x1, . . . , xN)− H(X )

Here vN corresponds to − 1N log2 P(x1, . . . , xN).
27 / 31

Asymptotic Equipartition Property
Comments

Why is it surprising/significant?

For an ensemble with binary outcomes, and low entropy,

|TNβ| ≤ 2NH(X)+β � 2N

i.e. the typical set is a small fraction of all possible sequences

AEP says that for N sufficiently large, we are virtually guaranteed to draw
a sequence from this small set

Significance in information theory

28 / 31

Asymptotic Equipartition Property
Proof

Since x1, . . . , xN are independent,

− 1
N

log p(x1, . . . , xN) = −
1
n

log
N∏

n=1

p(xn)

= − 1
N

N∑
n=1

log p(xn).

Let Y = − log p(X ) and yn = − log p(xn). Then, yn ∼ Y , and

E[Y ] = H(X ).

But then by the law of large numbers,

(∀β > 0) lim
N→∞

(∣∣∣∣∣ 1N
N∑

n=1

yn − H(X )
∣∣∣∣∣ > β

)
= 0.

29 / 31

1 Ensembles and sequences
Counting Types of Sequences

2 Typical sets

3 Asymptotic Equipartition Property (AEP)

4 Wrapping Up

30 / 31

Summary & Conclusions

Ensembles and sequences

Typical sets

Asymptotic Equipartition Property (AEP)

Next: Source Coding.

31 / 31

Ensembles and sequences
Counting Types of Sequences

Typical sets
Asymptotic Equipartition Property (AEP)
Wrapping Up

Related Posts