编程辅导 COMP2610 / COMP6261 – Information Theory Lecture 10: Typicality and Asympto

COMP2610 / COMP6261 – Information Theory Lecture 10: Typicality and Asymptotic Equipartition Property
U Logo Use Guidelines Robert C. Williamson
logo is a contemporary n of our heritage.
presents our name, ld and our motto:

Copyright By PowCoder代写 加微信 powcoder

earn the nature of things.
authenticity of our brand identity, there are n how our logo is used.
go – horizontal logo
go should be used on a white background. ludes black text with the crest in Deep Gold in
rinting is not available, the black logo can hite background.
e used white reversed out of a black occasionally a neutral dark background.
Research School of Computer Science
Preferred logo
Black version
21 August, 2018

Markov’s inequality Chebyshev’s inequality Law of large numbers

Law of Large Numbers
Let X1, . . . , Xn be a sequence of iid random variables, with E[Xi] = μ
and V[Xi ] < ∞. Define Then, for any β > 0,
X ̄ n = X 1 + . . . + X n . n
lim p(|X ̄n −μ|<β)=1. n→∞ This is also called X ̄n → μ in probability. Definition: For random variables v1, v2, . . ., we say vn → v in probability if for all β > 0 limn→∞ P(|vn − v| > β) = 0.
β is fixed (not shrinking like n1 ). Not max/min. Reduction in variability.

Ensembles and sequences
Typical sets
Asymptotic Equipartition Property (AEP)

1 Ensembles and sequences Counting Types of Sequences
2 Typical sets
3 Asymptotic Equipartition Property (AEP)
4 Wrapping Up

An ensemble X is a triple (x , AX , PX ); x is a random variable taking values in AX = {a1,a2,…,aI} with probabilities PX = {p1,p2,…,pI}.
We will call AX the alphabet of the ensemble

Ensembles Example: Bent Coin
Let X be an ensemble with outcomes h for heads with probability 0.9 and t for tails with probability 0.1.
TheoutcomesetisAX ={h,t} The probabilities are
PX = {ph = 0.9, pt = 0.1}

Extended Ensembles
We can also consider blocks of outcomes, which will be useful to describe
sequences:
Example (Coin Flips):
hhhhthhththh → hh hh th ht ht hh → hhh hth hth thh
→ hhhh thht hthh
(6 × 2 outcome blocks) (4 × 3 outcome blocks) (3 × 4 outcome blocks)

Extended Ensembles
We can also consider blocks of outcomes, which will be useful to describe
sequences:
Example (Coin Flips):
hhhhthhththh → hh hh th ht ht hh → hhh hth hth thh
→ hhhh thht hthh
Extended Ensemble
(6 × 2 outcome blocks) (4 × 3 outcome blocks) (3 × 4 outcome blocks)
Let X be a single ensemble. The extended ensemble of blocks of size N is denoted XN. Outcomes from XN are denoted x = (x1,x2,…,xN). The probability of x is defined to be P(x) = P(x1)P(x2) . . . P(xN ).

Extended Ensembles Example: Bent Coin
Let X be an ensemble with outcomes AX ={h,t}withph =0.9andpt =0.1.
Consider X 4 – i.e., 4 flips of the coin. AX4 = {hhhh,hhht,hhth,…,tttt}

Extended Ensembles Example: Bent Coin
Let X be an ensemble with outcomes AX ={h,t}withph =0.9andpt =0.1.
Consider X 4 – i.e., 4 flips of the coin. AX4 = {hhhh,hhht,hhth,…,tttt}
P(hhhh) = (0.9)4 ≈ 0.6561
P(tttt) = (0.1)4 = 0.0001
P(hthh) = 0.9 · 0.1 · 0.9 · 0.9 = (0.9)3(0.1) ≈ 0.0729 P(htht) = 0.9 · 0.1 · 0.9 · 0.1 = (0.9)2(0.1)2 ≈ 0.0081.

Extended Ensembles Example: Bent Coin
Entropy of extended ensembles
We can view X4 as comprising 4 independent random variables, based on the ensemble X
Entropy is additive for independent random variables Thus,
H (X 4 ) = 4H (X ) = 4. (−0.9 log2 0.9 − 0.1 log2 0.1) = 1.88bits. More generally,
H(XN) = NH(X).

Counting Types of Sequences
Criteria for dividing 2N sequences into types In the bent coin example,
(0.9)2(0.1)2 = P(hhtt) = P(htht) = P(htth) = P(thht) = P(thth)
= P(tthh). The order of outcomes in the sequence is irrelevant

Counting Types of Sequences
Let X be an ensemble with alphabet AX = {a1,…,aI}.
Letp(X =ai)=pi.
For a sequence x = x1,x2,…,xN, how to compute p(x)?
let ni = # of times symbol ai appears in x (symbol count)
Given the ni ’s, we can compute the probability of seeing x:
P(x) = P(x1) · P(x2) · . . . · P(xN )
=P(a1)n1 ·P(a2)n2 ·…·P(aI)nI
=pn1 ·pn2 …pnI 12I
Sufficient statistics: {n1, n2, . . . , nI }. Use it as a criteria of partitioning.

Counting Types of Sequences Sequence Types
Each unique choice of (n1 , n2 , . . . , nI ) gives a different type of sequence 4 heads, (3 heads, 1 tail), (2 heads, 2 tails), …
Sequences in each type are equiprobable.
For a given type of sequence how many sequences are there with these symbol counts?
= N! n1!n2!…nI!
nnn… 123
N! · (N − n1)! · n1!(N −n1)! n2!(N −n1 −n2)!
(N − n1 − n2)! n3!(N −n1 −n2 −n3)!
# of sequences with ni copies of ai 􏰂N􏰃􏰂N −n1􏰃􏰂N −n1 −n2􏰃

Counting Types of Sequences Example
Probability of types
Let A = {a,b,c} with P(a) = 0.2, P(b) = 0.3, P(c) = 0.5.

Counting Types of Sequences Example
Probability of types
Let A = {a,b,c} with P(a) = 0.2, P(b) = 0.3, P(c) = 0.5.
Each sequence of type (na, nb, nc) = (2, 1, 3) has length 6 and probability (0.2)2(0.3)1(0.5)3 = 0.0015.

Counting Types of Sequences Example
Probability of types
Let A = {a,b,c} with P(a) = 0.2, P(b) = 0.3, P(c) = 0.5.
Each sequence of type (na, nb, nc) = (2, 1, 3) has length 6 and probability (0.2)2(0.3)1(0.5)3 = 0.0015.
There are 6! = 60 such sequences. 2!1!3!

Counting Types of Sequences Example
Probability of types
Let A = {a,b,c} with P(a) = 0.2, P(b) = 0.3, P(c) = 0.5.
Each sequence of type (na, nb, nc) = (2, 1, 3) has length 6 and probability (0.2)2(0.3)1(0.5)3 = 0.0015.
There are 6! = 60 such sequences. 2!1!3!
The probability x is of type (2, 1, 3) is (0.0015) · 60 = 0.09.
Study probabilities at the level of types (most likely, average/typical)

1 Ensembles and sequences Counting Types of Sequences
2 Typical sets
3 Asymptotic Equipartition Property (AEP)
4 Wrapping Up

Extended Ensembles Example
With ph = 0.75, what are the probabilities for X N ? N=2
x P(x) hh 0.5625 ht 0.1875 th 0.1875 tt 0.0625

Extended Ensembles Example
With ph = 0.75, what are the probabilities for X N ? N=2 N=3
x P(x) hh 0.5625 ht 0.1875 th 0.1875 tt 0.0625
x P(x) hhh 0.4219 hht 0.1406 hth 0.1406 thh 0.1406 htt 0.0469 tht 0.0469 tth 0.0469 ttt 0.0156

Extended Ensembles Example
With ph = 0.75, what are the probabilities for X N ?
N=2 N=3 N=4
x P(x) x P(x) x P(x) x P(x)
hh 0.5625 ht 0.1875 th 0.1875 tt 0.0625
hhh 0.4219 hhhh hht 0.1406 hhht hth 0.1406 hhth thh 0.1406 hthh htt 0.0469 thhh tht 0.0469 htht tth 0.0469 htth ttt 0.0156 hhtt
0.3164 thht 0.1055 thth 0.1055 tthh 0.1055 httt 0.1055 thtt 0.0352 ttht 0.0352 ttth 0.0352 tttt
0.0352 0.0352 0.0352 0.0117 0.0117 0.0117 0.0117 0.0039

Observations
As N increases, there is an increasing spread of probabilities
The most likely single sequence will always be the all h’s However, for N = 4, the most likely sequence type is 3 h’s and 1 t Not surprising because 3 = N · ph, pretty much average case.

Symbol Frequency in Long Sequences
To judge if a sequence is typical/average, a natural question to ask is:
How often does each symbol appear in a sequence x from X N ? Intuitively, in a sequence of length N, let ai appear for ni times.
Then in expectation
Notepi =P(ai),and
ni ≈N·P(ai)
P(x) = P(a1)n1P(a2)n2 …P(aI)nI ≈ pNp1pNp2 …pNpI 12I

Symbol Frequency in Long Sequences
To judge if a sequence is typical/average, a natural question to ask is:
How often does each symbol appear in a sequence x from X N ? Intuitively, in a sequence of length N, let ai appear for ni times.
Then in expectation
Notepi =P(ai),and
ni ≈N·P(ai)
P(x) = P(a1)n1P(a2)n2 …P(aI)nI ≈ pNp1pNp2 …pNpI
So the information content − log2 P(x) of that sequence is approximately
−p1Nlog2p1 −…−pINlog2pI =−N􏰄pi log2pi =NH(X)

Typical Sets
We want to consider elements x that have −log2 P(x) “close” to NH(X) Typical Set
For “closeness” β > 0 the typical set TNβ for XN is def
Union of types
TNβ ={x:|−log2P(x)−NH(X)| 0 the typical set TNβ for XN is
TNβ ={x:|−log2P(x)−NH(X)| 0?
Total Probability

1 Ensembles and sequences Counting Types of Sequences
2 Typical sets
3 Asymptotic Equipartition Property (AEP)
4 Wrapping Up

Asymptotic Equipartition Property
P(x) = pr(1 − p )N!r 2e-05
Eventually Informally
Asymptotic Equipartition Property (Informal)
As N → ∞, log P(x ,…,x ) is close to −NH(X) with high probability. 2 log21P(x) N T
Equally Divided
n(r)P (x) =
􏰀N􏰁 r N!r r p(1 − p)
0.045 0.04 0.035 0.03 0.025 0.02 0.015 0.01 0.005 00
0 10 20 30 40 50 60 70 80 90 100
1e-05 012345
0 102030405060708090100
-150 -1500
-200 -2000
-250 -2500
For large block sizes “almost all sequences are typical” (i.e., in TNβ)
0.14 0.12 0.1 0.08 0.06 0.04 0.02
0 10 20 30 40 50 60 70 80 90 100
-3000 -3500
0 1002003004005006007008009001000
Probability sequence x has r heads for N = 100 (left) and N = 1000 (right). Here Figure 4.11. Anatomy of the typical set T . For p = 0.1 and N = 100 and N = 1000, these gr
P(X = head) = 0.1.
show n(r), the number of strings containing r 1s; the probability P(x) of a single st
that contains r 1s; the same probability on a log scale; and the total probability n(r)P (
all strings that contain r 1s. The number r is on the horizontal axis. The plot of log
-100 -1000 T
0 1002003004005006007008009001000

Asymptotic Equipartition Property Formally
Asymptotic Equipartition Property
If x1, x2, . . . are i.i.d. with distribution P then, in probability, −N1 log2 P(x1,…,xN) → H(X).
In precise language:
(∀β>0)limp − logP(x,…,x)−H(X)<β =1. 􏰂􏰇􏰇 1 􏰇􏰇 􏰃 N→∞􏰇􏰇N21N 􏰇􏰇 Exactly the probability of x ∈ TNβ. Asymptotic Equipartition Property Formally Asymptotic Equipartition Property If x1, x2, . . . are i.i.d. with distribution P then, in probability, −N1 log2 P(x1,...,xN) → H(X). In precise language: (∀β>0)limp − logP(x,…,x)−H(X)<β =1. 􏰂􏰇􏰇 1 􏰇􏰇 􏰃 N→∞􏰇􏰇N21N 􏰇􏰇 Exactly the probability of x ∈ TNβ. Recall definition: for random variables v1, v2, . . ., we say vN → v in probabilityifforallβ>0limN→∞P(|vN −v|>β)=0 Here vN corresponds to −N1 log2 P(x1,…,xN).

Asymptotic Equipartition Property Comments
Why is it surprising/significant?
For an ensemble with binary outcomes, and low entropy, |TNβ| ≤ 2NH(X)+β ≪ 2N
i.e. the typical set is a small fraction of all possible sequences
AEP says that for N sufficiently large, we are virtually guaranteed to draw
a sequence from this small set Significance in information theory

Asymptotic Equipartition Property Proof
Sincex1,…,xN areindependent,
−N logp(x1,…,xN) = −n log 1 􏰄N
p(xn) logp(xn).
Let Y = −logp(X) and yn = −logp(xn). Then, yn ∼ Y, and
E[Y] = H(X). But then by the law of large numbers,
􏰎􏰇􏰇1􏰄N (∀β>0)limp 􏰇􏰇
N→∞ 􏰇N n=1
yn−H(X)􏰇􏰇>β =0. 􏰇

1 Ensembles and sequences Counting Types of Sequences
2 Typical sets
3 Asymptotic Equipartition Property (AEP)
4 Wrapping Up

Summary & Conclusions
Ensembles and sequences
Typical sets
Asymptotic Equipartition Property (AEP) Next: Source Coding.

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com