
COMP2610 – Information Theory Lecture 12: The Source Coding Theorem
Research School of Computer Science
28 August 2018

Basic goal of compression
Key concepts: codes and their types, raw bit content, essential bit content
Informal statement of source coding theorem

A General Communication Game (Recap)
Data compression is the process of replacing a message with a smaller message which can be reliably converted back to the original.
Want small messages on average when outcomes are from a fixed, known, but uncertain source (e.g., coin flips with known bias)
[Diagram: the sender observes a sequence "Heads, Tails, Heads, ...", transmits a compressed message, and the receiver reliably recovers "Heads, Tails, Heads, ..."]

Definitions (Recap)
Source Code
Given an ensemble X, the function c : AX → B is a source code for X. The number of symbols in c(x) is the length l(x) of the codeword for x. The extension of c is defined by c(x1 …xn) = c(x1)…c(xn)

Smallest δ-sufficient subset
Let X be an ensemble and for δ ≥ 0 define Sδ to be the smallest subset of AX such that
P(x ∈ Sδ) ≥ 1 − δ

Essential Bit Content
Let X be an ensemble; then for δ ≥ 0 the essential bit content of X is
Hδ(X) = log2 |Sδ|
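
To make the Source Code definition and its extension concrete, here is a minimal Python sketch; the particular code c and its codewords are invented purely for illustration and are not from the lecture.

```python
# A toy source code c : A_X -> binary strings, together with its extension.
# The codeword assignments below are made up purely for illustration.
c = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

def codeword_length(code, x):
    """l(x): the number of symbols in the codeword c(x)."""
    return len(code[x])

def extend(code, outcomes):
    """Extension of the code: c(x1 ... xn) = c(x1) ... c(xn) (concatenation)."""
    return ''.join(code[x] for x in outcomes)

print(extend(c, 'abad'))                       # -> '0100111'
print({x: codeword_length(c, x) for x in c})   # -> {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```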

Essential Bit Content (Recap)
Intuitively, construct Sδ by removing outcomes in ascending order of probability, until removing any more would take the remaining probability below 1 − δ

x     a    b    c    d     e     f     g     h
P(x)  1/4  1/4  1/4  3/16  1/64  1/64  1/64  1/64

Outcomes are ranked (high to low) by P(x = ai); the least probable are removed to form Sδ with P(x ∈ Sδ) ≥ 1 − δ
δ = 0 : Sδ = {a, b, c, d, e, f, g, h}
δ = 1/64 : Sδ = {a, b, c, d, e, f, g}
δ = 1/16 : Sδ = {a, b, c, d}
δ = 3/4 : Sδ = {a}
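
A minimal Python sketch (not from the lecture) that builds Sδ greedily, exactly as in the intuition above, and reproduces the sets just listed. For δ = 0 it recovers the raw bit content log2 8 = 3 bits.

```python
import math

def delta_sufficient_subset(probs, delta):
    """Greedy S_delta: keep outcomes in decreasing order of probability
    until the kept probability reaches 1 - delta."""
    kept, total = [], 0.0
    for outcome, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        if total >= 1 - delta:
            break
        kept.append(outcome)
        total += p
    return kept

# The ensemble from the slide
probs = {'a': 1/4, 'b': 1/4, 'c': 1/4, 'd': 3/16,
         'e': 1/64, 'f': 1/64, 'g': 1/64, 'h': 1/64}

for delta in (0, 1/64, 1/16, 3/4):
    S = delta_sufficient_subset(probs, delta)
    print(f"delta={delta}: S={S}, H_delta(X) = log2 {len(S)} = {math.log2(len(S)):.2f}")
```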

Lossy Coding (Recap)
Consider a coin with P(Heads) = 0.9
If we are happy to fail on up to 2% of the sequences we can ignore any
sequence of 10 outcomes with more than 3 tails
There are only 176 < 2^8 sequences with 3 or fewer tails
So, we can just code those, and ignore the rest!
Coding 10 outcomes with a 2% failure rate is doable with 8 bits, i.e., 0.8 bits/outcome
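
A quick Python check (not from the slides) of the two numbers used here: the count of length-10 sequences with at most 3 tails, and the probability that a sequence falls outside that set.

```python
import math
from math import comb

p_tail = 0.1                                   # P(Heads) = 0.9

# number of length-10 sequences with 3 or fewer tails
n_kept = sum(comb(10, k) for k in range(4))    # 1 + 10 + 45 + 120 = 176

# probability that a sequence is ignored (more than 3 tails)
p_fail = sum(comb(10, k) * p_tail**k * (1 - p_tail)**(10 - k) for k in range(4, 11))

print(n_kept, math.log2(n_kept))   # 176, about 7.46 -> 8 bits suffice
print(round(p_fail, 4))            # about 0.0128, inside the 2% budget
```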

Recap: typical sets
Formal statement of source coding theorem
Proof of source coding theorem

The Source Coding Theorem (Theorem 4.1 in MacKay)
Our aim this week is to understand this:
The Source Coding Theorem
Let X be an ensemble with entropy H = H(X) bits. Given ε > 0 and 0 < δ < 1, there exists a positive integer N0 such that for all N > N0
| (1/N) Hδ(X^N) − H | < ε.

In English:
Given outcomes drawn from X ...
... no matter what reliability 1 − δ and tolerance ε you choose ...
... there is always a length N0 so that sequences X^N longer than this ...
... have an average essential bit content (1/N) Hδ(X^N) within ε of H(X)
Hδ(X^N) measures the fewest bits needed to uniformly code the smallest set Sδ of N-outcome sequences with P(x ∈ Sδ) ≥ 1 − δ.

1 Introduction
  Quick Review
2 Extended Ensembles
  Definition and Properties
  Essential Bit Content
  The Asymptotic Equipartition Property
3 The Source Coding Theorem
  Typical Sets
  Statement of the Theorem

Extended Ensembles (Review)
Instead of coding single outcomes, we now consider coding blocks and sequences of blocks
Example (Coin Flips):
hhhhthhththh → hh hh th ht ht hh (6 × 2-outcome blocks)
             → hhh hth hth thh   (4 × 3-outcome blocks)
             → hhhh thht hthh    (3 × 4-outcome blocks)
Extended Ensemble
The extended ensemble of blocks of size N is denoted X^N. Outcomes from X^N are denoted x = (x1, x2, ..., xN). The probability of x is defined to be P(x) = P(x1)P(x2)...P(xN).
What is the entropy of X^N?

Extended Ensembles (Review)
Example: Bent Coin
Let X be an ensemble with outcomes AX = {h, t} with ph = 0.9 and pt = 0.1. Consider X^4, i.e., 4 flips of the coin. A_{X^4} = {hhhh, hhht, hhth, ..., tttt}
What is the probability of
Four heads? P(hhhh) = (0.9)^4 ≈ 0.656
Four tails? P(tttt) = (0.1)^4 = 0.0001
What is the entropy and raw bit content of X^4?
The outcome set size is |A_{X^4}| = |{0000, 0001, 0010, ..., 1111}| = 16
Raw bit content: H0(X^4) = log2 |A_{X^4}| = 4
Entropy: H(X^4) = 4H(X) = 4 · (−0.9 log2 0.9 − 0.1 log2 0.1) ≈ 1.88
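
These numbers can be checked with a few lines of Python (a sketch, not part of the original slides):

```python
import math
from itertools import product

ph, pt = 0.9, 0.1

def prob(seq):
    """P(x) for the extended ensemble: the product of per-flip probabilities."""
    return math.prod(ph if s == 'h' else pt for s in seq)

outcomes = [''.join(s) for s in product('ht', repeat=4)]        # |A_{X^4}| = 16

H1 = -(ph * math.log2(ph) + pt * math.log2(pt))                 # H(X), about 0.469
H4 = -sum(prob(x) * math.log2(prob(x)) for x in outcomes)       # H(X^4)

print(prob('hhhh'), prob('tttt'))                # about 0.6561 and 0.0001
print(len(outcomes), math.log2(len(outcomes)))   # 16 outcomes, H0(X^4) = 4
print(round(H4, 2), round(4 * H1, 2))            # both about 1.88
```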

Essential Bit Content of Extended Ensembles
What if we use a lossy uniform code on the extended ensemble?

x      P(x)        x      P(x)
hhhh   0.656       thht   0.008
hhht   0.073       thth   0.008
hhth   0.073       tthh   0.008
hthh   0.073       httt   0.001
thhh   0.073       thtt   0.001
htht   0.008       ttht   0.001
htth   0.008       ttth   0.001
hhtt   0.008       tttt   0.000

δ = 0 gives Hδ(X^4) = log2 16 = 4
δ = 0.0001 gives Hδ(X^4) = log2 15 ≈ 3.91
δ = 0.005 gives Hδ(X^4) = log2 11 ≈ 3.46
δ = 0.05 gives Hδ(X^4) = log2 5 = 2.32
δ = 0.25 gives Hδ(X^4) = log2 3 ≈ 1.58
Unlike entropy, Hδ(X^4) ≠ 4Hδ(X) = 0
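
The table and several of the Hδ(X^4) values above (δ = 0, 0.0001, 0.005, 0.25) can be reproduced with the same greedy construction used earlier; again a hedged sketch, not from the slides.

```python
import math
from itertools import product

ph, pt = 0.9, 0.1
probs = {''.join(s): math.prod(ph if c == 'h' else pt for c in s)
         for s in product('ht', repeat=4)}

def H_delta(probs, delta):
    """log2 of the size of the smallest delta-sufficient subset of outcomes."""
    kept, mass = 0, 0.0
    for p in sorted(probs.values(), reverse=True):
        if mass >= 1 - delta - 1e-12:          # small tolerance for float rounding
            break
        kept += 1
        mass += p
    return math.log2(kept)

for delta in (0, 0.0001, 0.005, 0.25):
    print(delta, round(H_delta(probs, delta), 2))
```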

Essential Bit Content of Extended Ensembles
What happens as N increases?
[Figure: (1/N) Hδ(X^N) plotted against δ for N = 210, 410, 610, 810; the curves flatten as N grows]
Recall that the entropy of a single coin flip with ph = 0.9 is H(X) ≈ 0.47

Essential Bit Content of Extended Ensembles
Some Intuition
Why does the curve flatten for large N?
Recall that for N = 1000, e.g., sequences with 900 heads are considered
Such sequences occupy most of the probability mass, and are roughly equally likely
As we increase δ, we will quickly encounter these sequences, and make small, roughly equal-sized changes to |Sδ|
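
The flattening can be computed exactly for the bent coin, since all sequences with the same number of tails are equiprobable. A hedged Python sketch (not from the slides, intended only for moderate N):

```python
import math
from math import comb

def h_delta_per_symbol(N, delta, p_head=0.9):
    """(1/N) * H_delta(X^N) for blocks of N flips of the bent coin.

    All sequences with k tails share the probability p_head**(N-k) * p_tail**k,
    and more tails means a smaller probability, so S_delta is built by taking
    whole "k tails" groups (and only as much of the last group as is needed)
    until the kept probability reaches 1 - delta.
    """
    p_tail = 1 - p_head
    kept, mass = 0, 0.0
    for k in range(N + 1):
        if mass >= 1 - delta:
            break
        p_seq = p_head ** (N - k) * p_tail ** k
        group = comb(N, k)
        needed = math.ceil((1 - delta - mass) / p_seq)   # sequences still required
        take = min(group, needed)
        kept += take
        mass += take * p_seq
    return math.log2(kept) / N

H = -(0.9 * math.log2(0.9) + 0.1 * math.log2(0.1))       # about 0.469 bits/outcome
for N in (10, 210, 410, 810):
    print(N, [round(h_delta_per_symbol(N, d), 3) for d in (0.01, 0.1, 0.5)])
print("H(X) is about", round(H, 3))
```

For larger N the printed values crowd together near H(X), which is exactly the flattening seen in the figure.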

Typical Sets and the AEP (Review)
Typical Set
For “closeness” β > 0 the typical set TNβ for X^N is
TNβ = { x : | −(1/N) log2 P(x) − H(X) | < β }
The name “typical” is used since x ∈ TNβ will have roughly p1·N occurrences of symbol a1, p2·N of a2, ..., pK·N of aK.

Asymptotic Equipartition Property (Informal)
As N → ∞, log2 P(x1, ..., xN) is close to −N·H(X) with high probability.
For large block sizes “almost all sequences are typical” (i.e., in TNβ).
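
A small simulation (a sketch, not from the slides) that draws blocks from the bent coin and checks how often the per-symbol surprise −(1/N) log2 P(x) lands within β of H(X); by the AEP this fraction approaches 1 as N grows.

```python
import math
import random

random.seed(0)
p_head, N, beta, trials = 0.9, 1000, 0.1, 2000
H = -(p_head * math.log2(p_head) + (1 - p_head) * math.log2(1 - p_head))

def per_symbol_surprise(n_heads, n):
    """-(1/N) log2 P(x) for a block containing n_heads heads."""
    n_tails = n - n_heads
    return -(n_heads * math.log2(p_head) + n_tails * math.log2(1 - p_head)) / n

in_typical_set = 0
for _ in range(trials):
    heads = sum(random.random() < p_head for _ in range(N))
    if abs(per_symbol_surprise(heads, N) - H) < beta:
        in_typical_set += 1

print(f"H(X) is about {H:.3f}")
print(f"fraction of sampled blocks in the typical set: {in_typical_set / trials:.3f}")
```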

The Source Coding Theorem
Let X be an ensemble with entropy H = H(X) bits. Given ε > 0 and 0 < δ < 1, there exists a positive integer N0 such that for all N > N0
| (1/N) Hδ(X^N) − H | < ε.
[Figure: (1/N) Hδ(X^N) plotted against δ for N = 210, 410, 610, 810]
Given a tiny probability of error δ, the average bits per outcome can be made as close to H as required.
Even if we allow a large probability of error, we cannot compress to fewer than H bits per outcome for large sequences.

Warning: proof ahead
I don’t expect you to reproduce the following proof
I present it as it sheds some light on why the result is true
And it is a remarkable and fundamental result
You are expected to understand and be able to apply the theorem

Proof of the SCT
The absolute value of a difference being bounded (e.g., |x − y| < ε) says two things:
1 When x − y is positive, it says x − y < ε, which means x < y + ε
2 When x − y is negative, it says −(x − y) < ε, which means x > y − ε
So |x − y| < ε is equivalent to y − ε < x < y + ε
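
Applying this with x = (1/N) Hδ(X^N) and y = H is just a restatement of the theorem, but it shows why the proof splits into two parts:

```latex
\left|\tfrac{1}{N} H_\delta(X^N) - H\right| < \varepsilon
\quad\Longleftrightarrow\quad
H - \varepsilon \;<\; \tfrac{1}{N} H_\delta(X^N) \;<\; H + \varepsilon .
```

Part 1 below establishes the upper bound and Part 2 the lower bound.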
[From MacKay, alongside Figure 4.13 (schematic illustration): we show how small Hδ(X^N) must be by calculating how big TNβ could possibly be. We are free to set β to any convenient value. The smallest probability that a member of TNβ can have is 2^{−N(H+β)}, and the total probability contained by TNβ can't be any bigger than 1. So |TNβ| 2^{−N(H+β)} < 1, i.e. |TNβ| < 2^{N(H+β)} (MacKay 4.36).]

Proof of the SCT: Idea
Proof Idea: As N increases
TNβ has ∼ 2^{NH(X)} elements
almost all x are in TNβ
Sδ and TNβ increasingly overlap
so log2 |Sδ| ∼ NH
Basically, we look to encode all typical sequences uniformly, and relate that to the essential bit content

Proof of the SCT (Part 1)
For ε > 0 and δ > 0, want N large enough so (1/N) Hδ(X^N) < H(X) + ε.
Recall (see Lecture 10) that for the typical set TNβ we have, for any N and β,
|TNβ| ≤ 2^{N(H(X)+β)}    (1)
and, by the AEP, for any β, as N → ∞ we have P(x ∈ TNβ) → 1. So for any δ > 0 we can always find an N such that P(x ∈ TNβ) ≥ 1 − δ.
Now recall the definition of the smallest δ-sufficient subset Sδ: it is the smallest subset of outcomes such that P(x ∈ Sδ) ≥ 1 − δ, so |Sδ| ≤ |TNβ|.
So, given any δ and β, we can find an N large enough so that, by (1),
Hδ(X^N) = log2 |Sδ| ≤ log2 |TNβ| ≤ N(H(X) + β)
Setting β = ε and dividing through by N gives the result.
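
A numerical check of bound (1) for the bent coin (a sketch, not from the lecture): the size of TNβ can be counted exactly by grouping sequences by their number of tails.

```python
import math
from math import comb

p_head, N, beta = 0.9, 200, 0.05
p_tail = 1 - p_head
H = -(p_head * math.log2(p_head) + p_tail * math.log2(p_tail))

# |T_Nbeta|: count the sequences whose per-symbol surprise is within beta of H
size = 0
for k in range(N + 1):                               # k = number of tails
    surprise = -((N - k) * math.log2(p_head) + k * math.log2(p_tail)) / N
    if abs(surprise - H) < beta:
        size += comb(N, k)

print(round(math.log2(size), 1), "<=", round(N * (H + beta), 1))   # log2|T| vs N(H+beta)
```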

Proof of the SCT (Part 2)
For ε > 0 and δ > 0, want N large enough so (1/N) Hδ(X^N) > H(X) − ε.
Suppose this was not the case; that is, for every N we have
(1/N) Hδ(X^N) ≤ H(X) − ε  ⟺  |Sδ| ≤ 2^{N(H(X)−ε)}
Let’s look at what this says about P(x ∈ Sδ) by writing
P(x ∈ Sδ) = P(x ∈ Sδ ∩ TNβ) + P(x ∈ Sδ \ TNβ)
          ≤ |Sδ| 2^{−N(H−β)} + P(x ∉ TNβ)
since every x ∈ TNβ has P(x) ≤ 2^{−N(H−β)} and Sδ ∩ TNβ ⊂ TNβ. So
P(x ∈ Sδ) ≤ 2^{N(H−ε)} 2^{−N(H−β)} + P(x ∉ TNβ) = 2^{−N(ε−β)} + P(x ∉ TNβ)
Now choose β < ε (e.g., β = ε/2). Then 2^{−N(ε−β)} → 0 and, by the AEP, P(x ∉ TNβ) → 0, so P(x ∈ Sδ) → 0 as N → ∞. For large enough N this contradicts P(x ∈ Sδ) ≥ 1 − δ, so the supposition fails and (1/N) Hδ(X^N) > H(X) − ε for all sufficiently large N.
