COMP2610 / COMP6261 – Information Theory
Lecture 9: Probabilistic Inequalities
Williamson
Research School of Computer Science
20 August, 2018
Last time:
Mutual information chain rule
Jensen’s inequality
“Information cannot hurt”
Data processing inequality
Review: Data-Processing Inequality
If X → Y → Z forms a Markov chain, then:

I(X; Y) ≥ I(X; Z)

X is the state of the world, Y is the data gathered, and Z is the processed data.
No “clever” manipulation of the data can improve the inferences that can be made from the data.
No processing of Y, deterministic or random, can increase the information that Y contains about X.
This time:
Markov’s inequality
Chebyshev’s inequality
Law of large numbers
1 Properties of expectation and variance
2 Markov’s inequality
3 Chebyshev’s inequality
4 Law of large numbers
5 Wrapping Up
Expectation and Variance
Let X be a random variable over X, with probability distribution p.

Expected value:

E[X] = Σ_{x∈X} x · p(x).

Variance:

V[X] = E[(X − E[X])²] = E[X²] − (E[X])².

The standard deviation is √V[X].
Properties of expectation
A key property of expectations is linearity:

E[ Σ_{i=1}^{n} Xi ] = Σ_{i=1}^{n} E[Xi]

To see this, write out the left-hand side:

LHS = Σ_{x1∈X1} · · · Σ_{xn∈Xn} p(x1, . . . , xn) · Σ_{i=1}^{n} xi = Σ_{i=1}^{n} Σ_{xi∈Xi} xi · p(xi) = Σ_{i=1}^{n} E[Xi].

This holds even if the variables are dependent!
We have for any a ∈ R,
E[aX] = a · E[X].
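As a quick illustration (a minimal sketch, not from the original slides; the distributions, sample size, and seed are arbitrary choices), the following Python/numpy snippet checks linearity of expectation empirically, including for dependent variables:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # number of Monte Carlo samples (arbitrary choice)

# Two dependent random variables: y is a noisy linear function of x.
x = rng.normal(loc=2.0, scale=1.0, size=n)
y = 3.0 * x + rng.normal(size=n)  # strongly dependent on x

# Linearity: E[X + Y] = E[X] + E[Y], even though X and Y are dependent.
print(np.mean(x + y))            # ~ 8.0
print(np.mean(x) + np.mean(y))   # ~ 8.0, matches

# Scaling: E[aX] = a * E[X]
a = 5.0
print(np.mean(a * x), a * np.mean(x))  # both ~ 10.0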
Properties of variance
Variance is additive for independent random variables:

V[ Σ_{i=1}^{n} Xi ] = Σ_{i=1}^{n} V[Xi].

This does not hold in general if the variables are dependent.

(Prove this: expand the definition of variance and rely upon E[Xi Xj] = E[Xi] · E[Xj] when Xi ⊥ Xj.)
We have for any a ∈ R,
V[aX] = a² · V[X].
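A matching numpy sanity check (again a sketch with arbitrary parameters): variances add for independent variables, additivity fails for dependent ones, and scaling enters squared.

import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000  # sample size (arbitrary)

# Independent variables: variances add.
x = rng.normal(scale=2.0, size=n)   # V[X] = 4
y = rng.normal(scale=3.0, size=n)   # V[Y] = 9, independent of X
print(np.var(x + y))                # ~ 13 = 4 + 9

# Dependent variables: additivity fails.
z = -x                              # perfectly (negatively) dependent on x
print(np.var(x + z))                # ~ 0, not 4 + 4 = 8

# Scaling: V[aX] = a^2 * V[X]
a = 3.0
print(np.var(a * x), a**2 * np.var(x))  # both ~ 36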
Markov’s Inequality Motivation
1000 school students sit an examination
The busy principal is only told that the average score is 40 (out of 100).
The principal wants to estimate the maximum possible number of students who scored more than 80
A question about the minimum number of students is trivial to answer. Why?
Markov’s Inequality Motivation
Let x be the number of students who score > 80.
Let S be the total score of the students who score ≤ 80.
Then 40 · 1000 − S = {total score of students who score above 80} > 80x.
Exam scores are nonnegative, so certainly S ≥ 0. Thus 80x < 40 · 1000, i.e. x < 500.
Can we formalise this more generally?
Markov’s Inequality
Let X be a nonnegative random variable. Then, for any λ > 0,

p(X ≥ λ) ≤ E[X] / λ.
Bounds probability of observing a large outcome
Vacuous if λ < E[X]
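To see the bound in action, here is a minimal numerical sketch (not from the slides; the exponential distribution and sample size are arbitrary choices):

import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)  # nonnegative, E[X] = 1

for lam in [1.0, 2.0, 5.0]:
    empirical = np.mean(x >= lam)   # estimate of p(X >= lam)
    bound = np.mean(x) / lam        # Markov bound E[X] / lam
    print(lam, empirical, bound)    # empirical tail is always below the bound

Note how loose the bound can be: at λ = 5 the true tail is e⁻⁵ ≈ 0.007, while Markov only guarantees 0.2.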
Markov’s Inequality Alternate Statement
Let X be a nonnegative random variable. Then, for any λ > 0,

p(X ≥ λ · E[X]) ≤ 1/λ.
Observations of nonnegative random variable unlikely to be much larger than expected value
Vacuous if λ < 1
Markov’s Inequality Proof
E[X] = Σ_{x∈X} x · p(x)
     = Σ_{x<λ} x · p(x) + Σ_{x≥λ} x · p(x)
     ≥ Σ_{x≥λ} x · p(x)        (nonnegativity of the random variable)
     ≥ Σ_{x≥λ} λ · p(x)
     = λ · p(X ≥ λ).

Dividing both sides by λ gives the result.
Markov’s Inequality
[Figure: illustration of Markov’s inequality, from https://justindomke.wordpress.com/2008/06/19/markovs-inequality/]
Chebyshev’s Inequality Motivation
Markov’s inequality only uses the mean of the distribution.
What about the spread of the distribution (variance)?
Chebyshev’s Inequality
Let X be a random variable with V[X] < ∞. Then, for any λ > 0,

p(|X − E[X]| ≥ λ) ≤ V[X] / λ².
Bounds the probability of observing an “unexpected” outcome
Does not require nonnegativity.
Two-sided bound.
Chebyshev’s Inequality Alternate Statement
Let X be a random variable with V[X] < ∞. Then, for any λ > 0,

p(|X − E[X]| ≥ λ · √V[X]) ≤ 1/λ².
Observations are unlikely to occur several standard deviations away from the mean
Chebyshev’s Inequality Proof
Define

Y = (X − E[X])²,

so that

E[Y] = V[X].

By Markov’s inequality, for any ν > 0,

p(Y ≥ ν) ≤ E[Y] / ν.

Observe that Y ≥ ν ⟺ |X − E[X]| ≥ √ν.

Thus, setting λ = √ν,

p(|X − E[X]| ≥ λ) ≤ V[X] / λ².
Chebyshev’s Inequality Illustration
For a binomial X with N trials and success probability θ (so E[X] = Nθ and V[X] = Nθ(1 − θ)), we have e.g.

p(|X − Nθ| ≥ √(2Nθ(1 − θ))) ≤ 1/2.
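A quick empirical check of this binomial bound (a sketch; N, θ, and the sample size are arbitrary choices):

import numpy as np

rng = np.random.default_rng(3)
N, theta = 100, 0.3
x = rng.binomial(N, theta, size=1_000_000)

mean = N * theta                       # E[X]
var = N * theta * (1 - theta)          # V[X]
lam = np.sqrt(2 * var)                 # deviation of sqrt(2) standard deviations

empirical = np.mean(np.abs(x - mean) >= lam)  # true two-sided tail probability
print(empirical, var / lam**2)                # empirical value vs the 1/2 bound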
Chebyshev’s Inequality Example
Suppose we have a coin with bias θ, i.e. p(X = 1) = θ.
Say we flip the coin n times, and observe x1, . . . , xn ∈ {0, 1}.
We use the maximum likelihood estimator of θ:

θ̂n = (x1 + · · · + xn) / n.

How large should n be such that

p(|θ̂n − θ| ≥ 0.05) ≤ 0.01,

i.e. at most 1% probability of a 5% error?

(Aside: the need for two parameters here is generic: “Probably Approximately Correct”.)
Chebyshev’s Inequality Example
Observe that

E[θ̂n] = (1/n) Σ_{i=1}^{n} E[xi] = θ

V[θ̂n] = (1/n²) Σ_{i=1}^{n} V[xi] = θ(1 − θ) / n.

Thus, applying Chebyshev’s inequality to θ̂n,

p(|θ̂n − θ| > 0.05) ≤ θ(1 − θ) / ((0.05)² · n).

We are guaranteed this is less than 0.01 if

n ≥ θ(1 − θ) / ((0.05)² · (0.01)).

When θ = 0.5, this requires n ≥ 10,000 (!)
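A simulation of this setting (a sketch; the seed and number of repetitions are arbitrary choices) shows how conservative the bound is:

import numpy as np

rng = np.random.default_rng(4)
theta, n, trials = 0.5, 10_000, 100_000  # trials = independent repetitions

# Each repetition: n coin flips summarised by a binomial count; ML estimate = count / n.
theta_hat = rng.binomial(n, theta, size=trials) / n

# Fraction of repetitions with error of at least 0.05. Chebyshev guarantees
# this is at most 0.01; in practice it is essentially 0, since Chebyshev is a
# worst-case bound (0.05 is about 10 standard deviations here).
print(np.mean(np.abs(theta_hat - theta) >= 0.05))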
Independent and Identically Distributed
Let X1, . . . , Xn be random variables such that:
Each Xi is independent of every other Xj
Each Xi has the same distribution as every other Xj

Then we say that X1, . . . , Xn are independent and identically distributed (iid).

Example: For n independent flips of an unbiased coin, X1, . . . , Xn are iid Bernoulli(1/2).
Law of Large Numbers
Let X1, . . . , Xn be a sequence of iid random variables with E[Xi] = μ and V[Xi] < ∞. Define

X̄n = (X1 + · · · + Xn) / n.

Then, for any ε > 0,

lim_{n→∞} p(|X̄n − μ| > ε) = 0.
Given enough trials, the empirical “success frequency” will be close to the expected value
Law of Large Numbers Proof
Since the Xi’s are identically distributed, E[X̄n] = μ.

Since the Xi’s are independent,

V[X̄n] = V[(X1 + · · · + Xn) / n] = V[X1 + · · · + Xn] / n² = V[X1] / n.
Law of Large Numbers Proof
Applying Chebyshev’s inequality to X̄n,

p(|X̄n − μ| ≥ ε) ≤ V[X̄n] / ε² = V[X1] / (n ε²).

For any fixed ε > 0, as n → ∞, the right-hand side → 0. Thus,

p(|X̄n − μ| < ε) → 1.
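The illustrations below can be reproduced with a short simulation (a sketch; the seed and trial counts are arbitrary choices): the running empirical mean of Bernoulli(1/2) trials settles towards μ = 1/2.

import numpy as np

rng = np.random.default_rng(5)
flips = rng.integers(0, 2, size=50_000)  # Bernoulli(1/2) trials

# Running empirical mean after each trial; converges to mu = 0.5.
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)
for n in [10, 100, 1_000, 10_000, 50_000]:
    print(n, running_mean[n - 1])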
Law of Large Numbers Illustration
N = 1000 trials with a Bernoulli random variable with parameter 1/2.
[Figure: running empirical mean vs number of trials]
Law of Large Numbers Illustration
N = 50000 trials with a Bernoulli random variable with parameter 1/2.
[Figure: running empirical mean vs number of trials]
Summary & Conclusions
Markov’s inequality
Chebyshev’s inequality
Law of large numbers
Next time:
Ensembles and sequences
Typical sets
Asymptotic Equipartition Property (AEP)