
COMP2610/COMP6261 Tutorial 5 Sample Solutions
Tutorial 5: Probabilistic Inequalities and Mutual Information
Young Lee and
Tutors: and
Week 5 (21st – 25th August), Semester 2, 2017


1. Consider a discrete variable X taking on values from the set X. Let p_i be the probability of each state, with i = 1, ..., |X|. Denote the vector of probabilities by p. We saw in lectures that the entropy of X satisfies H(X) ≤ log|X|, with equality if and only if p_i = 1/|X| for all i, i.e. p is uniform. Prove the above statement using Gibbs' inequality, which says

∑_{i=1}^{|X|} p_i log(p_i / q_i) ≥ 0

for any probability distributions p, q over |X| outcomes, with equality if and only if p = q.

Solution. Gibbs' inequality tells us that for any two probability vectors p = (p_1, ..., p_{|X|}) and q = (q_1, ..., q_{|X|}),

∑_{i=1}^{|X|} p_i log(p_i / q_i) ≥ 0,

with equality if and only if p = q. If we take q to be the vector representing the uniform distribution, q_1 = ... = q_{|X|} = 1/|X|, then we get

0 ≤ ∑_{i=1}^{|X|} p_i log(p_i / (1/|X|)) = ∑_{i=1}^{|X|} p_i log p_i + ∑_{i=1}^{|X|} p_i log|X| = −H(p) + log|X|,

with equality if and only if p is the uniform distribution. Moving H(p) to the other side gives the inequality.
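As a quick numerical sanity check (not part of the original solution; the choice |X| = 8 and the variable names are purely illustrative), the bound can be verified in MATLAB, measuring entropy in bits:

K = 8;                           % |X|, chosen arbitrarily for the check
p = rand(1, K); p = p / sum(p);  % a random probability vector over K outcomes
H = -sum(p .* log2(p));          % entropy of p in bits (all p(i) > 0 with probability 1)
fprintf('H(p) = %.4f, log2|X| = %.4f\n', H, log2(K));
% H(p) never exceeds log2(K); equality holds only when p is uniform.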

2. Let X be a discrete random variable. Show that the entropy of a function of X is less than or equal to the entropy of X by justifying the following steps:

H(X, g(X)) = H(X) + H(g(X) | X)      (a)
           = H(X)                     (b)
H(X, g(X)) = H(g(X)) + H(X | g(X))    (c)
           ≥ H(g(X)).                 (d)

Thus H(g(X)) ≤ H(X).

Solution.
(a) This is using the chain rule of entropy,
i.e. H(X, Y) = H(X) + H(Y | X), where Y = g(X).
(b) Given X, we can determine g(X), since g is a fixed function of X. This means no uncertainty remains about g(X) when X is given. Thus H(g(X) | X) = ∑_x p(x) H(g(X) | X = x) = 0, since each term H(g(X) | X = x) = 0.
(c) This is also using the chain rule of entropy,
i.e. H(X, Y) = H(Y) + H(X | Y), where Y = g(X).
(d) In this case, H(X | g(X)) ≥ 0, since the conditional entropy of a discrete random variable is non-negative; it is zero if and only if g is one-to-one, so that X can be recovered from g(X). Hence H(X, g(X)) ≥ H(g(X)).
Combining parts (b) and (d), we obtain H(X) ≥ H(g(X)).
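As a concrete illustration (this example is not part of the original solution): let X be uniform on {1, 2, 3, 4} and g(x) = x mod 2. Then H(X) = log 4 = 2 bits, while g(X) is uniform on {0, 1}, so H(g(X)) = 1 bit ≤ 2 bits = H(X). If instead g is one-to-one (say g(x) = 2x), no information is lost and H(g(X)) = H(X) = 2 bits, matching the equality condition in step (d).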
3. Random variables X, Y , Z are said to form a Markov chain in that order (denoted by X → Y → Z) if their
joint probability distribution can be written as:
p(X, Y, Z) = p(X) · p(Y |X) · p(Z|Y )
(a) Suppose (X, Y, Z) forms a Markov chain. Is it possible for I(X; Y ) = I(X; Z)? If yes, give an example of
X, Y, Z where this happens. If no, explain why not.
(b) Suppose (X, Y, Z) does not form a Markov chain. Is it possible for I(X; Y ) ≥ I(X; Z)? If yes, give an
example of X, Y, Z where this happens. If no, explain why not.
(a) Yes; pick Z = Y .
Reason: The data processing inequality guarantees I(X; Y ) ≥ I(X; Z). Here we want to verify that
equality is possible. If we look at the proof of the data processing inequality, we just need to find a Z where I(X; Y |Z) = 0.
For Z = Y, intuitively, conditioning on Z, the reduction in uncertainty in X when we know Y is zero, because Z already tells us everything that Y can. Formally, I(X; Y) = I(X; Z) because the pairs (X, Y) and (X, Z) have the same joint distribution. Note: to formally check that Z is conditionally independent of X given Y, we can check p(Z = z, X = x | Y = y) = p(Z = z | Y = y) · p(X = x | Y = y) for all possible x, y, z. The reason is that the left and right hand sides are both zero when y ≠ z; and when y = z, they both equal p(X = x | Y = y), as p(Z = z | X = x, Y = y) = 1 in this case.
(b) Yes; pick X, Z independent, and let Y = X + Z (assuming the outcomes are numeric).
Reason: Z is not conditionally independent of X given Y ; intuitively, knowing X + Z and X tells us what Z is. So (X,Y,Z) does not form a Markov chain. However, since X, Z are independent, I(X;Z) = 0. Since mutual information is non-negative, I(X; Y ) ≥ 0 = I(X; Z).
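To make part (b) concrete (an illustrative instance, not part of the original solution): let X and Z be independent fair coin flips taking values in {0, 1}, and let Y = X + Z. Then I(X; Z) = 0 by independence, while

I(X; Y) = H(Y) − H(Y | X) = H(1/4, 1/2, 1/4) − H(Z) = 1.5 − 1 = 0.5 bits,

so indeed I(X; Y) = 0.5 > 0 = I(X; Z), even though (X, Y, Z) does not form a Markov chain.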

4. If X → Y → Z, then show that:
(a) I(X; Z) ≤ I(X; Y)
(b) I(X; Y | Z) ≤ I(X; Y)
Solution. See the proof in Lecture 9; a brief sketch is also given below.
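Sketch (this should match the data processing argument from the lectures). Expand I(X; Y, Z) with the chain rule in two ways:

I(X; Y, Z) = I(X; Z) + I(X; Y | Z) = I(X; Y) + I(X; Z | Y).

Since X → Y → Z, X and Z are conditionally independent given Y, so I(X; Z | Y) = 0, and therefore

I(X; Z) + I(X; Y | Z) = I(X; Y).

Non-negativity of mutual information then gives (a) I(X; Z) ≤ I(X; Y) (dropping I(X; Y | Z) ≥ 0) and (b) I(X; Y | Z) ≤ I(X; Y) (dropping I(X; Z) ≥ 0).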
5. A coin is known to land heads with probability 1/5. The coin is flipped N times for some even integer N.
(a) Using Markov's inequality, provide a bound on the probability of observing N/2 or more heads.
(b) Using Chebyshev's inequality, provide a bound on the probability of observing N/2 or more heads. Express your answer in terms of N.
(c) For N ∈ {2, 4, ..., 20}, in a single plot, show the bounds from parts (a) and (b), as well as the exact probability of observing N/2 or more heads.
Solution. Let X_1, ..., X_N represent the N flips, where the X_i are independent Bernoulli random variables with X_i = 1 if flip i lands heads and X_i = 0 if it lands tails, so that p(X_i = 1) = 1/5 for each i. Let X̂_N = (1/N) ∑_{i=1}^N X_i. The probability of observing N/2 or more heads can then be written as p(X̂_N ≥ 1/2).
(a) The mean of X̂_N is

E[X̂_N] = (1/N) ∑_{i=1}^N E[X_i] = 1/5.

Using Markov's inequality,

p(X̂_N ≥ 1/2) ≤ E[X̂_N] / (1/2) = (1/5) / (1/2) = 2/5.

∴ p(X̂_N ≥ 1/2) ≤ 2/5.

(b) We need to calculate the variance of the Bernoulli random variables: Var(X) = p(1 − p), so

Var[X_i] = (1/5)(1 − 1/5) = 4/25.

Using the definition of variance and its properties (the X_i are independent),

Var(X̂_N) = (1/N²) Var[∑_{i=1}^N X_i] = (∑_{i=1}^N Var[X_i]) / N² = N · (4/25) / N² = 4/(25N).

Chebyshev's inequality states that

p(|X̂_N − E[X̂_N]| ≥ λ) ≤ Var(X̂_N) / λ².

Since E[X̂_N] = 1/5, the event X̂_N ≥ 1/2 implies |X̂_N − 1/5| ≥ 3/10, so

p(X̂_N ≥ 1/2) ≤ p(|X̂_N − 1/5| ≥ 3/10) ≤ (4/(25N)) / (3/10)² = 16/(9N).

(c) The exact probability of k heads is given by the binomial distribution:

P(X = k) = (N choose k) (1/5)^k (4/5)^(N−k),

where X = ∑_{i=1}^N X_i is the total number of heads.

So, the probability of seeing N/2 or more heads is

P(X ≥ N/2) = ∑_{k=N/2}^{N} (N choose k) (1/5)^k (4/5)^(N−k).

Another way to calculate the exact probability is

p(X̂_N ≥ 1/2) = 1 − p(X̂_N < 1/2).

This can be done in Matlab using 1 - binocdf(floor(0.5.*n - 0.5), n, 0.2). Here floor(0.5.*n - 0.5) simply brings the value of n to an integer less than n/2 for each value of n. For example, a value of n = 10 gives floor(0.5.*n - 0.5) = 4, which is what we want.

The code for the plot is included below:

n = 2:2:20;
% Markov inequality bound (constant in N)
y_m = 2/5;
% Chebyshev inequality bound
y_c = 16 ./ (9 .* n);
% Exact probabilities
y_e = 1 - binocdf(floor(0.5.*n - 0.5), n, 0.2);
plot(n, y_c, 'k')
hold on;
plot([2 20], [y_m y_m], 'm-')
hold on;
plot(n, y_e, 'b')
hold on;
set(gca, 'fontsize', 14)
title('Markov''s bound, Chebyshev''s bound and exact probability of observing N/2 or more heads')
ylabel('Probability')
xlabel('Number of coin flips (N)')
legend('Chebyshev', 'Markov', 'Exact');
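As a quick sanity check of the plot (not part of the original solution; the printed values below are approximate), the two bounds can be compared with the exact probability at a single value, say N = 10:

% Compare bounds and exact probability at N = 10 (values shown are approximate).
N = 10;
markov = 2/5                             % Markov bound: 0.4 for every N
cheby  = 16/(9*N)                        % Chebyshev bound: about 0.178 at N = 10
exact  = 1 - binocdf(N/2 - 1, N, 0.2)    % exact P(at least N/2 heads): about 0.033
% The Chebyshev bound is tighter than Markov's here, and both sit well above the exact value.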