
COMP2610/COMP6261
Tutorial 5 Sample Solutions

Tutorial 5: Probabilistic inequalities and Mutual Information

Young Lee and Bob Williamson
Tutors: Debashish Chakraborty and Zakaria Mhammedi

Week 5 (21st – 25th August), Semester 2, 2017

1. Consider a discrete variable X taking on values from the set X. Let p_i be the probability of each state, with
i = 1, . . . , |X|. Denote the vector of probabilities by p. We saw in lectures that the entropy of X satisfies:

H(X) ≤ log |X|,

with equality if and only if p_i = 1/|X| for all i, i.e. p is uniform. Prove the above statement using Gibbs' inequality,
which says

∑_{i=1}^{|X|} p_i log_2 (p_i / q_i) ≥ 0

for any probability distributions p, q over |X| outcomes, with equality if and only if p = q.
Solution.
Gibbs' inequality tells us that for any two probability vectors p = (p_1, . . . , p_|X|) and q = (q_1, . . . , q_|X|):

∑_{i=1}^{|X|} p_i log (p_i / q_i) ≥ 0

with equality if and only if p = q. If we take q to be the vector representing the uniform distribution,
q_1 = . . . = q_|X| = 1/|X|, then we get

0 ≤ ∑_{i=1}^{|X|} p_i log (p_i / (1/|X|)) = ∑_{i=1}^{|X|} p_i log p_i + ∑_{i=1}^{|X|} p_i log |X| = −H(p) + log |X|

with equality if and only if p is the uniform distribution. Moving H(p) to the other side gives the inequality.
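As a quick numerical sanity check (a sketch in Python; the distribution chosen here is illustrative), we can verify that H(p) ≤ log_2 |X|, with equality exactly for the uniform distribution:

```python
import math

def entropy(p):
    """Shannon entropy in bits, skipping zero-probability states."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A non-uniform distribution over 4 states: strict inequality.
p = [0.5, 0.25, 0.125, 0.125]
assert entropy(p) < math.log2(4)

# The uniform distribution attains the bound with equality.
u = [0.25] * 4
assert abs(entropy(u) - math.log2(4)) < 1e-12
```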


2. Let X be a discrete random variable. Show that the entropy of a function of X is less than or equal to the entropy
of X by justifying the following steps:

H(X, g(X)) =(a) H(X) + H(g(X) | X) =(b) H(X);

H(X, g(X)) =(c) H(g(X)) + H(X | g(X)) ≥(d) H(g(X)).

Thus H(g(X)) ≤ H(X).
Solution.

(a) This is the chain rule of entropy, i.e. H(X, Y) = H(X) + H(Y | X), with Y = g(X).

(b) Given X, the value of g(X) is determined, since g is a fixed function of X. This means no uncertainty remains
about g(X) when X is given. Thus H(g(X) | X) = ∑_x p(x) H(g(X) | X = x) = 0, since each term
H(g(X) | X = x) is the entropy of a point mass and hence zero.

(c) This is also the chain rule of entropy, i.e. H(X, Y) = H(Y) + H(X | Y), with Y = g(X).

(d) Here H(X | g(X)) ≥ 0, since the conditional entropy of a discrete random variable is non-negative.
Equality holds if and only if g is one-to-one on the support of X, in which case X can be recovered from
g(X) and H(X | g(X)) = 0.

Combining parts (b) and (d), we obtain H(X) ≥ H(g(X)).
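The inequality can be checked numerically (a sketch in Python; the distribution and the choice g(x) = x mod 2 are illustrative). Merging states under g can only reduce entropy:

```python
import math
from collections import defaultdict

def entropy(p):
    """Shannon entropy in bits of a sequence of probabilities."""
    return -sum(v * math.log2(v) for v in p if v > 0)

# X uniform on {0, 1, 2, 3}; g(x) = x mod 2 merges states pairwise.
px = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
g = lambda x: x % 2

# Push the distribution of X through g to get the distribution of g(X).
pg = defaultdict(float)
for x, p in px.items():
    pg[g(x)] += p

H_X = entropy(px.values())    # entropy of X
H_gX = entropy(pg.values())   # entropy of g(X)
assert H_gX <= H_X
```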

3. Random variables X , Y , Z are said to form a Markov chain in that order (denoted by X → Y → Z) if their
joint probability distribution can be written as:

p(X,Y, Z) = p(X) · p(Y |X) · p(Z|Y )

(a) Suppose (X,Y, Z) forms a Markov chain. Is it possible for I(X;Y ) = I(X;Z)? If yes, give an example of
X,Y, Z where this happens. If no, explain why not.

(b) Suppose (X,Y, Z) does not form a Markov chain. Is it possible for I(X;Y ) ≥ I(X;Z)? If yes, give an
example of X,Y, Z where this happens. If no, explain why not.

Solution.

(a) Yes; pick Z = Y.
Reason: The data processing inequality guarantees I(X;Y) ≥ I(X;Z); here we want to verify that
equality is possible. Looking at the proof of the data processing inequality, we just need to find a Z where
I(X;Y |Z) = 0.
For Z = Y, intuitively, conditioning on Z, the reduction in uncertainty in X when we know Y is zero,
because Z already tells us everything that Y can. Formally, I(X;Y) = I(X;Z) because the pairs (X, Y)
and (X, Z) have the same joint distribution. Note: to formally check that Z is conditionally independent
of X given Y, we can check p(Z = z, X = x|Y = y) = p(Z = z|Y = y) · p(X = x|Y = y) for all
possible x, y, z. The reason is that the left and right hand sides are zero when y ≠ z; and when y = z, they
both equal p(X = x|Y = y), as p(Z = z|X = x, Y = y) = 1 in this case.

(b) Yes; pick X, Z independent, and let Y = X + Z (assuming the outcomes are numeric).
Reason: Z is not conditionally independent of X given Y; intuitively, knowing X + Z and X tells us what
Z is. So (X, Y, Z) does not form a Markov chain. However, since X and Z are independent, I(X;Z) = 0.
Since mutual information is non-negative, I(X;Y) ≥ 0 = I(X;Z).
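The example in (b) can be verified numerically (a sketch in Python, taking X and Z to be independent fair bits; this specific choice is illustrative):

```python
import math
from itertools import product

def I(joint):
    """Mutual information (bits) from a joint distribution dict {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# X, Z independent fair bits; Y = X + Z.
joint_xz = {(x, z): 0.25 for x, z in product([0, 1], repeat=2)}
pxy, pxz = {}, {}
for (x, z), p in joint_xz.items():
    pxy[(x, x + z)] = pxy.get((x, x + z), 0) + p
    pxz[(x, z)] = pxz.get((x, z), 0) + p

assert abs(I(pxz)) < 1e-12   # X and Z independent, so I(X;Z) = 0
assert I(pxy) > 0            # Y = X + Z carries information about X
```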


4. If X → Y → Z, then show that

(a) I(X;Z) ≤ I(X;Y )
(b) I(X;Y |Z) ≤ I(X;Y )

Solution. See the proof of the data processing inequality in Lecture 9.
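As a numerical illustration of part (a) (a sketch in Python; the binary chain and channel probabilities below are arbitrary choices, not from the lectures), we can build a joint distribution satisfying p(x, y, z) = p(x) p(y|x) p(z|y) and confirm I(X;Z) ≤ I(X;Y):

```python
import math

def mi(joint):
    """Mutual information (bits) from a joint distribution dict {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Binary Markov chain X -> Y -> Z: a prior p(x) and two noisy channels.
px = {0: 0.5, 1: 0.5}
py_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
pz_y = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}

pxy, pxz = {}, {}
for x in px:
    for y in py_x[x]:
        pxy[(x, y)] = pxy.get((x, y), 0) + px[x] * py_x[x][y]
        for z in pz_y[y]:
            p = px[x] * py_x[x][y] * pz_y[y][z]
            pxz[(x, z)] = pxz.get((x, z), 0) + p

assert mi(pxz) <= mi(pxy) + 1e-12   # data processing inequality
```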

5. A coin is known to land heads with probability 1/5. The coin is flipped N times for some even integer N.

(a) Using Markov's inequality, provide a bound on the probability of observing N/2 or more heads.

(b) Using Chebyshev's inequality, provide a bound on the probability of observing N/2 or more heads. Express
your answer in terms of N.

(c) For N ∈ {2, 4, . . . , 20}, in a single plot, show the bounds from parts (a) and (b), as well as the exact
probability of observing N/2 or more heads.

Solution.
Let X_1, . . . , X_N represent the N flips as independent Bernoulli random variables, where X_i = 1 represents
observing a head on flip i and X_i = 0 represents observing a tail. Let X̂_N = (1/N) ∑_{i=1}^N X_i. Then the
probability of observing N/2 or more heads can be expressed as p(X̂_N ≥ 1/2), with p(X_i = 1) = 1/5 for each i.

(a) Using Markov's inequality,

p(X̂_N ≥ 1/2) ≤ E[X̂_N] / (1/2) = ((1/N) ∑_{i=1}^N E[X_i]) / (1/2) = (1/5) / (1/2) = 2/5

∴ the probability of observing N/2 or more heads satisfies p(X̂_N ≥ 1/2) ≤ 2/5.

(b) We need the variance of a Bernoulli random variable: Var(X) = p(1 − p), so

Var[X_i] = (1/5)(1 − 1/5) = 4/25

Using the definition of variance and its properties (the X_i are independent),

Var(X̂_N) = Var[(1/N) ∑_{i=1}^N X_i] = (∑_{i=1}^N Var[X_i]) / N² = N(4/25) / N² = 4/(25N)

Using Chebyshev's inequality,

p(|X̂_N − E[X̂_N]| ≥ λ) ≤ Var(X̂_N) / λ²

With E[X̂_N] = 1/5 and λ = 3/10 (chosen so that X̂_N ≥ 1/2 implies |X̂_N − 1/5| ≥ 3/10),

p(|X̂_N − 1/5| ≥ 3/10) ≤ (4/(25N)) / (3/10)² = 16/(9N)

∴ p(X̂_N ≥ 1/2) ≤ 16/(9N)

(c) The exact probability of k heads is given by the binomial distribution, where X = ∑_{i=1}^N X_i is the
number of heads:

P(X = k) = (N choose k) (1/5)^k (4/5)^{N−k}


So, the probability of seeing N/2 or more heads is

P(X ≥ N/2) = ∑_{k=N/2}^{N} P(X = k) = ∑_{k=N/2}^{N} (N choose k) (1/5)^k (4/5)^{N−k}

Another way to calculate the exact probability:

p(X̂_N ≥ 1/2) = 1 − p(X̂_N < 1/2)

This can be done in Matlab using

1 - binocdf(floor(0.5.*n - 0.5), n, 0.2)

Here floor(0.5.*n - 0.5) simply brings the value of n down to an integer less than n/2 for each value of n. For
example, n = 10 gives floor(0.5.*n - 0.5) = 4, which is what we want.

The code for the plot is included below:

n = 2:2:20;

% Markov inequality
y_m = 2/5;

% Chebyshev inequality
y_c = 16 ./ (9 .* n);

% Exact probabilities
y_e = 1 - binocdf(floor(0.5.*n - 0.5), n, 0.2);

plot(n, y_c, 'k')
hold on;
plot([2 20], [y_m y_m], 'm-')
hold on;
plot(n, y_e, 'b')
hold on;
set(gca, 'fontsize', 14)

title('Markov''s bound, Chebyshev''s bound and exact probability of observing N/2 or more heads')
ylabel('Probability')
xlabel('Number of coin flips (N)')
legend('Chebyshev', 'Markov', 'Exact');