COMP2610/COMP6261 – Information Theory
Tutorial 5: Probabilistic inequalities and Mutual Information
Young Lee and Bob Williamson
Tutors: Debashish Chakraborty and Zakaria Mhammedi
Week 6 (28th Aug – 1st Sep), Semester 2, 2017
1. Consider a discrete random variable X taking on values from the set X. Let p_i be the probability of each state, with i = 1, . . . , |X|. Denote the vector of probabilities by p. We saw in lectures that the entropy of X satisfies:

   H(X) ≤ log |X|,

   with equality if and only if p_i = 1/|X| for all i, i.e. p is uniform. Prove the above statement using Gibbs' inequality, which says

   ∑_{i=1}^{|X|} p_i log2 (p_i / q_i) ≥ 0

   for any probability distributions p, q over |X| outcomes, with equality if and only if p = q.
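The claim in question 1 can be sanity-checked numerically before proving it. The sketch below (with an arbitrarily chosen distribution p) verifies the identity log|X| − H(p) = D(p‖u), where u is the uniform distribution and D is exactly the Gibbs quantity, from which the result follows since D(p‖u) ≥ 0:

```python
import math

def entropy(p):
    """Shannon entropy in bits, treating 0 * log(0) as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl(p, q):
    """The Gibbs quantity sum_i p_i log2(p_i / q_i), i.e. D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]       # an arbitrary distribution over 4 outcomes
u = [1 / len(p)] * len(p)           # the uniform distribution over the same outcomes

gap = math.log2(len(p)) - entropy(p)
assert abs(gap - kl(p, u)) < 1e-12  # log|X| - H(p) equals D(p || uniform)
assert kl(p, u) >= 0                # Gibbs' inequality, hence H(p) <= log|X|
assert abs(math.log2(len(u)) - entropy(u)) < 1e-12  # equality at the uniform p
```

Writing the gap as a KL divergence against the uniform distribution is precisely the structure the proof asks you to exploit.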
2. Let X be a discrete random variable. Show that the entropy of a function of X is less than or equal to the entropy of X by justifying the following steps:

   H(X, g(X)) (a)= H(X) + H(g(X)|X) (b)= H(X);

   H(X, g(X)) (c)= H(g(X)) + H(X|g(X)) (d)≥ H(g(X)).

   Thus H(g(X)) ≤ H(X).
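The conclusion H(g(X)) ≤ H(X) can be checked numerically. The sketch below uses an arbitrarily chosen distribution pX and a non-injective g (both purely illustrative): applying g collapses states, so entropy can only decrease or stay the same.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# An arbitrary distribution of X over {0, 1, 2, 3}, and a non-injective g.
pX = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
g = lambda x: x % 2  # collapses 4 states to 2

# Push pX forward through g to get the distribution of g(X).
pG = defaultdict(float)
for x, p in pX.items():
    pG[g(x)] += p

assert entropy(pG) <= entropy(pX) + 1e-12  # H(g(X)) <= H(X)
```

Note that equality in step (d) holds exactly when g is injective on the support of X, i.e. when H(X|g(X)) = 0.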
3. Random variables X, Y, Z are said to form a Markov chain in that order (denoted by X → Y → Z) if their joint probability distribution can be written as:

   p(X, Y, Z) = p(X) · p(Y |X) · p(Z|Y )
(a) Suppose (X,Y, Z) forms a Markov chain. Is it possible for I(X;Y ) = I(X;Z)? If yes, give an example of
X,Y, Z where this happens. If no, explain why not.
(b) Suppose (X,Y, Z) does not form a Markov chain. Is it possible for I(X;Y ) ≥ I(X;Z)? If yes, give an
example of X,Y, Z where this happens. If no, explain why not.
4. If X → Y → Z, then show that
(a) I(X;Z) ≤ I(X;Y )
(b) I(X;Y |Z) ≤ I(X;Y )
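Part (a) is the data-processing inequality, and it can be sanity-checked on a small example. The sketch below builds a binary Markov chain X → Y → Z from arbitrarily chosen channel matrices (symmetric-channel style, purely illustrative), computes the mutual informations from the joint, and verifies I(X;Z) ≤ I(X;Y):

```python
import math
from itertools import product

# Arbitrary illustrative chain: p(x, y, z) = p(x) p(y|x) p(z|y).
pX = [0.5, 0.5]
pYgX = [[0.9, 0.1], [0.1, 0.9]]   # Y = X through a noisy binary channel
pZgY = [[0.8, 0.2], [0.2, 0.8]]   # Z = Y through a second noisy channel

joint = {(x, y, z): pX[x] * pYgX[x][y] * pZgY[y][z]
         for x, y, z in product(range(2), repeat=3)}

def marginal(keep):
    """Marginal over the coordinates listed in `keep` (0=X, 1=Y, 2=Z)."""
    m = {}
    for xyz, p in joint.items():
        key = tuple(xyz[i] for i in keep)
        m[key] = m.get(key, 0.0) + p
    return m

def mi(a, b):
    """I(A;B) in bits, with a and b given as coordinate-index tuples."""
    pab, pa, pb = marginal(a + b), marginal(a), marginal(b)
    return sum(p * math.log2(p / (pa[k[:len(a)]] * pb[k[len(a):]]))
               for k, p in pab.items() if p > 0)

I_XY, I_XZ = mi((0,), (1,)), mi((0,), (2,))
assert I_XZ <= I_XY + 1e-12  # data-processing inequality: I(X;Z) <= I(X;Y)
```

A numeric check on one example is of course not a proof; the exercise asks for the general argument via the chain rule for mutual information.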
5. A coin is known to land heads with probability 1/5. The coin is flipped N times for some even integer N.

   (a) Using Markov's inequality, provide a bound on the probability of observing N/2 or more heads.

   (b) Using Chebyshev's inequality, provide a bound on the probability of observing N/2 or more heads. Express your answer in terms of N.

   (c) For N ∈ {2, 4, . . . , 20}, in a single plot, show the bounds from parts (a) and (b), as well as the exact probability of observing N/2 or more heads.
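A sketch of the computation behind part (c) follows (plotting with e.g. matplotlib is left out; a printed table carries the same information). The closed-form bounds in the comments come from applying the two inequalities to the number of heads H, which has E[H] = N/5 and Var(H) = 4N/25 — derive them yourself for parts (a) and (b) before relying on them:

```python
from math import comb

p = 1 / 5  # probability of heads on each flip

def exact_tail(N):
    """P(#heads >= N/2) when each of N independent flips lands heads w.p. p."""
    return sum(comb(N, k) * p**k * (1 - p)**(N - k)
               for k in range(N // 2, N + 1))

def markov_bound(N):
    # Markov: P(H >= N/2) <= E[H] / (N/2) = (N/5) / (N/2) = 2/5, constant in N.
    return 2 * p

def chebyshev_bound(N):
    # Chebyshev: P(H >= N/2) <= P(|H - N/5| >= 3N/10)
    #                        <= Var(H) / (3N/10)^2 = 16 / (9N).
    var = N * p * (1 - p)
    return min(1.0, var / (N / 2 - N * p) ** 2)

for N in range(2, 21, 2):
    print(f"N={N:2d}  Markov={markov_bound(N):.3f}  "
          f"Chebyshev={chebyshev_bound(N):.3f}  exact={exact_tail(N):.4f}")
```

The table makes the qualitative story visible: the Markov bound stays fixed at 2/5, the Chebyshev bound decays like 1/N, and the exact tail probability decays much faster still.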