COMP2610 / COMP6261 – Information Theory Tutorial 2: Entropy and Information
Robert 2, 2018
1. Let X be a random variable with possible outcomes {1, 2, 3}. Let the probabilities of the outcomes be
p(X = 1) = θ/2,   p(X = 2) = θ/2,   p(X = 3) = 1 − θ
for some parameter θ ∈ [0, 1].
Suppose we see N observations of the random variable, {x_1, . . . , x_N}. Let n_i denote the number of times
that we observe the outcome X = i, i.e.
n_i = Σ_{k=1}^{N} 1[x_k = i],   where the indicator 1[x_k = i] is 1 if x_k = i and 0 otherwise.
(a) Write down the likelihood function of θ given the observations {x_1, . . . , x_N} in terms of n_1, n_2, n_3.
(b) Suppose the observations are
{3,3,1,2,3,2,2,1,3,1}.
Compute the maximum likelihood estimate of θ. (Hint: Compute the log-likelihood function, and
check when the derivative is zero.)
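A quick numerical sanity check for part (b): the Python sketch below (illustrative only, not part of the question) counts the outcomes and maximises the log-likelihood over a grid of θ values, under the parametrisation p(X = 1) = p(X = 2) = θ/2, p(X = 3) = 1 − θ given above.

```python
# Sketch: grid-search the maximum-likelihood estimate of theta for the model
# p(X=1) = p(X=2) = theta/2, p(X=3) = 1 - theta.
import numpy as np

observations = [3, 3, 1, 2, 3, 2, 2, 1, 3, 1]
n1 = observations.count(1)
n2 = observations.count(2)
n3 = observations.count(3)

# Evaluate the log-likelihood on a fine grid of theta values in (0, 1).
thetas = np.linspace(1e-6, 1 - 1e-6, 100_000)
log_lik = (n1 + n2) * np.log(thetas / 2) + n3 * np.log(1 - thetas)

theta_hat = thetas[np.argmax(log_lik)]
print(f"n1={n1}, n2={n2}, n3={n3}, grid-search MLE of theta is about {theta_hat:.3f}")
```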
2. Consider the following joint distribution over X, Y:

p(X, Y)      X = 1    X = 2    X = 3    X = 4
   Y = 1       0        0       1/8      1/8
   Y = 2      1/8      1/16     1/16      0
   Y = 3      1/8      1/8       0        0
   Y = 4       0       1/16     1/16     1/8
(a) Show that X and Y are not statistically independent. (Hint: You need only show that for at least one specific pair (x, y), p(X = x, Y = y) ≠ p(X = x) p(Y = y).)
(b) Compute the following quantities:
(i) H(X)
(ii) H(Y)
(iii) H(X|Y)
(iv) H(Y|X)
(v) H(X, Y)
(vi) I(X; Y)
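If you want to check the hand calculations in part (b), the minimal sketch below (illustrative only) computes every listed quantity directly from the joint table, assuming the convention of the table above: rows indexed by Y and columns by X.

```python
# Sketch: entropies and mutual information (in bits) from the joint table p(X, Y).
import numpy as np

# Rows indexed by Y = 1..4, columns by X = 1..4.
P = np.array([
    [0,    0,    1/8,  1/8 ],
    [1/8,  1/16, 1/16, 0   ],
    [1/8,  1/8,  0,    0   ],
    [0,    1/16, 1/16, 1/8 ],
])

def H(p):
    """Entropy in bits of a probability vector, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x = P.sum(axis=0)   # marginal of X (sum over Y, i.e. down each column)
p_y = P.sum(axis=1)   # marginal of Y (sum over X, i.e. along each row)

H_X, H_Y, H_XY = H(p_x), H(p_y), H(P.ravel())
H_X_given_Y = H_XY - H_Y   # chain rule: H(X,Y) = H(Y) + H(X|Y)
H_Y_given_X = H_XY - H_X
I_XY = H_X + H_Y - H_XY

print(H_X, H_Y, H_X_given_Y, H_Y_given_X, H_XY, I_XY)
```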
3. A standard deck of cards contains 4 suits — ♥, ♦, ♣, ♠ (“hearts”, “diamonds”, “clubs”, “spades”) — each with 13 values — A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K (the A, J, Q, K are called “Ace”, “Jack”, “Queen”, “King”). Each card has a colour: hearts and diamonds are coloured red; clubs and spades are black. Cards with values J, Q, K are called face cards.
Each of the 52 cards in a deck is identified by its value v and suit s and denoted vs. For example, 2♥, J♣, and 7♠ are the “two of hearts”, “Jack of clubs”, and “7 of spades”, respectively. The variable c will be used to denote a card’s colour. Let f = 1 if a card is a face card and f = 0 otherwise.
A card is drawn at random from a thoroughly shuffled deck. Calculate:
(a) The information h(c = red, v = K) in observing a red King
(b) The conditional information h(v = K | f = 1) in observing a King given that a face card was drawn.
(c) The entropies H(S) and H(V, S).
(d) The mutual information I(V ; S) between V and S.
(e) The mutual information I(V ; C) between the value and colour of a card using the last result and the data processing inequality.
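One way to check the probabilities behind parts (a)–(d) is to enumerate the 52 equally likely cards. The sketch below is illustrative only; the string labels for suits and values are just convenient names, not notation from the question.

```python
# Sketch: card probabilities, information contents and entropies (in bits).
import numpy as np
from itertools import product

values = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
red_suits = {"hearts", "diamonds"}
face_values = {"J", "Q", "K"}

deck = list(product(values, suits))   # 52 equally likely (value, suit) pairs

def h(p):
    """Shannon information content of an event with probability p."""
    return -np.log2(p)

# (a) a red King: count the qualifying cards out of 52.
p_red_king = sum(v == "K" and s in red_suits for v, s in deck) / len(deck)
print("h(c=red, v=K) =", h(p_red_king))

# (b) a King given a face card was drawn.
p_face = sum(v in face_values for v, s in deck) / len(deck)
p_king = sum(v == "K" for v, s in deck) / len(deck)
print("h(v=K | f=1) =", h(p_king / p_face))

# (c) suit is uniform over 4 outcomes; (value, suit) is uniform over 52.
print("H(S) =", np.log2(4), " H(V,S) =", np.log2(52))

# (d) I(V;S) = H(V) + H(S) - H(V,S).
print("I(V;S) =", np.log2(13) + np.log2(4) - np.log2(52))
```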
4. Recall that for a random variable X, its variance is
Var[X] = E[X^2] − (E[X])^2.
Using Jensen’s inequality, show that the variance must always be nonnegative.
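As a numerical illustration of the claim (not a proof), the sketch below draws an arbitrary discrete distribution and checks that E[X^2] ≥ (E[X])^2, which is Jensen's inequality applied to the convex function φ(x) = x^2; the particular distribution used is arbitrary.

```python
# Sketch: check E[X^2] >= (E[X])^2, i.e. Var[X] >= 0, for an arbitrary
# discrete distribution (Jensen's inequality with phi(x) = x**2).
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=5)            # arbitrary real-valued outcomes
probs = rng.dirichlet(np.ones(5))      # arbitrary probability vector

E_X = np.sum(probs * values)
E_X2 = np.sum(probs * values**2)

print("Var[X] =", E_X2 - E_X**2)       # should be nonnegative
assert E_X2 >= E_X**2
```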
5. Let X and Y be independent random variables with possible outcomes {0, 1}, each having a Bernoulli
distribution with parameter 1/2, i.e.
p(X = 0) = p(X = 1) = 1/2,   p(Y = 0) = p(Y = 1) = 1/2.
(a) Compute I(X; Y).
(b) Let Z = X + Y. Compute I(X; Y | Z).
(c) Do the above quantities contradict the data-processing inequality? Explain your answer.
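To check parts (a) and (b) numerically, the sketch below (illustrative only) enumerates the four equally likely (x, y) pairs, forms z = x + y, and evaluates I(X; Y) and I(X; Y | Z) from entropies of the relevant marginals.

```python
# Sketch: I(X;Y) and I(X;Y|Z) in bits for independent fair bits X, Y and Z = X + Y.
import numpy as np
from collections import defaultdict

# Joint distribution over (x, y, z): each of the four (x, y) pairs has mass 1/4.
p_xyz = defaultdict(float)
for x in (0, 1):
    for y in (0, 1):
        p_xyz[(x, y, x + y)] += 0.25

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    p = np.array([q for q in probs if q > 0])
    return -np.sum(p * np.log2(p))

def marginal(keep):
    """Marginal distribution over the coordinates named in `keep` (subset of 'xyz')."""
    m = defaultdict(float)
    for (x, y, z), p in p_xyz.items():
        key = tuple(v for v, name in zip((x, y, z), "xyz") if name in keep)
        m[key] += p
    return m

H_X = H(marginal("x").values())
H_Y = H(marginal("y").values())
H_XY = H(marginal("xy").values())
I_XY = H_X + H_Y - H_XY                                  # part (a)

# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
H_Z = H(marginal("z").values())
H_XZ = H(marginal("xz").values())
H_YZ = H(marginal("yz").values())
H_XYZ = H(p_xyz.values())
I_XY_given_Z = H_XZ + H_YZ - H_Z - H_XYZ                 # part (b)

print("I(X;Y)   =", I_XY)
print("I(X;Y|Z) =", I_XY_given_Z)
```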
6. Consider a discrete variable X taking on values from the set X. Let p_i be the probability of each state, with i = 1, . . . , |X|. Denote the vector of probabilities by p. We saw in lectures that the entropy of X satisfies:
H(X) ≤ log |X|,
with equality if and only if p_i = 1/|X| for all i, i.e. p is uniform. Prove the above statement using Gibbs’
inequality, which says
Σ_i p_i log(p_i / q_i) ≥ 0
for any probability distributions p, q over |X| outcomes, with equality if and only if p = q.
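A quick numerical check of both inequalities is sketched below (illustrative only): for randomly drawn distributions p and q over K outcomes, Σ_i p_i log(p_i/q_i) ≥ 0 and H(p) ≤ log K, with equality in the entropy bound at the uniform distribution; the choice K = 6 is arbitrary.

```python
# Sketch: numerically check Gibbs' inequality and the bound H(X) <= log|X|.
import numpy as np

rng = np.random.default_rng(0)
K = 6   # |X|, an arbitrary choice

for _ in range(1000):
    p = rng.dirichlet(np.ones(K))
    q = rng.dirichlet(np.ones(K))
    gibbs = np.sum(p * np.log2(p / q))     # Gibbs' inequality: always >= 0
    H_p = -np.sum(p * np.log2(p))
    assert gibbs >= -1e-12
    assert H_p <= np.log2(K) + 1e-12

uniform = np.full(K, 1 / K)
print("H(uniform) =", -np.sum(uniform * np.log2(uniform)), "  log2(K) =", np.log2(K))
```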