COMP2610/COMP6261 Tutorial 4 Sample Solutions: Entropy and Information
Young Lee
Tutors:
Week 5 (21st – 25th August), Semester 2, 2017
1. Suppose Y is a geometric random variable, Y ~ Geom(p), i.e. Y has probability mass function P(Y = y) = p(1−p)^(y−1), y = 1, 2, …
Determine the mean and variance of the geometric random variable.
The expectation of the geometric random variable can be calculated as:
E[Y] = Σ_{y=1}^∞ y · P(Y = y)
     = Σ_{y=1}^∞ y · p(1−p)^(y−1)
     = p Σ_{y=1}^∞ y(1−p)^(y−1)

Writing the series out and multiplying it by (1−p):

E[Y] = p[1 + 2(1−p) + 3(1−p)^2 + …]                          (1)
(1−p)E[Y] = p[(1−p) + 2(1−p)^2 + 3(1−p)^3 + …]               (2)

Subtracting (2) from (1):

E[Y]·(1 − (1−p)) = p[1 + (1−p) + (1−p)^2 + …] = p · (1/p)    (*)
E[Y]·p = 1
E[Y] = 1/p

(*) Here we use the sum to infinity of a geometric series: for |r| < 1, Σ_{i=0}^∞ r^i = 1/(1−r), applied with r = 1−p.
To calculate the variance, we first calculate E[Y^2]. Let r = 1−p and substitute z = y−1:

E[Y^2] = Σ_{y=1}^∞ y^2 · P(Y = y)
       = Σ_{y=1}^∞ y^2 · p(1−p)^(y−1)
       = Σ_{y=1}^∞ (y−1+1)^2 · p(1−p)^(y−1)
       = Σ_{y=1}^∞ ((y−1)^2 + 2(y−1) + 1) · p · r^(y−1)
       = Σ_{z=0}^∞ z^2 p r^z + 2 Σ_{z=0}^∞ z p r^z + Σ_{z=0}^∞ p r^z
       = r · Σ_{z=1}^∞ z^2 p r^(z−1) + 2r · Σ_{z=1}^∞ z p r^(z−1) + p · 1/(1 − (1−p))
       = r · E[Y^2] + 2r · E[Y] + 1

Solving for E[Y^2]:

E[Y^2]·(1 − r) = 2r/p + 1
E[Y^2]·p = (2r + p)/p
E[Y^2] = (1 + r)/p^2

∴ the variance can be calculated as

Var[Y] = E[Y^2] − (E[Y])^2
       = (1 + r)/p^2 − (1/p)^2
       = r/p^2
       = (1 − p)/p^2
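As a quick sanity check (not part of the original solution), the closed forms E[Y] = 1/p and Var[Y] = (1−p)/p^2 can be compared against a Monte Carlo simulation in Python; the parameter value p = 0.3 and the sample size below are arbitrary illustrative choices.

import numpy as np

# Sanity check: simulate Geom(p) draws (support {1, 2, ...}) and compare the
# sample mean/variance with the derived formulas 1/p and (1-p)/p^2.
p = 0.3                                      # arbitrary illustrative parameter
rng = np.random.default_rng(0)
samples = rng.geometric(p, size=1_000_000)   # numpy's geometric counts trials to first success

print("sample mean:", samples.mean(), "  vs 1/p =", 1 / p)
print("sample var: ", samples.var(),  "  vs (1-p)/p^2 =", (1 - p) / p**2)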
2. A standard deck of cards contains 4 suits — ♥, ♦, ♣, ♠ (“hearts”, “diamonds”, “clubs”, “spades”) — each with 13 values — A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K (The A, J, Q, K are called “Ace”, “Jack”, “Queen”, “King”). Each card has a colour: hearts and diamonds are coloured red; clubs and spades are black. Cards with values J, Q, K are called face cards.
Each of the 52 cards in a deck is identified by its value v and suit s and denoted vs. For example, 2♥, J♣, and 7♠ are the “two of hearts”, “Jack of clubs”, and “7 of spades”, respectively. The variable c will be used to denote a card’s colour. Let f = 1 if a card is a face card and f = 0 otherwise.
A card is drawn at random from a thoroughly shuffled deck. Calculate:
(a) The information h(c = red, v = K) in observing a red King
(b) The conditional information h(v = K|f = 1) in observing a King given a face card was drawn.
(c) The entropies H(S) and H(V, S).
(d) The mutual information I(V ; S) between V and S.
(e) The mutual information I(V ; C) between the value and colour of a card using the last result and the data processing inequality.
(a) h(c = red, v = K) = log2 (1 / P(c = red, v = K)) = log2 (1 / (2/52)) = log2 26 = 4.7004 bits.
(b) h(v = K | f = 1) = log2 (1 / P(v = K | f = 1)) = log2 (1 / (1/3)) = log2 3 = 1.585 bits.
(c) We have
    i. H(S) = Σ_s p(s) log2 (1/p(s)) = 4 × (1/4) × log2 (1/(1/4)) = 2 bits.
    ii. H(V, S) = Σ_{v,s} p(v, s) log2 (1/p(v, s)) = 52 × (1/52) × log2 (1/(1/52)) = log2 52 = 5.7 bits.
(d) Since V and S are independent we have I (V ; S) = 0 bits.
(e) Since C is a function of S, the data processing inequality gives I(V; C) ≤ I(V; S) = 0. However, mutual information must be non-negative, so we must have I(V; C) = 0 bits.
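The deck is small enough to recompute these quantities by direct enumeration. The following short Python sketch was written for this solution sheet rather than taken from it, and its variable names are ad hoc.

from itertools import product
from math import log2

# Enumerate the deck: 13 values x 4 suits, each card with probability 1/52.
values = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['H', 'D', 'C', 'S']              # hearts, diamonds, clubs, spades
red_suits = {'H', 'D'}
face_values = {'J', 'Q', 'K'}
deck = list(product(values, suits))
p_card = 1 / len(deck)

# (a) information of drawing a red King
p_red_king = sum(p_card for v, s in deck if v == 'K' and s in red_suits)
print("h(red, K) =", log2(1 / p_red_king), "bits")               # ~4.70

# (b) information of a King given a face card: log2(1 / P(K | face))
p_face = sum(p_card for v, s in deck if v in face_values)
p_king = sum(p_card for v, s in deck if v == 'K')
print("h(K | face) =", log2(p_face / p_king), "bits")            # log2 3 ~ 1.585

# (c) entropies of the uniform suit and (value, suit) distributions
print("H(S) =", log2(4), "bits; H(V,S) =", log2(52), "bits")     # 2 and ~5.70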
3. Recall that for a random variable X, its variance is
Var[X] = E[X2] − (E[X])2.
Using Jensen’s inequality, show that the variance must always be non-negative.
This is a direct application of Jensen’s inequality to the convex function g(x) = x^2: since g is convex, E[g(X)] ≥ g(E[X]), i.e. E[X^2] ≥ (E[X])^2, and therefore Var[X] = E[X^2] − (E[X])^2 ≥ 0.
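For intuition only (not required by the question), a tiny numeric check on an arbitrary discrete distribution illustrates the inequality; the outcomes and probabilities below are made up.

import numpy as np

# Arbitrary illustrative distribution over a few outcomes.
x = np.array([-2.0, 0.0, 1.0, 5.0])
p = np.array([0.1, 0.4, 0.3, 0.2])       # probabilities sum to 1

e_x = np.sum(p * x)                       # E[X]
e_x2 = np.sum(p * x**2)                   # E[X^2]
print("E[X^2] =", e_x2, ">= (E[X])^2 =", e_x**2)   # Jensen's inequality
print("Var[X] =", e_x2 - e_x**2)                    # hence non-negative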
4. Let X and Y be independent random variables with possible outcomes {0, 1}, each having a Bernoulli distribution with parameter 1/2, i.e.
p(X = 0) = p(X = 1) = 1/2,
p(Y = 0) = p(Y = 1) = 1/2.
(a) Compute I(X; Y).
(b) Let Z = X + Y. Compute I(X; Y | Z).
(c) Do the above quantities contradict the data-processing inequality? Explain your answer.
(a) We see that I(X; Y) = 0 as X ⊥ Y.
(b) To compute I(X; Y |Z) we apply the definition of conditional mutual information:
I(X; Y |Z) = H(X|Z) − H(X|Y, Z)
Now, X is fully determined by Y and Z: given Y and Z there is only one possible state of X, i.e. it has probability 1. Therefore H(X|Y, Z) = 0, and we have
I(X; Y |Z) = H(X|Z).
To determine this value we look at the distribution p(X|Z), which is computed by considering the following possibilities:

X Y Z
0 0 0
0 1 1
1 0 1
1 1 2

Therefore:
p(X|Z = 0) = (1, 0), p(X|Z = 1) = (1/2, 1/2), p(X|Z = 2) = (0, 1).
From this, we obtain: H(X|Z = 0) = 0, H(X|Z = 2) = 0, H(X|Z = 1) = 1 bit. Therefore: I(X; Y |Z) = p(Z = 1)H(X|Z = 1) = (1/2)(1) = 0.5 bits.
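As an optional cross-check (not part of the original solution), the joint distribution of (X, Y, Z) is small enough to enumerate directly and recompute I(X; Y | Z); the code below is a sketch with ad hoc names.

from collections import defaultdict
from math import log2

# Enumerate the four equally likely outcomes of (X, Y) and set Z = X + Y.
joint = defaultdict(float)                 # p(x, y, z)
for x in (0, 1):
    for y in (0, 1):
        joint[(x, y, x + y)] += 0.25

def H(dist):
    """Entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(idx):
    """Marginalise the joint distribution onto the coordinates in idx."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in idx)] += p
    return m

# I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z)
mi = H(marginal((0, 2))) + H(marginal((1, 2))) - H(marginal((2,))) - H(joint)
print("I(X; Y | Z) =", mi, "bits")         # expect 0.5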
(c) This does not contradict the data-processing inequality (or more specifically the “conditioning on a down- stream variable” corollary): the random variables in this example do not form a Markov chain. In fact, Z depends on both X and Y .