
COMP2610/COMP6261 Tutorial 4 Sample Solutions
Tutorial 4: Entropy and Information
Young Lee and
Tutors: and
Week 5 (21st – 25th August), Semester 2, 2017


1. Suppose Y is a geometric random variable, Y ~ Geom(p), i.e., Y has probability function P(Y = y) = p(1 − p)^{y−1}, y = 1, 2, …
Determine the mean and variance of the geometric random variable.
The expectation of the geometric random variable can be calculated as:
E[Y] = Σ_{y=1}^∞ y · P(Y = y)
     = Σ_{y=1}^∞ y · p(1 − p)^{y−1}
     = p Σ_{y=1}^∞ y(1 − p)^{y−1}

Writing the series out in full and multiplying it by (1 − p):

E[Y] = p[1 + 2(1 − p) + 3(1 − p)^2 + …]                          (1)
(1 − p)E[Y] = p[(1 − p) + 2(1 − p)^2 + 3(1 − p)^3 + …]           (2)

Subtracting (2) from (1):

E[Y] · (1 − (1 − p)) = p[1 + (1 − p) + (1 − p)^2 + …]
E[Y] · p = p · 1/(1 − (1 − p))                                   (*)
E[Y] = 1/p

(*) Here we use the sum to infinity of a geometric series: Σ_{i=0}^∞ r^i = 1/(1 − r) for |r| < 1, with r = 1 − p.

To calculate the variance, we first need E[Y^2]. Let r = 1 − p and substitute z = y − 1:

E[Y^2] = Σ_{y=1}^∞ y^2 · P(Y = y)
       = Σ_{y=1}^∞ y^2 · p(1 − p)^{y−1}
       = Σ_{y=1}^∞ (y − 1 + 1)^2 · p(1 − p)^{y−1}
       = Σ_{y=1}^∞ ((y − 1)^2 + 2(y − 1) + 1) · p · r^{y−1}
       = Σ_{z=0}^∞ z^2 p r^z + 2 Σ_{z=0}^∞ z p r^z + Σ_{z=0}^∞ p r^z
       = r Σ_{z=1}^∞ z^2 p r^{z−1} + 2r Σ_{z=1}^∞ z p r^{z−1} + p · 1/(1 − (1 − p))
       = r · E[Y^2] + 2r · E[Y] + 1

(the z = 0 terms of the first two sums vanish, and the remaining sums are exactly the defining sums for E[Y^2] and E[Y]). Solving for E[Y^2], using 1 − r = p and E[Y] = 1/p:

(1 − r) E[Y^2] = 2r/p + 1
E[Y^2] = (2r + p)/p^2 = (1 + r)/p^2

∴ the variance can be calculated as

Var[Y] = E[Y^2] − (E[Y])^2 = (1 + r)/p^2 − (1/p)^2 = r/p^2 = (1 − p)/p^2.
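As a quick numerical sanity check of these two results (not part of the original solutions; the value p = 0.3, the seed, and the sample size are arbitrary choices), the following Python sketch simulates geometric draws and compares the sample mean and variance with 1/p and (1 − p)/p^2:

import numpy as np

# Simulate Geom(p) with support {1, 2, ...}, matching P(Y = y) = p(1 - p)^(y - 1).
p = 0.3                                    # assumed example value
rng = np.random.default_rng(0)
samples = rng.geometric(p, size=1_000_000)

print("sample mean    :", samples.mean(), "  theory:", 1 / p)            # ~3.33
print("sample variance:", samples.var(),  "  theory:", (1 - p) / p**2)   # ~7.78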
2. A standard deck of cards contains 4 suits — ♥, ♦, ♣, ♠ ("hearts", "diamonds", "clubs", "spades") — each with 13 values — A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K (the A, J, Q, K are called "Ace", "Jack", "Queen", "King"). Each card has a colour: hearts and diamonds are coloured red; clubs and spades are black. Cards with values J, Q, K are called face cards. Each of the 52 cards in a deck is identified by its value v and suit s and denoted vs. For example, 2♥, J♣, and 7♠ are the "two of hearts", "Jack of clubs", and "7 of spades", respectively. The variable c will be used to denote a card's colour. Let f = 1 if a card is a face card and f = 0 otherwise. A card is drawn at random from a thoroughly shuffled deck. Calculate:

(a) The information h(c = red, v = K) in observing a red King
(b) The conditional information h(v = K | f = 1) in observing a King given a face card was drawn.
(c) The entropies H(S) and H(V, S).
(d) The mutual information I(V; S) between V and S.
(e) The mutual information I(V; C) between the value and colour of a card using the last result and the data processing inequality.

Solution:

(a) h(c = red, v = K) = log2(1/P(c = red, v = K)) = log2(1/(2/52)) = 4.7004 bits.
(b) h(v = K | f = 1) = log2(1/P(v = K | f = 1)) = log2(1/(1/3)) = 1.585 bits.
(c) We have
    i. H(S) = Σ_s p(s) log2(1/p(s)) = 4 × (1/4) × log2(1/(1/4)) = 2 bits.
    ii. H(V, S) = Σ_{v,s} p(v, s) log2(1/p(v, s)) = 52 × (1/52) × log2(1/(1/52)) = 5.7 bits.
(d) Since V and S are independent we have I(V; S) = 0 bits.
(e) Since C is a function of S, by the data processing inequality I(V; C) ≤ I(V; S) = 0. However, mutual information must be non-negative, so we must have I(V; C) = 0 bits.
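These values can also be checked numerically. The short Python sketch below (an illustrative check, not part of the original solutions) puts a uniform distribution over the 52 cards and evaluates H(S), H(V, S) and I(V; S) = H(V) + H(S) − H(V, S) directly:

from math import log2

# Uniform distribution over the 52 (value, suit) pairs: p(v, s) = 1/52,
# so p(s) = 13/52 = 1/4 and p(v) = 4/52 = 1/13.
suits = ["hearts", "diamonds", "clubs", "spades"]
values = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

p_vs = 1 / 52
H_S = sum((1 / 4) * log2(4) for _ in suits)                       # = 2 bits
H_V = sum((1 / 13) * log2(13) for _ in values)                    # = log2(13) bits
H_VS = sum(p_vs * log2(1 / p_vs) for _ in suits for _ in values)  # = log2(52) bits
I_VS = H_V + H_S - H_VS                                           # = 0 bits

print(f"H(S)   = {H_S:.4f} bits")
print(f"H(V,S) = {H_VS:.4f} bits")
print(f"I(V;S) = {I_VS:.4f} bits")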
3. Recall that for a random variable X, its variance is Var[X] = E[X^2] − (E[X])^2. Using Jensen's inequality, show that the variance must always be non-negative.

This is a direct application of Jensen's inequality to the convex function g(x) = x^2: E[X^2] = E[g(X)] ≥ g(E[X]) = (E[X])^2, and therefore Var[X] = E[X^2] − (E[X])^2 ≥ 0.

4. Let X and Y be independent random variables with possible outcomes {0, 1}, each having a Bernoulli distribution with parameter 1/2, i.e.

p(X = 0) = p(X = 1) = 1/2
p(Y = 0) = p(Y = 1) = 1/2.

(a) Compute I(X; Y).
(b) Let Z = X + Y. Compute I(X; Y | Z).
(c) Do the above quantities contradict the data-processing inequality? Explain your answer.

Solution:

(a) We see that I(X; Y) = 0 as X ⊥ Y.

(b) To compute I(X; Y | Z) we apply the definition of conditional mutual information:

I(X; Y | Z) = H(X | Z) − H(X | Y, Z)

Now, X is fully determined by Y and Z (X = Z − Y). In other words, given Y and Z there is only one state of X that is possible, i.e. it has probability 1. Therefore the entropy H(X | Y, Z) = 0, and we have

I(X; Y | Z) = H(X | Z)

To determine this value we look at the distribution p(X | Z), which is computed by considering the following possibilities:

X  Y  Z
0  0  0
0  1  1
1  0  1
1  1  2

Therefore:

p(X | Z = 0) = (1, 0)
p(X | Z = 1) = (1/2, 1/2)
p(X | Z = 2) = (0, 1)

From this, we obtain H(X | Z = 0) = 0, H(X | Z = 2) = 0, H(X | Z = 1) = 1 bit. Therefore:

I(X; Y | Z) = p(Z = 1) H(X | Z = 1) = (1/2)(1) = 0.5 bits.

(c) This does not contradict the data-processing inequality (or more specifically the "conditioning on a downstream variable" corollary): the random variables in this example do not form a Markov chain. In fact, Z depends on both X and Y.
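The values in parts (a) and (b) can be confirmed numerically. The Python sketch below (an illustrative check, not part of the original solutions) enumerates the four equally likely (x, y) outcomes with z = x + y and computes I(X; Y) and I(X; Y | Z) directly from their definitions:

from collections import defaultdict
from math import log2

# Joint distribution of (X, Y, Z): X, Y independent Bernoulli(1/2), Z = X + Y.
p_xyz = defaultdict(float)
for x in (0, 1):
    for y in (0, 1):
        p_xyz[(x, y, x + y)] += 0.25

def marginal(dist, keep):
    # Marginalise the joint (x, y, z) distribution onto the variables named in `keep`.
    out = defaultdict(float)
    for (x, y, z), p in dist.items():
        key = tuple(v for v, name in zip((x, y, z), "xyz") if name in keep)
        out[key] += p
    return out

p_x, p_y, p_z = marginal(p_xyz, "x"), marginal(p_xyz, "y"), marginal(p_xyz, "z")
p_xy, p_xz, p_yz = marginal(p_xyz, "xy"), marginal(p_xyz, "xz"), marginal(p_xyz, "yz")

# I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )
I_XY = sum(p * log2(p / (p_x[(x,)] * p_y[(y,)])) for (x, y), p in p_xy.items())

# I(X;Y|Z) = sum_{x,y,z} p(x,y,z) log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) )
I_XY_Z = sum(p * log2(p_z[(z,)] * p / (p_xz[(x, z)] * p_yz[(y, z)]))
             for (x, y, z), p in p_xyz.items())

print(f"I(X;Y)   = {I_XY:.4f} bits")    # 0.0000
print(f"I(X;Y|Z) = {I_XY_Z:.4f} bits")  # 0.5000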