
COMP2610/COMP6261 Tutorial 2 Solutions∗ Semester 2, 2018
∗ Based in part on solutions from the 2012 version of the course.
1. (a) Let n_i denote the number of times that we observe outcome X = i. The likelihood is

L(θ) = ∏_i p(X = x_i | θ)
     = ∏_{i: x_i = 1} (θ/2) · ∏_{i: x_i = 2} (θ/2) · ∏_{i: x_i = 3} (1 − θ)
     = (θ/2)^{n_1} · (θ/2)^{n_2} · (1 − θ)^{n_3}
     = (θ/2)^{n_1 + n_2} · (1 − θ)^{n_3}.

(b) The log-likelihood is

ℓ(θ) = (n_1 + n_2) · log(θ/2) + n_3 · log(1 − θ).

The derivative is

ℓ′(θ) = (n_1 + n_2)/θ − n_3/(1 − θ).

We have that n_1 = 3, n_2 = 3, n_3 = 4. So, we need

ℓ′(θ) = 6/θ − 4/(1 − θ) = 0,

for which the solution may be checked to be θ = 0.6. Observe then that we estimate
p(X = 1) = 0.3, p(X = 2) = 0.3, p(X = 3) = 0.4,
matching the frequencies of observations of each outcome.
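As a quick numerical check (a sketch, not part of the original solution), the snippet below maximises the log-likelihood over a grid of θ values; the counts n1, n2, n3 are those given in the question, while the grid resolution and function names are illustrative choices.

```python
import numpy as np

# Observed counts for outcomes X = 1, 2, 3 (from the question).
n1, n2, n3 = 3, 3, 4

# Log-likelihood: (n1 + n2) * log(theta/2) + n3 * log(1 - theta).
def log_likelihood(theta):
    return (n1 + n2) * np.log(theta / 2) + n3 * np.log(1 - theta)

# Evaluate on a fine grid of admissible theta values and pick the maximiser.
thetas = np.linspace(0.001, 0.999, 9999)
theta_hat = thetas[np.argmax(log_likelihood(thetas))]
print(theta_hat)  # ~0.6, matching the analytic solution
```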

2. (a) We can show that X and Y are not statistically independent by showing that p(x, y) ≠ p(x)p(y) for at least one value of x and y. For example: p(X = 1) = 1/8 + 1/8 = 1/4 and p(Y = 2) = 1/8 + 1/16 + 1/16 = 1/4. From the given table we see that p(X = 1, Y = 2) = 1/8, which is different from p(X = 1) p(Y = 2) = 1/16.
(b) First, we find the marginal probabilities using the sum rule:
p(X) = (P(X = 1), P(X = 2), P(X = 3), P(X = 4)) = (1/4, 1/4, 1/4, 1/4)
p(Y) = (P(Y = 1), P(Y = 2), P(Y = 3), P(Y = 4)) = (1/4, 1/4, 1/4, 1/4).
We see that both p(X) and p(Y) are uniform distributions with 4 possible states. Hence H(X) = H(Y) = log_2 4 = 2 bits.
To compute the conditional entropy H(X|Y ) we need the conditional distributions p(X|Y) which can be computed by using the definition of conditional probability p(X = x|Y = y) = p(X = x, Y = y)/p(Y = y). In other words, we divide the rows of the given table by the corresponding marginal.
p(X|Y = 1) = (0, 0, 1/2, 1/2)
p(X|Y = 2) = (1/2, 1/4, 1/4, 0)
p(X|Y = 3) = (1/2, 1/2, 0, 0)
p(X|Y = 4) = (0, 1/4, 1/4, 1/2).
Hence the conditional entropy H(X|Y) is given by:
H(X|Y) = ∑_{i=1}^{4} p(Y = i) H(X|Y = i)
       = (1/4) H(0, 0, 1/2, 1/2) + (1/4) H(1/2, 1/4, 1/4, 0) + (1/4) H(1/2, 1/2, 0, 0) + (1/4) H(0, 1/4, 1/4, 1/2)
       = 1/4 × 1 + 1/4 × 3/2 + 1/4 × 1 + 1/4 × 3/2
       = 5/4 bits.
Here we note that conditioning has indeed decreased entropy. We can compute the joint entropy by using the chain rule:
H(X, Y ) = H(X|Y ) + H(Y ) = 5/4 + 2 = 13/4 bits. Additionally, we know that by the chain rule H(X, Y ) = H(Y |X) + H(X), hence:
H(Y |X) = H(X, Y ) − H(X) = 13/4 − 2 = 5/4 bits.
We see that for this particular example H(X|Y) = H(Y|X), which is not generally the case.
Finally, the mutual information I(X; Y ) is given by:
I(X; Y) = H(X) − H(X|Y) = 2 − 5/4 = 3/4 bits.
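The following sketch recomputes these quantities numerically. The joint table is reconstructed from the conditional distributions p(X|Y = i) and the uniform marginal p(Y) quoted above, so it is an assumed reconstruction of the question's table rather than a copy of it.

```python
import numpy as np

# Joint p(x, y), reconstructed from p(X|Y=i) and p(Y=i) = 1/4 above
# (rows: y = 1..4, columns: x = 1..4).
P = np.array([[0,    0,    1/8,  1/8 ],
              [1/8,  1/16, 1/16, 0   ],
              [1/8,  1/8,  0,    0   ],
              [0,    1/16, 1/16, 1/8 ]])

def H(p):
    """Entropy in bits of a probability array, with 0 log 0 := 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

pX = P.sum(axis=0)        # marginal of X (sum over rows)
pY = P.sum(axis=1)        # marginal of Y (sum over columns)

H_X, H_Y, H_XY = H(pX), H(pY), H(P)
H_X_given_Y = H_XY - H_Y  # chain rule
I_XY = H_X - H_X_given_Y

print(H_X, H_Y, H_XY, H_X_given_Y, I_XY)  # 2.0 2.0 3.25 1.25 0.75
```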

3. (a) h(c = red, v = K) = log_2 (1 / P(c = red, v = K)) = log_2 (1 / (1/26)) = 4.7004 bits.
(b) h(v = K | f = 1) = log_2 (1 / P(v = K | f = 1)) = log_2 (1 / (1/3)) = 1.585 bits.
(c) We have
  i. H(S) = ∑_s p(s) log_2 (1/p(s)) = 4 × (1/4) × log_2 (1/(1/4)) = 2 bits.
  ii. H(V, S) = ∑_{v,s} p(v, s) log_2 (1/p(v, s)) = 52 × (1/52) × log_2 (1/(1/52)) = 5.7 bits.
  iii. Since V and S are independent we have I(V; S) = 0 bits.
  iv. Since C is a function of S, by the data processing inequality I(V; C) ≤ I(V; S) = 0. However, mutual information must be nonnegative, so we must have I(V; C) = 0 bits.
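A minimal sketch verifying the arithmetic above, assuming a standard 52-card deck and interpreting f = 1 as "the card is a face card (J, Q, or K)"; the helper h is simply the surprisal in bits.

```python
import math

# Surprisal h(e) = log2(1 / P(e)) in bits.
def h(p):
    return math.log2(1 / p)

# (a) P(colour = red, value = K) = 2/52 = 1/26 (two red kings in the deck).
print(h(1/26))        # ~4.7004 bits

# (b) P(value = K | face card) = 1/3, treating f = 1 as "face card".
print(h(1/3))         # ~1.585 bits

# (c) ii. 52 equally likely (value, suit) pairs for a uniformly drawn card.
print(math.log2(52))  # ~5.70 bits
```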
4. This is a direct application of Jensen's inequality to the convex function g(x) = x^2.
5. (a) We see that I(X; Y) = 0 as X ⊥ Y.
(b) To compute I(X; Y |Z) we apply the definition of conditional mutual information: I(X; Y |Z) = H(X|Z) − H(X|Y, Z)
Now, X is fully determined by Y and Z. In other words, given Y and Z there is only one state of X that is possible, i.e. it has probability 1. Therefore the entropy H(X|Y, Z) = 0. We have that:
I(X; Y |Z) = H(X|Z)
To determine this value we look at the distribution p(X|Z), which is computed by
considering the following possibilities:
X Y Z
0 0 0
0 1 1
1 0 1
1 1 2
Therefore:
p(X|Z = 0) = (1, 0), p(X|Z = 1) = (1/2, 1/2), p(X|Z = 2) = (0, 1).
From this, we obtain: H(X|Z = 0) = 0, H(X|Z = 2) = 0, H(X|Z = 1) = 1 bit. Therefore:
I(X; Y |Z) = p(Z = 1) H(X|Z = 1) = (1/2)(1) = 0.5 bits.

(c) This does not contradict the data-processing inequality (or more specifically the “con- ditioning on a downstream variable” corollary): the random variables in this example do not form a Markov chain. In fact, Z depends on both X and Y .
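As a numerical check of parts (a) and (b), the sketch below assumes (consistently with the table above) that X and Y are independent fair bits and Z = X + Y; the marginalisation helper and the identity I(X; Y|Z) = H(X,Z) + H(Y,Z) − H(Z) − H(X,Y,Z) are standard, everything else is illustrative.

```python
import numpy as np
from itertools import product

# Assumed setup: X, Y independent fair bits, Z = X + Y,
# so each (x, y) pair has probability 1/4.
joint = {}                          # p(x, y, z)
for x, y in product([0, 1], repeat=2):
    joint[(x, y, x + y)] = 0.25

def H(probs):
    """Entropy in bits of an iterable of probabilities (zeros ignored)."""
    return -sum(p * np.log2(p) for p in probs if p > 0)

def marginal(keep):
    """Marginalise the joint onto the given coordinate indices."""
    out = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in keep)
        out[k] = out.get(k, 0.0) + p
    return out

H_XZ  = H(marginal([0, 2]).values())
H_YZ  = H(marginal([1, 2]).values())
H_Z   = H(marginal([2]).values())
H_XYZ = H(joint.values())

# I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
print(H_XZ + H_YZ - H_Z - H_XYZ)   # 0.5 bits

# I(X; Y) = H(X) + H(Y) - H(X, Y); here H(X, Y) = H(X, Y, Z)
# because Z is a function of (X, Y).
H_X = H(marginal([0]).values())
H_Y = H(marginal([1]).values())
print(H_X + H_Y - H_XYZ)           # 0.0 bits
```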
6. Gibbs' inequality tells us that for any two probability vectors p = (p_1, …, p_{|X|}) and q = (q_1, …, q_{|X|}):
∑_{i=1}^{|X|} p_i log (p_i / q_i) ≥ 0,
with equality if and only if p = q. If we take q to be the vector representing the uniform distribution q_1 = … = q_{|X|} = 1/|X|, then we get
0 ≤ ∑_{i=1}^{|X|} p_i log (p_i / (1/|X|)) = ∑_{i=1}^{|X|} p_i log p_i + ∑_{i=1}^{|X|} p_i log |X| = −H(p) + log |X|,
with equality if and only if p is the uniform distribution. Moving H(p) to the other side gives the inequality.
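As an illustrative numeric check of the resulting bound H(p) ≤ log |X| (a sketch, not part of the proof), the snippet below draws a few random distributions over an alphabet of size 8 (an arbitrary choice) and confirms the inequality, with equality at the uniform distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Entropy in bits, with 0 log 0 := 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Random distributions over an alphabet of size 8: H(p) <= log2 |X|.
size = 8
for _ in range(5):
    p = rng.dirichlet(np.ones(size))
    assert entropy(p) <= np.log2(size) + 1e-12

# Equality is attained by the uniform distribution.
print(entropy(np.full(size, 1 / size)), np.log2(size))  # both 3.0
```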
