MAST90083 Computational Statistics & Data Mining EM Algorithm
Tutorial and Practical 8: Solutions
Question 1
1. The mean of $p(y)$ is given by
\[
E(y) = \int y\, p(y)\, dy
     = \int y \sum_{k=1}^{K} p_k\, p(y \mid k)\, dy
     = \sum_{k=1}^{K} p_k \int y\, p(y \mid k)\, dy
     = \sum_{k=1}^{K} p_k \mu_k.
\]
2. The covariance of $y$ is given by
\[
\operatorname{cov}(y) = E\!\left(yy^\top\right) - E(y)E(y)^\top.
\]
The first term on the right-hand side is
\begin{align*}
E\!\left(yy^\top\right) &= \int yy^\top p(y)\, dy
= \int yy^\top \sum_{k=1}^{K} p_k\, p(y \mid k)\, dy
= \sum_{k=1}^{K} p_k \int yy^\top p(y \mid k)\, dy \\
&= \sum_{k=1}^{K} p_k E_k\!\left(yy^\top\right)
= \sum_{k=1}^{K} p_k \left( \mu_k \mu_k^\top + \Sigma_k \right),
\end{align*}
where we used
\[
E_k\!\left(yy^\top\right) = \operatorname{cov}_k(y) + E_k(y)E_k(y)^\top = \Sigma_k + \mu_k \mu_k^\top.
\]
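These two identities are easy to check by simulation. The following R sketch does so for a univariate mixture with $K = 2$ (the particular weights, means and variances are arbitrary illustrative values):

## Monte Carlo check of the mixture moment formulas, univariate case.
set.seed(1)
p      <- c(0.3, 0.7)    # mixing weights p_k
mu     <- c(-1, 2)       # component means mu_k
sigma2 <- c(1, 0.25)     # component variances Sigma_k
k <- sample(2, 1e5, replace = TRUE, prob = p)   # latent component labels
y <- rnorm(1e5, mean = mu[k], sd = sqrt(sigma2[k]))
c(mean(y),   sum(p * mu))                  # E(y)   vs  sum_k p_k mu_k
c(mean(y^2), sum(p * (mu^2 + sigma2)))     # E(y^2) vs  sum_k p_k (mu_k^2 + Sigma_k)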
Question 2
1. Using Bayes' rule we have
\[
p(\theta \mid Y) \propto p(Y \mid \theta)\, p(\theta),
\]
and taking logarithms this gives
\[
\log p(\theta \mid Y) \propto \log p(Y \mid \theta) + \log p(\theta).
\]
In the E-step we are interested in $\log p(Y, Z \mid \theta)$:
\begin{align}
\log p(\theta \mid Y) &\propto \log \Big\{ \sum_{Z} p(Y, Z \mid \theta) \Big\} + \log p(\theta) \tag{1} \\
&= \log \Big\{ \Big[ \sum_{Z} p(Y, Z \mid \theta) \Big] p(\theta) \Big\} \tag{2} \\
&= \log \Big\{ \sum_{Z} p(Y, Z \mid \theta)\, p(\theta) \Big\}. \tag{3}
\end{align}
The only modification is therefore the replacement of $p(Y, Z \mid \theta)$ by $p(Y, Z \mid \theta)\, p(\theta)$.
2. The form of $Q^*(\theta, \theta^{i-1})$ is given by
\begin{align*}
Q^*(\theta, \theta^{i-1})
&= \sum_{Z} p(Z \mid Y, \theta^{i-1}) \log\!\left[ p(Y, Z \mid \theta)\, p(\theta) \right] \\
&= \sum_{Z} p(Z \mid Y, \theta^{i-1}) \log p(Y, Z \mid \theta) + \sum_{Z} p(Z \mid Y, \theta^{i-1}) \log p(\theta) \\
&= \sum_{Z} p(Z \mid Y, \theta^{i-1}) \log p(Y, Z \mid \theta) + \log p(\theta) \sum_{Z} p(Z \mid Y, \theta^{i-1}) \\
&= \sum_{Z} p(Z \mid Y, \theta^{i-1}) \log p(Y, Z \mid \theta) + \log p(\theta) \\
&= Q(\theta, \theta^{i-1}) + \log p(\theta).
\end{align*}
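In practice the only change to the algorithm is therefore in the M-step: one maximises $Q(\theta, \theta^{i-1}) + \log p(\theta)$ rather than $Q(\theta, \theta^{i-1})$. A minimal R sketch of such a penalised M-step for a scalar parameter (the names Q_fun and log_prior are hypothetical placeholders, not from any original code):

## MAP-EM M-step for a scalar parameter: maximise Q plus the log-prior
## over a search interval. Q_fun and log_prior are user-supplied
## functions; both names are illustrative.
map_m_step <- function(Q_fun, log_prior, interval) {
  optimise(function(theta) Q_fun(theta) + log_prior(theta),
           interval = interval, maximum = TRUE)$maximum
}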
Question 3
Consider a random sample $T_1, \cdots, T_n$ from a Weibull distribution which has the pdf
\[
f(t \mid \beta) = \frac{4}{\beta}\, t^3 e^{-t^4/\beta}, \qquad t > 0;\ \beta > 0.
\]
Suppose we have observed $T_1 = y_1, \cdots, T_m = y_m$ and $T_{m+1} > c_{m+1}, \cdots, T_n > c_n$, where $m$ is given, $m < n$, and $y_1, \cdots, y_m$ and $c_{m+1}, \cdots, c_n$ are given numerical values. This implies that $T_1, \cdots, T_m$ are completely observed whereas $T_{m+1}, \cdots, T_n$ are partially observed in that they are right-censored. We want to use an EM algorithm to find the MLE of $\beta$.
1. Find the complete-data log-likelihood function $\ell(\beta) = \ln L(\beta)$.
   • $\ell(\beta) = \ln 4^n - n \ln \beta + \sum_{i=1}^{n} \ln T_i^3 - \dfrac{1}{\beta} \sum_{i=1}^{n} T_i^4$.
2. In the E-step, we calculate
\[
Q(\beta, \beta^{(k)}) = E[\ln L(\beta) \mid T_1 = y_1, \cdots, T_m = y_m, T_{m+1} > c_{m+1}, \cdots, T_n > c_n;\ \beta^{(k)}],
\]
where $\beta^{(k)}$ is the $k$th-step estimate of $\beta$. Show that
\[
Q(\beta, \beta^{(k)}) = \ln 4^n - n \ln \beta + \sum_{i=1}^{m} \ln y_i^3 + \sum_{i=m+1}^{n} E[\ln T_i^3 \mid T_i > c_i, \beta^{(k)}]
- \frac{1}{\beta} \sum_{i=1}^{m} y_i^4 - \frac{1}{\beta} \sum_{i=m+1}^{n} c_i^4 - \frac{(n-m)\beta^{(k)}}{\beta}.
\]
(Note: The result $E(T^4 \mid T > c) = c^4 + \beta$ for the Weibull random variable $T$ considered here can be used without proof.)
   •
\begin{align*}
Q(\beta, \beta^{(k)}) &= \ln 4^n - n \ln \beta + \sum_{i=1}^{m} E(\ln T_i^3 \mid T_i = y_i, \beta = \beta^{(k)}) + \sum_{i=m+1}^{n} E(\ln T_i^3 \mid T_i > c_i, \beta = \beta^{(k)}) \\
&\quad - \frac{1}{\beta} \sum_{i=1}^{m} E(T_i^4 \mid T_i = y_i, \beta = \beta^{(k)}) - \frac{1}{\beta} \sum_{i=m+1}^{n} E(T_i^4 \mid T_i > c_i, \beta = \beta^{(k)}) \\
&= \ln 4^n - n \ln \beta + \sum_{i=1}^{m} \ln y_i^3 + \sum_{i=m+1}^{n} E[\ln T_i^3 \mid T_i > c_i, \beta^{(k)}]
- \frac{1}{\beta} \sum_{i=1}^{m} y_i^4 - \frac{1}{\beta} \sum_{i=m+1}^{n} (c_i^4 + \beta^{(k)}) \\
&= \ln 4^n - n \ln \beta + \sum_{i=1}^{m} \ln y_i^3 + \sum_{i=m+1}^{n} E[\ln T_i^3 \mid T_i > c_i, \beta^{(k)}]
- \frac{1}{\beta} \sum_{i=1}^{m} y_i^4 - \frac{1}{\beta} \sum_{i=m+1}^{n} c_i^4 - \frac{(n-m)\beta^{(k)}}{\beta}.
\end{align*}
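The quoted result $E(T^4 \mid T > c) = c^4 + \beta$ holds because $T^4$ is exponential with mean $\beta$ (change of variables in the pdf above), and the exponential distribution is memoryless. A quick Monte Carlo check in R (the values of $\beta$ and $c$ are arbitrary illustrative choices):

## Check E(T^4 | T > c) = c^4 + beta: T^4 ~ Exponential(mean beta),
## and the event {T > c} is the same as {T^4 > c^4}.
set.seed(1)
beta <- 2; cc <- 1.2                 # arbitrary illustrative values
t4 <- rexp(1e6, rate = 1 / beta)     # simulate T^4 directly
mean(t4[t4 > cc^4])                  # Monte Carlo estimate
cc^4 + beta                          # theoretical value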
3. In the M-step, we maximise $Q(\beta, \beta^{(k)})$ with respect to $\beta$ to find an update $\beta^{(k+1)}$ from $\beta^{(k)}$. Show that
\[
\beta^{(k+1)} = \frac{1}{n} \left[ \sum_{i=1}^{m} y_i^4 + \sum_{i=m+1}^{n} c_i^4 + (n-m)\beta^{(k)} \right].
\]
   • $\dfrac{\partial Q(\beta, \beta^{(k)})}{\partial \beta} = -\dfrac{n}{\beta} + \dfrac{1}{\beta^2} \sum_{i=1}^{m} y_i^4 + \dfrac{1}{\beta^2} \sum_{i=m+1}^{n} c_i^4 + \dfrac{(n-m)\beta^{(k)}}{\beta^2}$.
   • Solving $\dfrac{\partial Q(\beta, \beta^{(k)})}{\partial \beta} = 0$, it follows that $\beta^{(k+1)} = \dfrac{1}{n} \left[ \sum_{i=1}^{m} y_i^4 + \sum_{i=m+1}^{n} c_i^4 + (n-m)\beta^{(k)} \right]$.
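Since the update has a closed form, the whole algorithm is a short loop. Below is a minimal R sketch (the sample sizes, censoring scheme and true $\beta$ are illustrative assumptions, not part of the question):

## EM for the right-censored Weibull of Question 3.
set.seed(8)
n <- 100; m <- 80; beta_true <- 2
t_all <- rweibull(n, shape = 4, scale = beta_true^(1/4))  # T^4 ~ Exp(mean beta)
y <- t_all[1:m]                                  # fully observed values
cens <- runif(n - m, 0, t_all[(m + 1):n])        # censoring times c_i < T_i
beta_k <- 1                                      # initial value beta^(0)
for (k in 1:100) {                               # iterate the closed-form update
  beta_k <- (sum(y^4) + sum(cens^4) + (n - m) * beta_k) / n
}
beta_k                                # EM estimate of beta
(sum(y^4) + sum(cens^4)) / m          # fixed point of the update

Solving the fixed-point equation $\beta = \frac{1}{n}\big[\sum y_i^4 + \sum c_i^4 + (n-m)\beta\big]$ shows that the iteration converges to $\hat\beta = \big(\sum_{i=1}^{m} y_i^4 + \sum_{i=m+1}^{n} c_i^4\big)/m$, which the last line computes directly.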
Question 4
Suppose $Y_1, \cdots, Y_n$ are to be independently observed from a two-component normal mixture model which has the following pdf
\[
g(y_i \mid \theta) = \frac{p}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2\sigma^2}(y_i - \mu_1)^2\Big\} + \frac{1-p}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2\sigma^2}(y_i - \mu_2)^2\Big\}, \qquad i = 1, \cdots, n,
\]
where $\theta = (p, \mu_1, \mu_2, \sigma)$ are unknown parameters with $0 \le p \le 1$ and $\sigma > 0$. We want to use the EM algorithm to find the MLE $\hat\theta$ of $\theta$. In order to do this, we introduce an unobserved indicator variable $Z_i$ for each $i = 1, \cdots, n$, with $P(Z_i = 1) = p$ and $P(Z_i = 0) = 1 - p$. Then we know the conditional pdf of $Y_i$ given $Z_i$ is $(Y_i \mid Z_i = 1) \stackrel{d}{=} N(\mu_1, \sigma^2)$ and $(Y_i \mid Z_i = 0) \stackrel{d}{=} N(\mu_2, \sigma^2)$.

1. Write down the observed-data log-likelihood function $\ln L(\theta \mid \mathbf{y}_n)$ where $\mathbf{y}_n = (y_1, \cdots, y_n)^T$ are observations of $Y_1, \cdots, Y_n$.
   • The observed-data log-likelihood is
\begin{align*}
\ln L(\theta \mid \mathbf{y}_n) &= \ln \prod_{i=1}^{n} g(y_i \mid \theta) = \sum_{i=1}^{n} \ln g(y_i \mid \theta) \\
&= \sum_{i=1}^{n} \ln \left[ \frac{p}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2\sigma^2}(y_i - \mu_1)^2\Big\} + \frac{1-p}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2\sigma^2}(y_i - \mu_2)^2\Big\} \right].
\end{align*}
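For the numerical work later it is convenient to have this log-likelihood as an R function. A minimal sketch using dnorm (the function and argument names are illustrative choices, not from the original code):

## Observed-data log-likelihood of the two-component normal mixture.
loglik.obs <- function(y, p, mu1, mu2, sigma) {
  sum(log(p * dnorm(y, mean = mu1, sd = sigma) +
          (1 - p) * dnorm(y, mean = mu2, sd = sigma)))
}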
2. Write down the complete-data log-likelihood function $\ln L(\theta \mid \mathbf{y}_n, \mathbf{Z}_n)$ where $\mathbf{Z}_n = (Z_1, \cdots, Z_n)^T$.
   • The joint pdf of $(Y_i, Z_i)$ is
\[
f(y_i, z_i \mid \theta) = f(y_i \mid z_i, \theta)\, f(z_i)
= \left[ \frac{p}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu_1)^2} \right]^{z_i} \left[ \frac{1-p}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu_2)^2} \right]^{1-z_i}.
\]
   • Hence the complete-data log-likelihood is
\begin{align*}
\ln L(\theta \mid \mathbf{y}_n, \mathbf{Z}_n) &= \ln \prod_{i=1}^{n} f(y_i, Z_i \mid \theta) = \sum_{i=1}^{n} \ln f(y_i, Z_i \mid \theta) \\
&= \sum_{i=1}^{n} \ln \left[ \frac{p}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu_1)^2} \right]^{Z_i} \left[ \frac{1-p}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu_2)^2} \right]^{1-Z_i} \\
&= -n \ln \sqrt{2\pi} - n \ln \sigma + \sum_{i=1}^{n} \left[ Z_i \ln p + (1 - Z_i) \ln(1 - p) \right] \\
&\quad - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ Z_i (y_i - \mu_1)^2 + (1 - Z_i)(y_i - \mu_2)^2 \right].
\end{align*}
3. Show that the conditional pdf of $Z_i$ given $Y_i = y_i$ and $\theta$ is
\[
P(Z_i = 1 \mid y_i, \theta) = \frac{p \exp\{-\frac{1}{2\sigma^2}(y_i - \mu_1)^2\}}{p \exp\{-\frac{1}{2\sigma^2}(y_i - \mu_1)^2\} + (1-p) \exp\{-\frac{1}{2\sigma^2}(y_i - \mu_2)^2\}}
\]
and $P(Z_i = 0 \mid y_i, \theta) = 1 - P(Z_i = 1 \mid y_i, \theta)$. Then find $E(Z_i \mid y_i, \theta)$ and $\operatorname{Var}(Z_i \mid y_i, \theta)$.
   • By Bayes' rule,
\[
P(Z_i = 1 \mid y_i, \theta) = \frac{f(y_i \mid z_i = 1, \theta)\, f(z_i = 1)}{g(y_i \mid \theta)}
= \frac{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu_1)^2} \cdot p}{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu_1)^2} \cdot p + \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu_2)^2} \cdot (1-p)}
= \frac{p\, e^{-\frac{1}{2\sigma^2}(y_i - \mu_1)^2}}{p\, e^{-\frac{1}{2\sigma^2}(y_i - \mu_1)^2} + (1-p)\, e^{-\frac{1}{2\sigma^2}(y_i - \mu_2)^2}}.
\]
   • $Z_i$ has only two possible values, 0 and 1. Hence $P(Z_i = 0 \mid y_i, \theta) = 1 - P(Z_i = 1 \mid y_i, \theta)$.
   • $E(Z_i \mid y_i, \theta) = 1 \cdot P(Z_i = 1 \mid y_i, \theta) + 0 \cdot P(Z_i = 0 \mid y_i, \theta) = P(Z_i = 1 \mid y_i, \theta)$.
   • $\operatorname{Var}(Z_i \mid y_i, \theta) = E(Z_i^2 \mid y_i, \theta) - [E(Z_i \mid y_i, \theta)]^2 = P(Z_i = 1 \mid y_i, \theta)\, P(Z_i = 0 \mid y_i, \theta)$.
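Numerically, the common factor $\frac{1}{\sigma\sqrt{2\pi}}$ cancels, so the responsibility can be computed directly with dnorm. A minimal R sketch (the function name is an illustrative choice):

## Responsibility P(Z_i = 1 | y_i, theta) for each observation.
post.z1 <- function(y, p, mu1, mu2, sigma) {
  f1 <- p * dnorm(y, mu1, sigma)
  f2 <- (1 - p) * dnorm(y, mu2, sigma)
  f1 / (f1 + f2)
}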
4. Use the results in (3) to derive and simplify $Q(\theta \mid \theta^{(k)}) = E[\ln L(\theta \mid \mathbf{y}_n, \mathbf{Z}_n) \mid \mathbf{y}_n, \theta^{(k)}]$ that is to be used in the E-step. Here $\theta^{(k)} = (p^{(k)}, \mu_1^{(k)}, \mu_2^{(k)}, \sigma^{(k)})$ is the $k$th update of $\theta$ obtained from the EM algorithm.
   • Using the results of (2) and (3), and writing $z_i^{(k)} = E(Z_i \mid y_i, \theta^{(k)})$,
\begin{align*}
Q(\theta \mid \theta^{(k)}) &= E[\ln L(\theta \mid \mathbf{y}_n, \mathbf{Z}_n) \mid \mathbf{y}_n, \theta^{(k)}] \\
&= -n \ln \sqrt{2\pi} - n \ln \sigma + \sum_{i=1}^{n} \left[ z_i^{(k)} \ln p + (1 - z_i^{(k)}) \ln(1 - p) \right] \\
&\quad - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ z_i^{(k)} (y_i - \mu_1)^2 + (1 - z_i^{(k)})(y_i - \mu_2)^2 \right].
\end{align*}
5. Denote $z_i^{(k)} = E(Z_i \mid y_i, \theta^{(k)})$. By implementing the M-step show that the $(k+1)$th update of $\theta$ is
\[
p^{(k+1)} = \frac{1}{n} \sum_{i=1}^{n} z_i^{(k)}, \qquad
\mu_1^{(k+1)} = \frac{\sum_{i=1}^{n} z_i^{(k)} y_i}{\sum_{i=1}^{n} z_i^{(k)}}, \qquad
\mu_2^{(k+1)} = \frac{\sum_{i=1}^{n} (1 - z_i^{(k)}) y_i}{\sum_{i=1}^{n} (1 - z_i^{(k)})},
\]
\[
\sigma^{(k+1)} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left[ z_i^{(k)} (y_i - \mu_1^{(k+1)})^2 + (1 - z_i^{(k)})(y_i - \mu_2^{(k+1)})^2 \right]},
\]
\[
z_i^{(k+1)} = \frac{p^{(k+1)} \exp\Big\{-\frac{(y_i - \mu_1^{(k+1)})^2}{2(\sigma^{(k+1)})^2}\Big\}}
{p^{(k+1)} \exp\Big\{-\frac{(y_i - \mu_1^{(k+1)})^2}{2(\sigma^{(k+1)})^2}\Big\} + (1 - p^{(k+1)}) \exp\Big\{-\frac{(y_i - \mu_2^{(k+1)})^2}{2(\sigma^{(k+1)})^2}\Big\}}.
\]
   • It can be seen from the result of (4) that
\begin{align*}
\frac{\partial Q(\theta \mid \theta^{(k)})}{\partial p} &= \frac{\sum_{i=1}^{n} z_i^{(k)}}{p} - \frac{\sum_{i=1}^{n} (1 - z_i^{(k)})}{1 - p}, \\
\frac{\partial Q(\theta \mid \theta^{(k)})}{\partial \mu_1} &= \frac{1}{\sigma^2} \sum_{i=1}^{n} z_i^{(k)} (y_i - \mu_1), \qquad
\frac{\partial Q(\theta \mid \theta^{(k)})}{\partial \mu_2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (1 - z_i^{(k)})(y_i - \mu_2), \\
\frac{\partial Q(\theta \mid \theta^{(k)})}{\partial \sigma} &= -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} \left[ z_i^{(k)} (y_i - \mu_1)^2 + (1 - z_i^{(k)})(y_i - \mu_2)^2 \right].
\end{align*}
   • The updates follow by setting these derivatives to zero, solving the resulting equations with respect to $p$, $\mu_1$, $\mu_2$ and $\sigma$, and then using $z_i^{(k+1)} = E(Z_i \mid y_i, \theta^{(k+1)})$ together with the result of (3).
6. The following R function implements the EM algorithm for the problem. Test the function using a simulated sample of $n = 400$ observations where the true value of $\theta$ is $(p = 0.25, \mu_1 = 5, \mu_2 = 10, \sigma = 1.5)$. Use different initial values, e.g. $\theta^{(0)} = (0.4, \min(y), \max(y), \operatorname{sd}(y))$, $(0.4, \max(y), \min(y), \operatorname{sd}(y))$, or $(0.9, \min(y), \max(y), \operatorname{sd}(y) + 1)$, etc.
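A minimal version of tut3em.f consistent with the calls and output below is sketched here, together with a matching simulated sample. The interface (arguments theta0 and iter; returned components theta, z and loglik) is taken from the session output, while the internals simply code the E-step from part (3) and the M-step from part (5); treat this as a reconstruction, not the original listing.

## Sketch of tut3em.f: EM for the two-component normal mixture.
## theta is an (iter + 1) x 4 matrix of parameter updates; loglik[k]
## is the observed-data log-likelihood at theta[k, ].
tut3em.f <- function(y, theta0, iter = 50) {
  n <- length(y)
  theta <- matrix(NA, iter + 1, 4,
                  dimnames = list(NULL, c("p", "mu1", "mu2", "sigma")))
  theta[1, ] <- theta0
  loglik <- numeric(iter)
  z <- numeric(n)
  for (k in 1:iter) {
    p <- theta[k, 1]; mu1 <- theta[k, 2]; mu2 <- theta[k, 3]; s <- theta[k, 4]
    f1 <- p * dnorm(y, mu1, s)
    f2 <- (1 - p) * dnorm(y, mu2, s)
    loglik[k] <- sum(log(f1 + f2))            # observed-data log-likelihood
    z <- f1 / (f1 + f2)                       # E-step: z_i^(k)
    mu1.new <- sum(z * y) / sum(z)            # M-step updates from part (5)
    mu2.new <- sum((1 - z) * y) / sum(1 - z)
    s.new <- sqrt(sum(z * (y - mu1.new)^2 + (1 - z) * (y - mu2.new)^2) / n)
    theta[k + 1, ] <- c(mean(z), mu1.new, mu2.new, s.new)
  }
  list(theta = theta, z = z, loglik = loglik)
}

## Simulated test sample: n = 400, true theta = (0.25, 5, 10, 1.5).
set.seed(8)
n <- 400
lab <- rbinom(n, 1, 0.25)                     # latent indicators Z_i
y <- rnorm(n, ifelse(lab == 1, 5, 10), 1.5)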
em1 = tut3em.f(y, theta0 = c(0.4, min(y), max(y), sd(y)))
em2 = tut3em.f(y, theta0 = c(0.4, max(y), min(y), sd(y)))
em3 = tut3em.f(y, theta0 = c(0.9, min(y), max(y), sd(y) + 1), iter = 70)

> attributes(em3)
$names
[1] "theta"  "z"      "loglik"

> em3$theta[71, ]
[1]  0.2493479  4.6392209 10.1309750  1.4036674

> plot(em3$loglik[-1], type = "b")
[Figure: observed-data log-likelihood plotted against iteration number (0 to 70); it increases from about −980 and plateaus near −920.]