MAST90083 Computational Statistics & Data Mining EM Algorithm
Tutorial & Practical 8: EM Algorithm
Question 1
Consider a mixture distribution of the form
\[
p(y) = \sum_{k=1}^{K} p_k \, p(y \mid k)
\]
where the elements of $y$ may be discrete, continuous, or a combination of these. Denote the mean and the covariance of $p(y \mid k)$ by $\mu_k$ and $\Sigma_k$, respectively.
1. Show that the mean of the mixture is given by
\[
E[y] = \sum_{k=1}^{K} p_k \mu_k .
\]
2. Show that the covariance of the mixture is given by
\[
\operatorname{Cov}[y] = \sum_{k=1}^{K} p_k \left\{ \Sigma_k + \mu_k \mu_k^{\top} \right\} - E[y]\, E[y]^{\top} .
\]
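Both identities are easy to check by simulation. The following R sketch (not part of the question; all parameter values are arbitrary illustrative choices) draws from a two-component bivariate normal mixture and compares the empirical moments with the formulas above:

library(MASS)                                    # for mvrnorm
set.seed(1)
p=c(0.3, 0.7)                                    # mixing weights p_k
mu=list(c(0,0), c(3,1))                          # component means mu_k
Sig=list(diag(2), matrix(c(2,0.5,0.5,1),2,2))    # component covariances Sigma_k
N=100000
k=sample(1:2, N, replace=TRUE, prob=p)           # latent component labels
y=t(sapply(1:N, function(i) mvrnorm(1, mu[[k[i]]], Sig[[k[i]]])))
m.theory=p[1]*mu[[1]]+p[2]*mu[[2]]               # sum_k p_k mu_k
C.theory=p[1]*(Sig[[1]]+tcrossprod(mu[[1]]))+
  p[2]*(Sig[[2]]+tcrossprod(mu[[2]]))-tcrossprod(m.theory)
colMeans(y); m.theory                            # empirical vs theoretical mean
cov(y); C.theory                                 # empirical vs theoretical covariance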
Question 2
Assume we wish to use the EM algorithm to maximize the posterior distribution over parameters $p(\theta \mid Y)$ for a model containing latent variables $Z$, where $Y$ is the observed data set.
1. Describe the modification of the E-step compared to the maximum likelihood case.
2. Show that the quantity to be maximized in the M-step is given by
\[
Q(\theta, \theta^{i-1}) + \log p(\theta),
\]
where
\[
Q(\theta, \theta^{i-1}) = \sum_{Z} p(Z \mid Y, \theta^{i-1}) \log p(Y, Z \mid \theta).
\]
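A hint for part 2 (a sketch of the key step, using only Bayes' theorem): since
\[
\log p(\theta \mid Y) = \log p(Y \mid \theta) + \log p(\theta) - \log p(Y),
\]
and $\log p(Y)$ does not depend on $\theta$, one can run the usual EM decomposition on the likelihood term $\log p(Y \mid \theta)$ and carry $\log p(\theta)$ along unchanged into the M-step objective.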
Question 3
Consider a random sample $T_1, \dots, T_n$ from a Weibull distribution which has the pdf
\[
f(t \mid \beta) = \frac{4}{\beta}\, t^{3} e^{-t^{4}/\beta}, \qquad t > 0,\; \beta > 0.
\]
Suppose we have observed $T_1 = y_1, \dots, T_m = y_m$ and $T_{m+1} > c_{m+1}, \dots, T_n > c_n$, where $m$ is given, $m < n$, and $y_1, \dots, y_m$ and $c_{m+1}, \dots, c_n$ are given numerical values. This implies that $T_1, \dots, T_m$ are completely observed, whereas $T_{m+1}, \dots, T_n$ are partially observed in that they are right-censored. We want to use an EM algorithm to find the MLE of $\beta$.
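For the numerical checks later it is worth noting how this pdf maps onto R's built-in Weibull functions: matching $f(t \mid \beta) = (4/\beta) t^{3} e^{-t^{4}/\beta}$ against dweibull's $(a/s)(t/s)^{a-1} e^{-(t/s)^{a}}$ gives shape $a = 4$ and scale $s = \beta^{1/4}$. A quick sketch ($\beta = 2$ is an arbitrary test value):

beta=2
t=seq(0.01, 3, by=0.01)
f.manual=(4/beta)*t^3*exp(-t^4/beta)                       # pdf as given above
max(abs(f.manual-dweibull(t, shape=4, scale=beta^(1/4))))  # ~ 0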
1. Find the complete-data log-likelihood function $\ell(\beta) = \ln L(\beta)$.
2. In the E-step, we calculate
\[
Q(\beta, \beta^{(k)}) = E[\ln L(\beta) \mid T_1 = y_1, \dots, T_m = y_m, T_{m+1} > c_{m+1}, \dots, T_n > c_n;\, \beta^{(k)}],
\]
where $\beta^{(k)}$ is the $k$th-step estimate of $\beta$. Show that
\[
Q(\beta, \beta^{(k)}) = \ln 4^{n} - n \ln \beta + \sum_{i=1}^{m} \ln y_i^{3} + \sum_{i=m+1}^{n} E[\ln T_i^{3} \mid T_i > c_i, \beta^{(k)}]
- \frac{1}{\beta} \sum_{i=1}^{m} y_i^{4} - \frac{1}{\beta} \sum_{i=m+1}^{n} c_i^{4} - \frac{(n-m)\beta^{(k)}}{\beta}.
\]
(Note: the result $E(T^{4} \mid T > c) = c^{4} + \beta$ for the Weibull random variable $T$ considered here can be used without proof; it is also checked numerically in the sketch after this question.)
3. In the M-step, we maximise $Q(\beta, \beta^{(k)})$ with respect to $\beta$ to find an update $\beta^{(k+1)}$ from $\beta^{(k)}$. Show that
\[
\beta^{(k+1)} = \frac{1}{n} \left[ \sum_{i=1}^{m} y_i^{4} + \sum_{i=m+1}^{n} c_i^{4} + (n-m)\beta^{(k)} \right].
\]
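Both the quoted conditional-moment result and the update above can be checked numerically. The sketch below is illustrative only: beta.true and the censoring threshold c0 are arbitrary choices, and Type I censoring at a fixed c0 is used so that the number of censored cases arises naturally. It first verifies $E(T^{4} \mid T > c) = c^{4} + \beta$ by simulation (using the parametrisation noted above), then iterates the closed-form M-step update:

set.seed(42)
beta.true=2
tt=rweibull(1000000, shape=4, scale=beta.true^(1/4))
cc=1.2
mean(tt[tt>cc]^4); cc^4+beta.true          # simulated vs theoretical c^4 + beta

n=200; c0=1.1                              # fixed (Type I) censoring point
t.all=rweibull(n, shape=4, scale=beta.true^(1/4))
y=t.all[t.all<=c0]                         # fully observed values
m=length(y)                                # number fully observed
# for the n - m censored cases we only know T_i > c0, i.e. c_i = c0
beta=1                                     # initial value beta^(0)
for(k in 1:50){
  beta=(sum(y^4)+(n-m)*c0^4+(n-m)*beta)/n  # the update from part 3
}
beta                                       # should be roughly beta.true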
Question 4
Suppose $Y_1, \dots, Y_n$ are to be independently observed from a two-component normal mixture model which has the following pdf
\[
g(y_i \mid \theta) = \frac{p}{\sigma\sqrt{2\pi}} \exp\Big\{ -\frac{1}{2\sigma^{2}} (y_i - \mu_1)^{2} \Big\}
+ \frac{1-p}{\sigma\sqrt{2\pi}} \exp\Big\{ -\frac{1}{2\sigma^{2}} (y_i - \mu_2)^{2} \Big\}, \qquad i = 1, \dots, n,
\]
where $\theta = (p, \mu_1, \mu_2, \sigma)$ are unknown parameters with $0 \le p \le 1$ and $\sigma > 0$. We want to use the EM algorithm to find the MLE $\hat{\theta}$ of $\theta$. To do this, we introduce an unobserved indicator variable $Z_i$ for each $i = 1, \dots, n$, with $P(Z_i = 1) = p$ and $P(Z_i = 0) = 1 - p$. Then the conditional pdf of $Y_i$ given $Z_i$ satisfies $(Y_i \mid Z_i = 1) \overset{d}{=} N(\mu_1, \sigma^{2})$ and $(Y_i \mid Z_i = 0) \overset{d}{=} N(\mu_2, \sigma^{2})$.
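Before starting the algebra it can help to see the target: the sketch below plots $g(y \mid \theta)$ at the true values used later in part 6 ($p = 0.25$, $\mu_1 = 5$, $\mu_2 = 10$, $\sigma = 1.5$), showing the two modes the EM algorithm must separate.

p=0.25; mu1=5; mu2=10; sig=1.5
yy=seq(0, 15, by=0.05)
g=p*dnorm(yy, mu1, sig)+(1-p)*dnorm(yy, mu2, sig)   # the mixture pdf above
plot(yy, g, type="l", xlab="y", ylab="g(y|theta)")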
1. Write down the observed-data log-likelihood function $\ln L(\theta \mid \mathbf{y}_n)$, where $\mathbf{y}_n = (y_1, \dots, y_n)^{T}$ are observations of $Y_1, \dots, Y_n$.
2. Write down the complete-data log-likelihood function $\ln L(\theta \mid \mathbf{y}_n, \mathbf{Z}_n)$, where $\mathbf{Z}_n = (Z_1, \dots, Z_n)^{T}$.
3. Show that the conditional pdf of $Z_i$ given $Y_i = y_i$ and $\theta$ is
\[
P(Z_i = 1 \mid y_i, \theta) =
\frac{p \exp\{ -\frac{1}{2\sigma^{2}} (y_i - \mu_1)^{2} \}}
{p \exp\{ -\frac{1}{2\sigma^{2}} (y_i - \mu_1)^{2} \} + (1-p) \exp\{ -\frac{1}{2\sigma^{2}} (y_i - \mu_2)^{2} \}}
\]
and $P(Z_i = 0 \mid y_i, \theta) = 1 - P(Z_i = 1 \mid y_i, \theta)$. Then find $E(Z_i \mid y_i, \theta)$ and $\mathrm{Var}(Z_i \mid y_i, \theta)$.
4. Use the results in (3) to derive and simplify $Q(\theta \mid \theta^{(k)}) = E[\ln L(\theta \mid \mathbf{y}_n, \mathbf{Z}_n) \mid \mathbf{y}_n, \theta^{(k)}]$, which is to be used in the E-step. Here $\theta^{(k)} = (p^{(k)}, \mu_1^{(k)}, \mu_2^{(k)}, \sigma^{(k)})$ is the $k$th update of $\theta$ obtained from the EM algorithm.
5. Denote $z_i^{(k)} = E(Z_i \mid y_i, \theta^{(k)})$. By implementing the M-step, show that the $(k+1)$th update of $\theta$ is
\[
p^{(k+1)} = \frac{1}{n} \sum_{i=1}^{n} z_i^{(k)}, \qquad
\mu_1^{(k+1)} = \frac{\sum_{i=1}^{n} z_i^{(k)} y_i}{\sum_{i=1}^{n} z_i^{(k)}}, \qquad
\mu_2^{(k+1)} = \frac{\sum_{i=1}^{n} (1 - z_i^{(k)}) y_i}{\sum_{i=1}^{n} (1 - z_i^{(k)})},
\]
\[
\sigma^{(k+1)} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \Big[ z_i^{(k)} (y_i - \mu_1^{(k+1)})^{2} + (1 - z_i^{(k)}) (y_i - \mu_2^{(k+1)})^{2} \Big] },
\]
\[
z_i^{(k+1)} =
\frac{p^{(k+1)} \exp\Big\{ -\frac{(y_i - \mu_1^{(k+1)})^{2}}{2(\sigma^{(k+1)})^{2}} \Big\}}
{p^{(k+1)} \exp\Big\{ -\frac{(y_i - \mu_1^{(k+1)})^{2}}{2(\sigma^{(k+1)})^{2}} \Big\} + (1 - p^{(k+1)}) \exp\Big\{ -\frac{(y_i - \mu_2^{(k+1)})^{2}}{2(\sigma^{(k+1)})^{2}} \Big\}}.
\]
6. The following R function implements the EM algorithm for this problem. Test the function using a simulated sample of $n = 400$ observations where the true value of $\theta$ is $(p = 0.25, \mu_1 = 5, \mu_2 = 10, \sigma = 1.5)$. Use different initial values, e.g.
$\theta^{(0)} = (0.4, \min(y), \max(y), \mathrm{sd}(y))$; or $(0.4, \max(y), \min(y), \mathrm{sd}(y))$;
or $(0.9, \min(y), \max(y), \mathrm{sd}(y) + 1)$; etc.
set.seed(1234)
y1=rnorm(n=100, mean=5, sd=1.5)   # component 1: 100 observations, so p = 0.25
y2=rnorm(n=300, mean=10, sd=1.5)  # component 2: 300 observations
y=c(y1,y2)                        # pooled sample of n = 400
########begin tut3em.f function
tut3em.f=function(y, theta0, iter=20){
  theta=matrix(0, iter+1, 4)   # rows hold (p, mu1, mu2, sigma) across iterations
  n=length(y)
  z=matrix(0, n, iter+1)       # matrix of conditional means of the Z_i's
  loglik=rep(-1, iter+1)       # observed-data log-likelihood trace
  theta[1,]=theta0
  tem1=theta[1,1]*exp(-0.5*(y-theta[1,2])^2/(theta[1,4])^2)
  tem2=(1-theta[1,1])*exp(-0.5*(y-theta[1,3])^2/(theta[1,4])^2)
  loglik[1]=-n*log(sqrt(2*pi)*theta[1,4])+sum(log(tem1+tem2))
  z[,1]=tem1/(tem1+tem2)       # initial E-step: responsibilities under theta0
  for(k in 1:iter){
    theta[k+1,1]=mean(z[,k])                    # calculate p(k+1)
    theta[k+1,2]=sum(z[,k]*y)/sum(z[,k])        # calculate mu1(k+1)
    theta[k+1,3]=sum((1-z[,k])*y)/sum(1-z[,k])  # calculate mu2(k+1)
    theta[k+1,4]=sqrt((1/n)*(sum(z[,k]*(y-theta[k+1,2])^2)+sum((1-z[,k])*(y-theta[k+1,3])^2)))  # sigma(k+1)
    tem1=theta[k+1,1]*exp(-0.5*(y-theta[k+1,2])^2/(theta[k+1,4])^2)
    tem2=(1-theta[k+1,1])*exp(-0.5*(y-theta[k+1,3])^2/(theta[k+1,4])^2)
    loglik[k+1]=-n*log(sqrt(2*pi)*theta[k+1,4])+sum(log(tem1+tem2))
    z[,k+1]=tem1/(tem1+tem2)   # E-step: update responsibilities
  }
  result=list(theta=theta, z=z, loglik=loglik)
  return(result)
}
########end tut3em.f function
em1=tut3em.f(y, theta0=c(0.4, min(y), max(y), sd(y)))
em2=tut3em.f(y, theta0=c(0.4, max(y), min(y), sd(y)))
em3=tut3em.f(y, theta0=c(0.9, min(y), max(y), sd(y)+1), iter=70)
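After running the three calls above, the fits can be compared via the final parameter estimates and the log-likelihood traces, e.g.:

em1$theta[nrow(em1$theta),]   # final estimates; should be near (0.25, 5, 10, 1.5)
em2$theta[nrow(em2$theta),]   # typically label-switched: p near 0.75, means swapped
plot(em3$loglik, type="b", xlab="iteration", ylab="log-likelihood")  # monotone increase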