MAST90083 Computational Statistics & Data Mining EM Algorithm

Tutorial & Practical 8: EM Algorithm

Question 1

Consider a mixture distribution of the form

p(y) = Σ_{k=1}^{K} p_k p(y|k)

where the elements of y could be discrete or continuous or a combination of these. Denote the
mean and the covariance of p(y|k) by µ_k and Σ_k, respectively.

1. Show that the mean of the mixture is given by

E[y] = Σ_{k=1}^{K} p_k µ_k

2. Show that the covariance of the mixture is given by

Cov[y] = Σ_{k=1}^{K} p_k { Σ_k + µ_k µ_k^T } − E[y] E[y]^T
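
As a quick sanity check (not part of the original question), these two identities can be verified by Monte Carlo in R. The component parameters below are arbitrary illustrative values, and MASS::mvrnorm is used to draw from the component normals.

# Monte Carlo check of the mixture mean/covariance identities (illustrative values)
set.seed(1)
p   <- c(0.3, 0.7)                                     # mixing proportions p_k
mu  <- list(c(0, 0), c(3, 1))                          # component means mu_k
Sig <- list(diag(2), matrix(c(2, 0.5, 0.5, 1), 2, 2))  # component covariances Sigma_k

n <- 50000
k <- sample(1:2, n, replace = TRUE, prob = p)          # latent component labels
y <- t(sapply(k, function(j) MASS::mvrnorm(1, mu[[j]], Sig[[j]])))

# theoretical mean and covariance from the formulas above
Ey <- p[1] * mu[[1]] + p[2] * mu[[2]]
Cy <- p[1] * (Sig[[1]] + tcrossprod(mu[[1]])) +
      p[2] * (Sig[[2]] + tcrossprod(mu[[2]])) - tcrossprod(Ey)

colMeans(y) - Ey   # should be close to the zero vector
cov(y) - Cy        # should be close to the zero matrix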

Question 2

Assume we wish to use the EM algorithm to maximize the posterior distribution over parameters
p(θ|Y) for a model containing latent variables Z, where Y is the observed data set.

1. Describe the modification of the E-step compared to the maximum likelihood case.

2. Show that the quantity to be maximized in the M-step is given by

Q(θ, θ^{i−1}) + log p(θ)

where

Q(θ, θ^{i−1}) = Σ_Z p(Z|Y, θ^{i−1}) log p(Y, Z|θ).
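
(A sketch of the reasoning, under the standard EM setup: the log posterior decomposes as

log p(θ|Y) = log p(Y|θ) + log p(θ) − log p(Y),

and only log p(Y|θ) involves the latent variables Z. The E-step therefore still computes p(Z|Y, θ^{i−1}) exactly as in the maximum likelihood case, just evaluated at the current estimate θ^{i−1}; in the M-step the expected complete-data log-likelihood Q(θ, θ^{i−1}) stands in for log p(Y|θ), while log p(θ) is carried through unchanged and − log p(Y) is constant in θ.)
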
Question 3

Consider a random sample T_1, …, T_n from a Weibull distribution which has the pdf

f(t|β) = (4/β) t^3 e^{−t^4/β},   t > 0; β > 0.

Suppose we have observed T_1 = y_1, …, T_m = y_m and T_{m+1} > c_{m+1}, …, T_n > c_n, where m is
given, m < n, and y_1, …, y_m and c_{m+1}, …, c_n are given numerical values. This implies that
T_1, …, T_m are completely observed whereas T_{m+1}, …, T_n are partially observed in that they
are right-censored. We want to use an EM algorithm to find the MLE of β.

1. Find the complete-data log-likelihood function ℓ(β) = ln L(β).

2. In the E-step, we calculate

Q(β, β^(k)) = E[ln L(β) | T_1 = y_1, …, T_m = y_m, T_{m+1} > c_{m+1}, …, T_n > c_n; β^(k)]

where β^(k) is the kth-step estimate of β. Show that

Q(β, β^(k)) = ln 4^n − n ln β + Σ_{i=1}^{m} ln y_i^3 + Σ_{i=m+1}^{n} E[ln T_i^3 | T_i > c_i, β^(k)]
              − (1/β) Σ_{i=1}^{m} y_i^4 − (1/β) Σ_{i=m+1}^{n} c_i^4 − (n−m) β^(k)/β.

(Note: The result E(T^4 | T > c) = c^4 + β for the Weibull random variable T considered
here can be used without proof.)
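
(One way to see this, as an aside: if T has the pdf above then X = T^4 is exponentially
distributed with mean β, so by the memoryless property
E(T^4 | T > c) = E(X | X > c^4) = c^4 + E(X) = c^4 + β.)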

3. In the M-step, we maximise Q(β, β^(k)) with respect to β to find an update β^(k+1) from
β^(k). Show that

β^(k+1) = (1/n) [ Σ_{i=1}^{m} y_i^4 + Σ_{i=m+1}^{n} c_i^4 + (n−m) β^(k) ].
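
A minimal R sketch of this iteration (not part of the original sheet): the data are simulated using the fact that T^4 is exponential with mean β, and the censoring times below are arbitrary illustrative values.

# EM iteration for the censored Weibull problem (illustrative simulation)
set.seed(1)
beta.true <- 2; n <- 200
t.all <- (rexp(n, rate = 1/beta.true))^(1/4)   # T^4 ~ Exp(mean beta.true)
cens  <- rep(1.1, n)                           # right-censoring times (illustrative)
obs   <- t.all <= cens                         # TRUE for completely observed T_i
y <- t.all[obs]; cvals <- cens[!obs]; m <- sum(obs)

beta.k <- 1                                    # starting value beta^(0)
for (k in 1:50) {
  beta.k <- (sum(y^4) + sum(cvals^4) + (n - m) * beta.k) / n   # update from part 3
}
beta.k                                         # EM estimate of beta
(sum(y^4) + sum(cvals^4)) / m                  # fixed point of the update (the MLE), for comparison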

Question 4

Suppose Y_1, …, Y_n are to be independently observed from a two-component normal mixture
model which has the following pdf

g(y_i|θ) = (p / (√(2π) σ)) exp{−(1/(2σ^2)) (y_i − µ_1)^2} + ((1−p) / (√(2π) σ)) exp{−(1/(2σ^2)) (y_i − µ_2)^2},   i = 1, …, n,

where θ = (p, µ_1, µ_2, σ) are unknown parameters with 0 ≤ p ≤ 1 and σ > 0. We want to use
the EM algorithm to find the MLE θ̂ of θ. In order to do this, we introduce an unobserved
indicator variable Z_i for each i = 1, …, n, with P(Z_i = 1) = p and P(Z_i = 0) = 1 − p. Then
the conditional distribution of Y_i given Z_i is (Y_i | Z_i = 1) ~ N(µ_1, σ^2) and (Y_i | Z_i = 0) ~ N(µ_2, σ^2).

1. Write down the observed-data log-likelihood function ln L(θ|y_n), where y_n = (y_1, …, y_n)^T
are the observations of Y_1, …, Y_n.

2. Write down the complete-data log-likelihood function ln L(θ|y_n, Z_n), where Z_n = (Z_1, …, Z_n)^T.

3. Show that the conditional pdf of Z_i given Y_i = y_i and θ is

P(Z_i = 1 | y_i, θ) = p exp{−(1/(2σ^2))(y_i − µ_1)^2} / [ p exp{−(1/(2σ^2))(y_i − µ_1)^2} + (1−p) exp{−(1/(2σ^2))(y_i − µ_2)^2} ]

and P(Z_i = 0 | y_i, θ) = 1 − P(Z_i = 1 | y_i, θ). Then find E(Z_i | y_i, θ) and Var(Z_i | y_i, θ).


4. Use the results in (3) to derive and simplify Q(θ|θ^(k)) = E[ln L(θ|y_n, Z_n) | y_n, θ^(k)] that is
to be used in the E-step. Here θ^(k) = (p^(k), µ_1^(k), µ_2^(k), σ^(k)) is the kth update of θ obtained
from the EM algorithm.

5. Denote z_i^(k) = E(Z_i | y_i, θ^(k)). By implementing the M-step, show that the (k+1)th update
of θ is

p^(k+1) = (1/n) Σ_{i=1}^{n} z_i^(k),

µ_1^(k+1) = Σ_{i=1}^{n} z_i^(k) y_i / Σ_{i=1}^{n} z_i^(k),    µ_2^(k+1) = Σ_{i=1}^{n} (1 − z_i^(k)) y_i / Σ_{i=1}^{n} (1 − z_i^(k)),

σ^(k+1) = √{ (1/n) Σ_{i=1}^{n} [ z_i^(k) (y_i − µ_1^(k+1))^2 + (1 − z_i^(k)) (y_i − µ_2^(k+1))^2 ] },

z_i^(k+1) = p^(k+1) exp{−(y_i − µ_1^(k+1))^2 / (2(σ^(k+1))^2)} / [ p^(k+1) exp{−(y_i − µ_1^(k+1))^2 / (2(σ^(k+1))^2)} + (1 − p^(k+1)) exp{−(y_i − µ_2^(k+1))^2 / (2(σ^(k+1))^2)} ].

6. The following R function implements the EM algorithm for the problem. Test the
function using a simulated sample of n = 400 observations where the true value of θ is
(p = 0.25, µ_1 = 5, µ_2 = 10, σ = 1.5). Use different initial values, e.g.
θ^(0) = (0.4, min(y), max(y), sd(y)); or (0.4, max(y), min(y), sd(y));
or (0.9, min(y), max(y), sd(y) + 1); etc.

set.seed(1234)

y1=rnorm(n=100, mean=5, sd=1.5)

y2=rnorm(n=300, mean=10, sd=1.5)

y=c(y1,y2)

########begin tut3em.f function

tut3em.f=function(y, theta0, iter=20){

theta=matrix(0, iter+1,4)

n=length(y)

z=matrix (0, n, iter+1) # matrix of conditional mean of Z_is

loglik=rep(-1,iter+1)

theta[1,]=theta0

tem1=theta[1,1]*exp(-0.5*(y-theta[1,2])^2/(theta[1,4])^2)     # p*component-1 density (without the 1/(sqrt(2*pi)*sigma) factor)

tem2=(1-theta[1,1])*exp(-0.5*(y-theta[1,3])^2/(theta[1,4])^2) # (1-p)*component-2 density (without the 1/(sqrt(2*pi)*sigma) factor)

loglik[1]=-n*log(sqrt(2*pi)*theta[1,4])+sum(log(tem1+tem2))   # observed-data log-likelihood at theta0

z[,1]=tem1/(tem1+tem2)                                        # initial conditional means E(Z_i|y_i,theta0)

for(k in 1:iter){

theta[k+1,1]=mean(z[,k]) #calculate p(k+1)

theta[k+1,2]=sum(z[,k]*y)/sum(z[,k]) #calculate mu1(k+1)

theta[k+1,3]=sum((1-z[,k])*y)/sum(1-z[,k]) #calculate mu2(k+1)

theta[k+1,4]=sqrt((1/n)*(sum(z[,k]*(y-theta[k+1,2])^2)+sum((1-z[,k])*(y-theta[k+1,3])^2)))

tem1=theta[k+1,1]*exp(-0.5*(y-theta[k+1,2])^2/(theta[k+1,4])^2)

tem2=(1-theta[k+1,1])*exp(-0.5*(y-theta[k+1,3])^2/(theta[k+1,4])^2)

loglik[k+1]=-n*log(sqrt(2*pi)*theta[k+1,4])+sum(log(tem1+tem2))

z[,k+1]=tem1/(tem1+tem2)

}

result=list(theta=theta, z=z, loglik=loglik)

return(result)

}

########end tut3em.f function

em1=tut3em.f(y, theta0=c(0.4, min(y), max(y), sd(y)))

em2=tut3em.f(y, theta0=c(0.4, max(y), min(y), sd(y)))

em3=tut3em.f(y, theta0=c(0.9, min(y), max(y), sd(y)+1), iter=70)
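
One possible way to inspect the output (an addition, not part of the original sheet) is to look at the final row of the parameter matrix and to check that the log-likelihood is non-decreasing over the EM iterations.

em1$theta[nrow(em1$theta), ]   # final estimate of (p, mu1, mu2, sigma)

all(diff(em1$loglik) >= -1e-8) # EM should never decrease the observed-data log-likelihood

plot(em1$loglik, type="b", xlab="EM iteration", ylab="observed-data log-likelihood")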
