
MAST90083 Computational Statistics & Data Mining EM Algorithm

Tutorial and Practical 8: Solutions

Question 1

1. The mean of p(y) is given by

$$
E(y) = \int y\, p(y)\, dy
= \int y \sum_{k=1}^{K} p_k\, p(y \mid k)\, dy
= \sum_{k=1}^{K} p_k \int y\, p(y \mid k)\, dy
= \sum_{k=1}^{K} p_k\, \mu_k.
$$

2. The covariance of y is given by

$$
\operatorname{cov}(y) = E\big(yy^{\top}\big) - E(y)E(y)^{\top}.
$$

The first term on the right-hand side is

$$
\begin{aligned}
E\big(yy^{\top}\big) &= \int yy^{\top} p(y)\, dy
= \int yy^{\top} \sum_{k=1}^{K} p_k\, p(y \mid k)\, dy
= \sum_{k=1}^{K} p_k \int yy^{\top} p(y \mid k)\, dy\\
&= \sum_{k=1}^{K} p_k\, E_k\big(yy^{\top}\big)
= \sum_{k=1}^{K} p_k \big(\mu_k \mu_k^{\top} + \Sigma_k\big),
\end{aligned}
$$

where we used

$$
E_k\big(yy^{\top}\big) = \operatorname{cov}_k(y) + E_k(y)E_k(y)^{\top} = \Sigma_k + \mu_k \mu_k^{\top}.
$$

Combining the two terms,

$$
\operatorname{cov}(y) = \sum_{k=1}^{K} p_k \big(\mu_k \mu_k^{\top} + \Sigma_k\big)
- \Big(\sum_{k=1}^{K} p_k \mu_k\Big)\Big(\sum_{k=1}^{K} p_k \mu_k\Big)^{\top}.
$$
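As a quick numerical sanity check (an added illustration, not part of the original solution), both moment formulas can be verified by simulation in the univariate two-component case, where they reduce to E(y) = Σ_k p_k µ_k and var(y) = Σ_k p_k(µ_k² + σ_k²) − (Σ_k p_k µ_k)²; the parameter values below are arbitrary:

# Monte Carlo check of the mixture mean and variance formulas.
set.seed(1)
p <- c(0.3, 0.7); mu <- c(-1, 2); sigma2 <- c(1, 0.5)
k <- sample(1:2, 1e6, replace = TRUE, prob = p)      # component labels
y <- rnorm(1e6, mean = mu[k], sd = sqrt(sigma2[k]))  # mixture draws
c(mean(y), sum(p * mu))                              # empirical vs. formula mean
c(var(y), sum(p * (mu^2 + sigma2)) - sum(p * mu)^2)  # empirical vs. formula variance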

Question 2

1. Using Bayes' rule we have

$$
p(\theta \mid Y) \propto p(Y \mid \theta)\, p(\theta),
$$

and taking logarithms gives

$$
\log p(\theta \mid Y) \propto \log p(Y \mid \theta) + \log p(\theta),
$$

where, on the log scale, "∝" denotes equality up to an additive constant.


In the E-step we are interested in log p(Y, Z | θ):

$$
\begin{aligned}
\log p(\theta \mid Y) &\propto \log\Big\{\sum_{Z} p(Y, Z \mid \theta)\Big\} + \log p(\theta) \qquad &(1)\\
&= \log\Big\{\Big[\sum_{Z} p(Y, Z \mid \theta)\Big] p(\theta)\Big\} &(2)\\
&= \log\Big\{\sum_{Z} p(Y, Z \mid \theta)\, p(\theta)\Big\}. &(3)
\end{aligned}
$$

The only modification is therefore the replacement of p(Y, Z | θ) by p(Y, Z | θ) p(θ).

2. The form of $Q^{*}(\theta, \theta^{i-1})$ is given by

$$
\begin{aligned}
Q^{*}\big(\theta, \theta^{i-1}\big)
&= \sum_{Z} p\big(Z \mid Y, \theta^{i-1}\big) \log\big[p(Y, Z \mid \theta)\, p(\theta)\big]\\
&= \sum_{Z} p\big(Z \mid Y, \theta^{i-1}\big) \log p(Y, Z \mid \theta)
+ \sum_{Z} p\big(Z \mid Y, \theta^{i-1}\big) \log p(\theta)\\
&= \sum_{Z} p\big(Z \mid Y, \theta^{i-1}\big) \log p(Y, Z \mid \theta)
+ \log p(\theta) \sum_{Z} p\big(Z \mid Y, \theta^{i-1}\big)\\
&= \sum_{Z} p\big(Z \mid Y, \theta^{i-1}\big) \log p(Y, Z \mid \theta) + \log p(\theta)\\
&= Q\big(\theta, \theta^{i-1}\big) + \log p(\theta),
\end{aligned}
$$

where the fourth line uses $\sum_{Z} p(Z \mid Y, \theta^{i-1}) = 1$.
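To see what the extra log p(θ) term does in practice, here is a brief added illustration (not part of the original solution): suppose one mixture mean µ has the Gaussian prior µ ∼ N(0, τ²), and, as in Question 4 below, Q contributes −(1/2σ²) Σᵢ γᵢ(yᵢ − µ)² with responsibilities γᵢ. Maximising Q + log p(µ) then gives a shrinkage update:

$$
\frac{\partial}{\partial \mu}\Big[Q(\mu) - \frac{\mu^{2}}{2\tau^{2}}\Big]
= \frac{1}{\sigma^{2}} \sum_{i} \gamma_i (y_i - \mu) - \frac{\mu}{\tau^{2}} = 0
\quad\Longrightarrow\quad
\hat{\mu} = \frac{\sum_{i} \gamma_i y_i}{\sum_{i} \gamma_i + \sigma^{2}/\tau^{2}},
$$

and letting τ² → ∞ (a flat prior) recovers the usual EM update $\sum_i \gamma_i y_i / \sum_i \gamma_i$.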

Question 3

Consider a random sample T1, · · · , Tn from a Weibull distribution which has the pdf

$$
f(t \mid \beta) = \frac{4}{\beta}\, t^{3} e^{-t^{4}/\beta}, \qquad t > 0;\ \beta > 0.
$$

Suppose we have observed T1 = y1, · · · , Tm = ym and Tm+1 > cm+1, · · · , Tn > cn, where m is given, m < n, and y1, · · · , ym and cm+1, · · · , cn are given numerical values. This implies that T1, · · · , Tm are completely observed whereas Tm+1, · · · , Tn are partially observed in that they are right-censored. We want to use an EM algorithm to find the MLE of β.

1. Find the complete-data log-likelihood function ℓ(β) = ln L(β).

• $$\ell(\beta) = \ln 4^{n} - n \ln \beta + \sum_{i=1}^{n} \ln T_i^{3} - \frac{1}{\beta} \sum_{i=1}^{n} T_i^{4}.$$

2. In the E-step, we calculate

$$
Q(\beta, \beta^{(k)}) = E\big[\ln L(\beta) \mid T_1 = y_1, \cdots, T_m = y_m,\ T_{m+1} > c_{m+1}, \cdots, T_n > c_n;\ \beta^{(k)}\big]
$$


where β(k) is the kth-step estimate of β. Show that

$$
Q(\beta, \beta^{(k)}) = \ln 4^{n} - n \ln \beta + \sum_{i=1}^{m} \ln y_i^{3}
+ \sum_{i=m+1}^{n} E\big[\ln T_i^{3} \mid T_i > c_i,\ \beta^{(k)}\big]
- \frac{1}{\beta} \sum_{i=1}^{m} y_i^{4}
- \frac{1}{\beta} \sum_{i=m+1}^{n} c_i^{4}
- \frac{(n-m)\beta^{(k)}}{\beta}.
$$

(Note: The result $E(T^{4} \mid T > c) = c^{4} + \beta$ for the Weibull random variable T considered here can be used without proof.)
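This result can also be checked numerically (an added illustration): if T has the pdf above then U = T⁴ satisfies P(T > t) = exp(−t⁴/β), so U is exponential with mean β, and the memoryless property gives E(T⁴ | T > c) = c⁴ + β. A minimal simulation sketch, with arbitrary illustrative values of β and c:

# Monte Carlo check of E(T^4 | T > c) = c^4 + beta.
set.seed(1)
beta <- 2.0                          # illustrative value
c0   <- 1.2                          # illustrative censoring point
u    <- rexp(1e6, rate = 1 / beta)   # U = T^4 ~ Exponential(mean = beta)
t    <- u^(1 / 4)                    # draws from the Weibull pdf above
mean(t[t > c0]^4)                    # Monte Carlo estimate
c0^4 + beta                          # theoretical value: 4.0736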

• $$
\begin{aligned}
Q(\beta, \beta^{(k)}) &= \ln 4^{n} - n \ln \beta
+ \sum_{i=1}^{m} E\big(\ln T_i^{3} \mid T_i = y_i,\ \beta = \beta^{(k)}\big)
+ \sum_{i=m+1}^{n} E\big(\ln T_i^{3} \mid T_i > c_i,\ \beta = \beta^{(k)}\big)\\
&\quad - \frac{1}{\beta} \sum_{i=1}^{m} E\big(T_i^{4} \mid T_i = y_i,\ \beta = \beta^{(k)}\big)
- \frac{1}{\beta} \sum_{i=m+1}^{n} E\big(T_i^{4} \mid T_i > c_i,\ \beta = \beta^{(k)}\big)\\
&= \ln 4^{n} - n \ln \beta + \sum_{i=1}^{m} \ln y_i^{3}
+ \sum_{i=m+1}^{n} E\big[\ln T_i^{3} \mid T_i > c_i,\ \beta^{(k)}\big]
- \frac{1}{\beta} \sum_{i=1}^{m} y_i^{4}
- \frac{1}{\beta} \sum_{i=m+1}^{n} \big(c_i^{4} + \beta^{(k)}\big)\\
&= \ln 4^{n} - n \ln \beta + \sum_{i=1}^{m} \ln y_i^{3}
+ \sum_{i=m+1}^{n} E\big[\ln T_i^{3} \mid T_i > c_i,\ \beta^{(k)}\big]
- \frac{1}{\beta} \sum_{i=1}^{m} y_i^{4}
- \frac{1}{\beta} \sum_{i=m+1}^{n} c_i^{4}
- \frac{(n-m)\beta^{(k)}}{\beta}.
\end{aligned}
$$

3. In the M-step, we maximise Q(β, β(k)) with respect to β to find an update β(k+1) from
β(k). Show that

$$
\beta^{(k+1)} = \frac{1}{n} \left[\sum_{i=1}^{m} y_i^{4} + \sum_{i=m+1}^{n} c_i^{4} + (n-m)\beta^{(k)}\right].
$$

• $$
\frac{\partial Q(\beta, \beta^{(k)})}{\partial \beta}
= -\frac{n}{\beta} + \frac{1}{\beta^{2}} \sum_{i=1}^{m} y_i^{4}
+ \frac{1}{\beta^{2}} \sum_{i=m+1}^{n} c_i^{4}
+ \frac{(n-m)\beta^{(k)}}{\beta^{2}}.
$$

• Solving $\partial Q(\beta, \beta^{(k)}) / \partial \beta = 0$, it follows that

$$
\beta^{(k+1)} = \frac{1}{n} \left[\sum_{i=1}^{m} y_i^{4} + \sum_{i=m+1}^{n} c_i^{4} + (n-m)\beta^{(k)}\right].
$$
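Since the E-step collapses into the closed-form replacement of each censored T_i⁴ by c_i⁴ + β^(k), the whole algorithm reduces to iterating the update above. A minimal R sketch (an added illustration; the function and variable names are my own, and the simulation settings are arbitrary):

# EM for right-censored Weibull data: f(t | beta) = (4 / beta) t^3 exp(-t^4 / beta).
# y: fully observed values; cc: censoring points; beta0: starting value.
weibull.em <- function(y, cc, beta0, iter = 50) {
  n <- length(y) + length(cc)
  m <- length(y)
  beta <- beta0
  for (k in 1:iter) {
    # E-step absorbed into the closed-form M-step update:
    beta <- (sum(y^4) + sum(cc^4) + (n - m) * beta) / n
  }
  beta
}

# Illustrative test with simulated data (true beta = 2): T^4 ~ Exponential(mean = beta).
set.seed(2)
t.all <- rexp(200, rate = 1 / 2)^(1 / 4)
cens  <- runif(200, 0, 2)          # random censoring times
y  <- t.all[t.all <= cens]         # fully observed values
cc <- cens[t.all > cens]           # right-censored at cc
weibull.em(y, cc, beta0 = 1)       # should be close to the true beta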

Question 4

Suppose Y1, · · · , Yn are to be independently observed from a two-component normal mixture
model which has the following pdf

$$
g(y_i \mid \theta) = \frac{p}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2\sigma^{2}}(y_i - \mu_1)^{2}\Big\}
+ \frac{1-p}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2\sigma^{2}}(y_i - \mu_2)^{2}\Big\}; \qquad i = 1, \cdots, n,
$$

where θ = (p, µ1, µ2, σ) are unknown parameters with 0 ≤ p ≤ 1 and σ > 0. We want to use
the EM algorithm to find the MLE θ̂ of θ. In order to do this, we introduce an unobserved
indicator variable Zi for each i = 1, · · · , n, with P (Zi = 1) = p and P (Zi = 0) = 1−p. Then we
know the conditional pdf of Yi given Zi: $(Y_i \mid Z_i = 1) \stackrel{d}{=} N(\mu_1, \sigma^{2})$ and $(Y_i \mid Z_i = 0) \stackrel{d}{=} N(\mu_2, \sigma^{2})$.

1. Write down the observed-data log-likelihood function lnL(θ|yn) where yn = (y1, · · · , yn)T
are observations of Y1, · · · , Yn.


• The observed-data log-likelihood is

$$
\ln L(\theta \mid y_n) = \ln \prod_{i=1}^{n} g(y_i \mid \theta) = \sum_{i=1}^{n} \ln g(y_i \mid \theta)
= \sum_{i=1}^{n} \ln \left[\frac{p}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2\sigma^{2}}(y_i - \mu_1)^{2}\Big\}
+ \frac{1-p}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2\sigma^{2}}(y_i - \mu_2)^{2}\Big\}\right].
$$
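This is the quantity that EM is guaranteed not to decrease, so it is worth computing at every iteration (as the loglik component of the output in part 6 does). A direct R transcription (a sketch; the function name is illustrative):

# Observed-data log-likelihood of the two-component normal mixture.
# theta = c(p, mu1, mu2, sigma); y = vector of observations.
mix.loglik <- function(theta, y) {
  p <- theta[1]; mu1 <- theta[2]; mu2 <- theta[3]; sigma <- theta[4]
  sum(log(p * dnorm(y, mu1, sigma) + (1 - p) * dnorm(y, mu2, sigma)))
}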

2. Write down the complete-data log-likelihood function lnL(θ|yn,Zn) where Zn = (Z1, · · · , Zn)T .

• The joint pdf of (Yi, Zi) is

$$
f(y_i, z_i \mid \theta) = f(y_i \mid z_i, \theta) f(z_i)
= \left[\frac{p}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^{2}}(y_i - \mu_1)^{2}}\right]^{z_i}
\left[\frac{1-p}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^{2}}(y_i - \mu_2)^{2}}\right]^{1 - z_i}.
$$

• Hence the complete-data log-likelihood is

$$
\begin{aligned}
\ln L(\theta \mid y_n, Z_n) &= \ln \prod_{i=1}^{n} f(y_i, Z_i \mid \theta) = \sum_{i=1}^{n} \ln f(y_i, Z_i \mid \theta)\\
&= \sum_{i=1}^{n} \ln \left[\frac{p}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^{2}}(y_i - \mu_1)^{2}}\right]^{Z_i}
\left[\frac{1-p}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^{2}}(y_i - \mu_2)^{2}}\right]^{1 - Z_i}\\
&= -n \ln \sqrt{2\pi} - n \ln \sigma + \sum_{i=1}^{n} \big[Z_i \ln p + (1 - Z_i) \ln(1 - p)\big]
- \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \big[Z_i (y_i - \mu_1)^{2} + (1 - Z_i)(y_i - \mu_2)^{2}\big].
\end{aligned}
$$

3. Show that the conditional pdf of Zi given Yi = yi and θ is

$$
P(Z_i = 1 \mid y_i, \theta) =
\frac{p \exp\big\{-\frac{1}{2\sigma^{2}}(y_i - \mu_1)^{2}\big\}}
{p \exp\big\{-\frac{1}{2\sigma^{2}}(y_i - \mu_1)^{2}\big\} + (1 - p) \exp\big\{-\frac{1}{2\sigma^{2}}(y_i - \mu_2)^{2}\big\}}
$$

and P (Zi = 0|yi, θ) = 1− P (Zi = 1|yi, θ). Then find E(Zi|yi, θ) and Var(Zi|yi, θ).

• $$
\begin{aligned}
P(Z_i = 1 \mid y_i, \theta) &= \frac{f(y_i \mid z_i = 1, \theta) f(z_i = 1)}{g(y_i \mid \theta)}
= \frac{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^{2}}(y_i - \mu_1)^{2}} \cdot p}
{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^{2}}(y_i - \mu_1)^{2}} \cdot p
+ \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^{2}}(y_i - \mu_2)^{2}} \cdot (1 - p)}\\
&= \frac{p\, e^{-\frac{1}{2\sigma^{2}}(y_i - \mu_1)^{2}}}
{p\, e^{-\frac{1}{2\sigma^{2}}(y_i - \mu_1)^{2}} + (1 - p)\, e^{-\frac{1}{2\sigma^{2}}(y_i - \mu_2)^{2}}}.
\end{aligned}
$$

• Zi has only two possible values, 0 and 1. Hence P(Zi = 0 | yi, θ) = 1 − P(Zi = 1 | yi, θ).

• E(Zi | yi, θ) = 1 · P(Zi = 1 | yi, θ) + 0 · P(Zi = 0 | yi, θ) = P(Zi = 1 | yi, θ).

• Var(Zi | yi, θ) = E(Zi² | yi, θ) − [E(Zi | yi, θ)]² = P(Zi = 1 | yi, θ) P(Zi = 0 | yi, θ).
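In code, the E-step is a direct transcription of this formula. A sketch (an added illustration with names of my own; dnorm carries the common 1/(σ√2π) factor, which cancels between numerator and denominator):

# E-step: posterior probabilities z_i = P(Z_i = 1 | y_i, theta).
e.step <- function(theta, y) {
  p <- theta[1]; mu1 <- theta[2]; mu2 <- theta[3]; sigma <- theta[4]
  num <- p * dnorm(y, mu1, sigma)
  num / (num + (1 - p) * dnorm(y, mu2, sigma))
}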


4. Use the results in (3) to derive and simplify $Q(\theta \mid \theta^{(k)}) = E[\ln L(\theta \mid y_n, Z_n) \mid y_n, \theta^{(k)}]$ that is to be used in the E-step. Here $\theta^{(k)} = (p^{(k)}, \mu_1^{(k)}, \mu_2^{(k)}, \sigma^{(k)})$ is the kth update of θ obtained from the EM algorithm.

• Using the results of parts (2) and (3),

$$
\begin{aligned}
Q(\theta \mid \theta^{(k)}) &= E\big[\ln L(\theta \mid y_n, Z_n) \mid y_n, \theta^{(k)}\big]\\
&= -n \ln \sqrt{2\pi} - n \ln \sigma
+ \sum_{i=1}^{n} \big[z_i^{(k)} \ln p + (1 - z_i^{(k)}) \ln(1 - p)\big]\\
&\quad - \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \big[z_i^{(k)} (y_i - \mu_1)^{2} + (1 - z_i^{(k)})(y_i - \mu_2)^{2}\big],
\end{aligned}
$$

where $z_i^{(k)} = E(Z_i \mid y_i, \theta^{(k)})$.

5. Denote $z_i^{(k)} = E(Z_i \mid y_i, \theta^{(k)})$. By implementing the M-step, show that the (k+1)th update of θ is

$$
p^{(k+1)} = \frac{1}{n} \sum_{i=1}^{n} z_i^{(k)}, \qquad
\mu_1^{(k+1)} = \frac{\sum_{i=1}^{n} z_i^{(k)} y_i}{\sum_{i=1}^{n} z_i^{(k)}}, \qquad
\mu_2^{(k+1)} = \frac{\sum_{i=1}^{n} \big(1 - z_i^{(k)}\big) y_i}{\sum_{i=1}^{n} \big(1 - z_i^{(k)}\big)},
$$

$$
\sigma^{(k+1)} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \Big[z_i^{(k)} \big(y_i - \mu_1^{(k+1)}\big)^{2} + \big(1 - z_i^{(k)}\big)\big(y_i - \mu_2^{(k+1)}\big)^{2}\Big]},
$$

$$
z_i^{(k+1)} = \frac{p^{(k+1)} \exp\Big\{-\frac{(y_i - \mu_1^{(k+1)})^{2}}{2(\sigma^{(k+1)})^{2}}\Big\}}
{p^{(k+1)} \exp\Big\{-\frac{(y_i - \mu_1^{(k+1)})^{2}}{2(\sigma^{(k+1)})^{2}}\Big\}
+ (1 - p^{(k+1)}) \exp\Big\{-\frac{(y_i - \mu_2^{(k+1)})^{2}}{2(\sigma^{(k+1)})^{2}}\Big\}}.
$$

• It can be seen from the result of part (4) that

$$
\frac{\partial Q(\theta \mid \theta^{(k)})}{\partial p}
= \frac{\sum_{i=1}^{n} z_i^{(k)}}{p} - \frac{\sum_{i=1}^{n} \big(1 - z_i^{(k)}\big)}{1 - p}, \qquad
\frac{\partial Q(\theta \mid \theta^{(k)})}{\partial \mu_1}
= \frac{1}{\sigma^{2}} \sum_{i=1}^{n} z_i^{(k)} (y_i - \mu_1),
$$

$$
\frac{\partial Q(\theta \mid \theta^{(k)})}{\partial \mu_2}
= \frac{1}{\sigma^{2}} \sum_{i=1}^{n} \big(1 - z_i^{(k)}\big)(y_i - \mu_2), \qquad
\frac{\partial Q(\theta \mid \theta^{(k)})}{\partial \sigma}
= -\frac{n}{\sigma} + \frac{1}{\sigma^{3}} \sum_{i=1}^{n} \Big[z_i^{(k)} (y_i - \mu_1)^{2} + \big(1 - z_i^{(k)}\big)(y_i - \mu_2)^{2}\Big].
$$

• The updates follow by setting these derivatives to zero, solving the resulting equations with respect to p, µ1, µ2 and σ, and then using $z_i^{(k+1)} = E(Z_i \mid y_i, \theta^{(k+1)})$ and the result of part (3).
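The M-step is again a direct transcription of these closed-form solutions. A sketch (an added illustration, paired with the e.step helper above; names are my own):

# M-step: update theta = c(p, mu1, mu2, sigma) given responsibilities z.
m.step <- function(z, y) {
  n <- length(y)
  p   <- mean(z)
  mu1 <- sum(z * y) / sum(z)
  mu2 <- sum((1 - z) * y) / sum(1 - z)
  sigma <- sqrt(sum(z * (y - mu1)^2 + (1 - z) * (y - mu2)^2) / n)
  c(p, mu1, mu2, sigma)
}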

6. The following R function implements the EM algorithm for the problem. Test the
function using a simulated sample of n = 400 observations where the true value of θ is
(p = 0.25, µ1 = 5, µ2 = 10, σ = 1.5). Use different initial values, e.g.
θ0 = (0.4, min(y), max(y), sd(y)); or (0.4, max(y), min(y), sd(y));
or (0.9, min(y), max(y), sd(y) + 1); etc.

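The body of tut3em.f is not reproduced in this excerpt, so the following is a minimal sketch of what such a function might look like (an assumed reconstruction built from the mix.loglik, e.step and m.step helpers above; the returned components theta, z and loglik match the output shown below), together with one way to simulate the test data:

# Sketch of an EM driver consistent with the updates derived above
# (assumption; the original tut3em.f is not shown in this excerpt).
tut3em.f <- function(y, theta0, iter = 50) {
  theta <- matrix(NA, iter + 1, 4)     # row k+1 stores the kth update
  theta[1, ] <- theta0
  loglik <- numeric(iter + 1)
  loglik[1] <- mix.loglik(theta0, y)
  z <- NULL
  for (k in 1:iter) {
    z <- e.step(theta[k, ], y)         # E-step: responsibilities
    theta[k + 1, ] <- m.step(z, y)     # M-step: closed-form updates
    loglik[k + 1] <- mix.loglik(theta[k + 1, ], y)
  }
  list(theta = theta, z = z, loglik = loglik)
}

# Simulated test sample: n = 400, true theta = (0.25, 5, 10, 1.5).
set.seed(3)
z0 <- rbinom(400, 1, 0.25)
y  <- rnorm(400, mean = ifelse(z0 == 1, 5, 10), sd = 1.5)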

em1=tut3em.f(y, theta0=c(0.4, min(y), max(y), sd(y)))

em2=tut3em.f(y, theta0=c(0.4, max(y), min(y), sd(y)))

em3=tut3em.f(y, theta0=c(0.9, min(y), max(y), sd(y)+1), iter=70)

> attributes(em3)

$names

[1] "theta"  "z"      "loglik"

> em3$theta[71,]

[1] 0.2493479 4.6392209 10.1309750 1.4036674

> plot(em3$loglik[-1], type = "b")


[Figure: observed-data log-likelihood (y-axis, roughly −980 to −920) plotted against EM iterations (x-axis, 0 to 70), showing the monotone increase of the log-likelihood until it plateaus near convergence.]
