
Expectation Maximization Algorithm
CSE, UNSW
April 14, 2021

Motivation
Missing data
Latent variables
Easier optimization

Convex Function
e.g., −log(x)
An important concept in optimization / machine learning.

Jensen’s Inequality
Let f be a convex function defined on an interval I. If x_1, ..., x_n ∈ I and λ_1, ..., λ_n ≥ 0 with ∑_i λ_i = 1, then

    f\left( \sum_{i=1}^{n} \lambda_i x_i \right) \le \sum_{i=1}^{n} \lambda_i f(x_i)

The equality holds iff x_1 = x_2 = ... = x_n or f is linear.

Corollary: since ln(x) is a concave function (i.e., −ln(x) is a convex function),

    \ln\left( \sum_{i=1}^{n} \lambda_i f(x_i) \right) \ge \sum_{i=1}^{n} \lambda_i \ln(f(x_i))

Again, the equality holds iff f(x_i) is constant over i.
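A quick numerical sanity check of the inequality (a minimal sketch; the points x_i and weights λ_i below are arbitrary choices, not from the slides):

```python
import numpy as np

# Convex f(x) = -ln(x); arbitrary points and convex weights.
f = lambda x: -np.log(x)
x = np.array([0.5, 2.0, 4.0])
lam = np.array([0.2, 0.3, 0.5])      # lam >= 0 and sums to 1

lhs = f(np.dot(lam, x))              # f( sum_i lam_i x_i )
rhs = np.dot(lam, f(x))              # sum_i lam_i f(x_i)
print(lhs, rhs, lhs <= rhs)          # prints ... True: the bound holds
```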

Log Likelihood
Define the log likelihood function L(θ) = ln Pr{x | θ}. For i.i.d. examples, L(θ) = ∑_i L^(i)(θ) = ∑_i ln Pr{x^(i) | θ}.
Goal: find θ∗ that maximizes the log likelihood.
What if the model contains a latent variable z = [z^(i)]_i (whose value is unknown)?

    L^{(i)}(\theta) \overset{\text{def}}{=} \ln \Pr\{x^{(i)} \mid \theta\} = \ln \sum_{z^{(i)}} \Pr\{x^{(i)}, z^{(i)} \mid \theta\}
    = \ln \sum_{z^{(i)}} q(z^{(i)}) \cdot \frac{\Pr\{x^{(i)}, z^{(i)} \mid \theta\}}{q(z^{(i)})} \quad (\dagger)
    \ge \sum_{z^{(i)}} q(z^{(i)}) \cdot \ln \frac{\Pr\{x^{(i)}, z^{(i)} \mid \theta\}}{q(z^{(i)})}

If q(z^(i)) = Pr{z^(i) | x^(i), θ}, then the equality holds.
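A minimal numeric check of this lower bound for a single example with a binary latent variable (the joint probabilities below are made up purely for illustration):

```python
import numpy as np

# Made-up joint Pr{x, z | theta} for one observed x and z in {0, 1}.
p_xz = np.array([0.12, 0.28])             # Pr{x, z=0}, Pr{x, z=1}
log_lik = np.log(p_xz.sum())              # ln Pr{x | theta}

def lower_bound(q):
    # sum_z q(z) * ln( Pr{x, z | theta} / q(z) )
    return np.sum(q * np.log(p_xz / q))

q_arbitrary = np.array([0.5, 0.5])
q_posterior = p_xz / p_xz.sum()           # Pr{z | x, theta}

print(log_lik, lower_bound(q_arbitrary))  # the bound sits below ln Pr{x | theta}
print(log_lik, lower_bound(q_posterior))  # equality when q is the posterior
```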

θ vs θ(old)
Given the current parameter θ^(old), let

    q^{(old)}(z^{(i)}) \overset{\text{def}}{=} \Pr\{z^{(i)} \mid x^{(i)}, \theta^{(old)}\}

Then

    L^{(i)}(\theta) = \ln \sum_{z^{(i)}} q^{(old)}(z^{(i)}) \, \frac{\Pr\{x^{(i)}, z^{(i)} \mid \theta\}}{q^{(old)}(z^{(i)})}
    \ge \sum_{z^{(i)}} q^{(old)}(z^{(i)}) \ln \frac{\Pr\{x^{(i)}, z^{(i)} \mid \theta\}}{q^{(old)}(z^{(i)})} \quad (\dagger)
    = \underbrace{\sum_{z^{(i)}} \Pr\{z^{(i)} \mid x^{(i)}, \theta^{(old)}\} \ln \Pr\{x^{(i)}, z^{(i)} \mid \theta\}}_{\overset{\text{def}}{=}\, Q^{(i)}(\theta,\, \theta^{(old)})}
      \;\underbrace{-\; \sum_{z^{(i)}} \Pr\{z^{(i)} \mid x^{(i)}, \theta^{(old)}\} \ln \Pr\{z^{(i)} \mid x^{(i)}, \theta^{(old)}\}}_{C:\ \text{constant in } \theta\ (\text{the entropy of } q^{(old)})}

EM
Hence, the EM algorithm iterates the following two steps:
[E-step]: Compute q^(old)(z^(i)) = Pr{z^(i) | x^(i), θ^(old)}.
[M-step]: Find the θ that maximizes the function Q(θ, θ^(old)) defined above (just sum Q^(i) over i).
Alternative interpretation:
    Q^{(i)}(\theta, \theta^{(old)}) \overset{\text{def}}{=} \sum_{z^{(i)}} \Pr\{z^{(i)} \mid x^{(i)}, \theta^{(old)}\} \ln \Pr\{x^{(i)}, z^{(i)} \mid \theta\}
    = \mathbb{E}_{z^{(i)} \sim q^{(old)}(z^{(i)})}\left[ \ln \Pr\{x^{(i)}, z^{(i)} \mid \theta\} \right]

i.e., the expected complete log-likelihood (as a function of θ):
Sample z from the proposal distribution q.
Then it is easy to compute the complete log-likelihood.
Do this for every possible z.
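A generic sketch of this two-step loop in Python (the helper functions e_step, m_step, and log_lik are hypothetical placeholders for the model-specific computations; stopping when the log-likelihood stops improving is one common choice):

```python
def em(x, theta, e_step, m_step, log_lik, tol=1e-6, max_iter=100):
    """Generic EM loop: e_step returns q(z | x, theta); m_step maximizes Q."""
    prev = log_lik(x, theta)
    for _ in range(max_iter):
        q = e_step(x, theta)          # [E-step] posterior of z under current theta
        theta = m_step(x, q)          # [M-step] argmax_theta Q(theta, theta_old)
        cur = log_lik(x, theta)
        if abs(cur - prev) < tol:     # stop once the log-likelihood stops improving
            break
        prev = cur
    return theta
```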

Illustration
(Figure: alternating updates θ^(t) → [E-step] q^(t) → [M-step] θ^(t+1) → [E-step] q^(t+1) → ..., with the log-likelihood improving at each round.)

How EM converges
Since L(θ) ≥ Q(θ, θ^(old)) + C with equality at θ = θ^(old), choosing θ^(new) = argmax_θ Q(θ, θ^(old)) gives L(θ^(new)) ≥ Q(θ^(new), θ^(old)) + C ≥ Q(θ^(old), θ^(old)) + C = L(θ^(old)); hence the log-likelihood never decreases across iterations.

Example 1: Three Coins
Given three coins: z, a, and b, with head probabilities π, α, and β, respectively.
Generative process: if toss(z) == head, return(toss(a)); else return(toss(b)).
Observed data: x = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1].
Goal: estimate the parameters θ = (π, α, β).
The usual assumption: all tosses are i.i.d.
If we know {z^(i)}_{i=1}^{10}:
Complete data:

    z^(i)          1 0 1 1 1 0 0 1 0 0
    coin → x^(i)   a b a a a b b a b b
    x^(i)          1 1 0 1 0 0 1 0 1 1

Then the MLEs are just head frequencies:
π_MLE = 5/10 = 0.5,  α_MLE = 2/5 = 0.4,  β_MLE = 4/5 = 0.8.
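A quick way to verify these counts (a minimal sketch; the lists simply transcribe the table above):

```python
z = [1, 0, 1, 1, 1, 0, 0, 1, 0, 0]   # which coin was used (1 = a, 0 = b)
x = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]   # observed outcomes (1 = head)

pi_mle = sum(z) / len(z)                                                  # 0.5
alpha_mle = sum(xi for xi, zi in zip(x, z) if zi) / sum(z)                # 2/5 = 0.4
beta_mle = sum(xi for xi, zi in zip(x, z) if not zi) / (len(z) - sum(z))  # 4/5 = 0.8
print(pi_mle, alpha_mle, beta_mle)
```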

Solution /1
Problem setup:
θ = (π, α, β)
Missing data (i.e., z): z_i = 1 if coin z came up heads (so coin a was tossed to produce x_i), and z_i = 0 otherwise
Complete likelihood (for a single item): Pr{x_i, z_i | θ} (note the change to subscript notation henceforth)
The E-step: given the current θ_t, we can determine the distribution q:

    \mu_{i,t} \overset{\text{def}}{=} \Pr\{z_i = 1 \mid x_i, \theta_t\} = \frac{\Pr\{z_i = 1, x_i \mid \theta_t\}}{\Pr\{x_i \mid \theta_t\}}
    = \frac{\pi_t\, \alpha_t^{x_i} (1 - \alpha_t)^{1 - x_i}}{\pi_t\, \alpha_t^{x_i} (1 - \alpha_t)^{1 - x_i} + (1 - \pi_t)\, \beta_t^{x_i} (1 - \beta_t)^{1 - x_i}}
Numerator: Pr{z_i = 1, x_i | θ_t} = Pr{z_i = 1 | θ_t} · Pr{x_i | z_i = 1, θ_t} = π_t α_t^{x_i}(1 − α_t)^{1 − x_i}; the exponents x_i and 1 − x_i are the typical trick for writing the piece-wise likelihood as a single expression.
Denominator: sum the joint probability over z_i = 1 and z_i = 0.
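A direct transcription of this E-step (a sketch; the argument names mirror θ_t = (π_t, α_t, β_t)):

```python
def e_step(x, pi, alpha, beta):
    """Posterior mu_i = Pr{z_i = 1 | x_i, theta_t} for each observation."""
    mus = []
    for xi in x:
        num = pi * alpha**xi * (1 - alpha)**(1 - xi)             # Pr{z_i = 1, x_i}
        den = num + (1 - pi) * beta**xi * (1 - beta)**(1 - xi)   # Pr{x_i}
        mus.append(num / den)
    return mus
```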

Solution /2
Compute Q(θ, θ_t). First,

    \ln \Pr\{x_i, z_i \mid \theta\}
    = \ln\left( \left[\pi\, \alpha^{x_i}(1-\alpha)^{1-x_i}\right]^{z_i} \cdot \left[(1-\pi)\, \beta^{x_i}(1-\beta)^{1-x_i}\right]^{1-z_i} \right)
    = z_i \left( \ln\pi + x_i \ln\alpha + (1-x_i)\ln(1-\alpha) \right) + (1-z_i)\left( \ln(1-\pi) + x_i \ln\beta + (1-x_i)\ln(1-\beta) \right)

Then:

    Q(\theta, \theta_t) = \sum_i \sum_{z_i} \Pr\{z_i \mid x_i, \theta_t\} \ln \Pr\{x_i, z_i \mid \theta\}
    = \sum_i \left[ \mu_{i,t} \ln \Pr\{x_i, z_i = 1 \mid \theta\} + (1 - \mu_{i,t}) \ln \Pr\{x_i, z_i = 0 \mid \theta\} \right]

The M-step: setting each partial derivative of Q to zero gives

    \frac{\partial Q(\theta, \theta_t)}{\partial \pi} = 0 \;\Longrightarrow\; \pi_{t+1} = \frac{1}{n} \sum_i \mu_{i,t}

    \frac{\partial Q(\theta, \theta_t)}{\partial \alpha} = 0 \;\Longrightarrow\; \alpha_{t+1} = \frac{\sum_i \mu_{i,t}\, x_i}{\sum_i \mu_{i,t}}

    \frac{\partial Q(\theta, \theta_t)}{\partial \beta} = 0 \;\Longrightarrow\; \beta_{t+1} = \frac{\sum_i (1 - \mu_{i,t})\, x_i}{\sum_i (1 - \mu_{i,t})}
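The corresponding M-step, following the three closed-form updates above (again a sketch, with mus holding the μ_{i,t} produced by the E-step):

```python
def m_step(x, mus):
    """Closed-form maximizers of Q(theta, theta_t) for the three-coin model."""
    n = len(x)
    pi_new = sum(mus) / n
    alpha_new = sum(m * xi for m, xi in zip(mus, x)) / sum(mus)
    beta_new = sum((1 - m) * xi for m, xi in zip(mus, x)) / sum(1 - m for m in mus)
    return pi_new, alpha_new, beta_new
```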

Understanding the Equations
Recall the update rules:

    \pi_{t+1} = \frac{1}{n}\sum_i \mu_{i,t}, \qquad
    \alpha_{t+1} = \frac{\sum_i \mu_{i,t}\, x_i}{\sum_i \mu_{i,t}}, \qquad
    \beta_{t+1} = \frac{\sum_i (1 - \mu_{i,t})\, x_i}{\sum_i (1 - \mu_{i,t})}

Consider the example on the question page. There we can treat μ_{i,t} as a binary variable: μ_{i,t} = 1 iff coin z came up heads in trial i, or equivalently, iff coin a was chosen to generate x^(i). One can easily verify that the MLE estimates coincide with these update rules. The rules can therefore be seen as a "soft" version of MLE: informally, each x^(i) contributes μ_{i,t} to the parameter estimate of coin a and (1 − μ_{i,t}) to the parameter estimate of coin b.
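To see this concretely, plugging hard 0/1 responsibilities (the true z from the earlier slide) into the same update rules reproduces the complete-data MLEs (a small self-contained check):

```python
z = [1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
x = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
mus = [float(zi) for zi in z]          # hard responsibility: mu_i = 1 iff coin a was used

pi_hat = sum(mus) / len(x)                                                        # 0.5
alpha_hat = sum(m * xi for m, xi in zip(mus, x)) / sum(mus)                       # 0.4
beta_hat = sum((1 - m) * xi for m, xi in zip(mus, x)) / sum(1 - m for m in mus)   # 0.8
print(pi_hat, alpha_hat, beta_hat)     # matches the complete-data MLEs
```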

Concrete Example
With θ_t = (π = 0.6, α = 0.1, β = 0.8):

    \mu_{i,t} = p(z_i = 1 \mid x_i = 1, \theta_t)
    = \frac{p(z_i = 1, x_i = 1 \mid \theta_t)}{p(x_i = 1 \mid \theta_t)}
    = \frac{p(z_i = 1, x_i = 1 \mid \theta_t)}{p(z_i = 1, x_i = 1 \mid \theta_t) + p(z_i = 0, x_i = 1 \mid \theta_t)}
    = \frac{0.6 \cdot 0.1}{0.6 \cdot 0.1 + 0.4 \cdot 0.8} \approx 0.158

Similarly,

    p(z_i = 1 \mid x_i = 0, \theta_t) = \frac{0.6 \cdot 0.9}{0.6 \cdot 0.9 + 0.4 \cdot 0.2} \approx 0.871

Concrete Example /2
How many different scenarios?
    z_i   x_i   p(z_i | x_i, θ_t)
    0     0     0.129
    0     1     0.842
    1     0     0.871
    1     1     0.158

Observations: 6 ones and 4 zeros.

    \pi_{t+1} = \frac{1}{n}\sum_i \mu_{i,t} = \frac{0.158 \cdot 6 + 0.871 \cdot 4}{10} \approx 0.443

    \alpha_{t+1} = \frac{\sum_i \mu_{i,t}\, x_i}{\sum_i \mu_{i,t}} = \frac{0.158 \cdot 6}{4.43} \approx 0.214

    \beta_{t+1} = \frac{\sum_i (1 - \mu_{i,t})\, x_i}{\sum_i (1 - \mu_{i,t})} = \frac{0.842 \cdot 6}{0.842 \cdot 6 + 0.129 \cdot 4} \approx 0.907
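Putting the two steps together, one full EM iteration on the ten observations starting from π = 0.6, α = 0.1, β = 0.8 (a self-contained sketch; it reproduces the values above up to rounding):

```python
x = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]            # 6 ones, 4 zeros
pi, alpha, beta = 0.6, 0.1, 0.8

# E-step: mu_i = Pr{z_i = 1 | x_i, theta_t}
mus = []
for xi in x:
    num = pi * alpha**xi * (1 - alpha)**(1 - xi)
    den = num + (1 - pi) * beta**xi * (1 - beta)**(1 - xi)
    mus.append(num / den)

# M-step: closed-form updates
n = len(x)
pi = sum(mus) / n
alpha = sum(m * xi for m, xi in zip(mus, x)) / sum(mus)
beta = sum((1 - m) * xi for m, xi in zip(mus, x)) / sum(1 - m for m in mus)
print(round(pi, 3), round(alpha, 3), round(beta, 3))   # approx 0.443, 0.214, 0.907
```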