Machine Learning 1 TU Berlin, WiSe 2020/21
Expectation-Maximization
In this programming exercise we will apply the Expectation-Maximization method to estimate the parameters of the latent variable model presented in the theoretical part. The analysis will be performed on one dimension of the Boston Housing dataset, where input values are integers between 0 and 24.
In [1]: import sklearn, sklearn.datasets
        X = sklearn.datasets.load_boston()['data'][:,8]
        m = 24
We restate here the model presented in the theory part. The first distribution p(z|θ) produces the latent state z ∈ {H, T}, which can be interpreted as the head/tail outcome of an (unfair) coin toss, and the following two distributions p(x | z = H, θ) and p(x | z = T, θ) produce the observed data x ∈ {0, 1, …, m} conditioned on the outcome of the coin toss z. The probability distributions are defined as:
p(z = H | θ) = λ
p(z = T | θ) = 1 − λ
p(x | z = H, θ) = (m choose x) · a^x · (1 − a)^(m−x)
p(x | z = T, θ) = (m choose x) · b^x · (1 − b)^(m−x)
The vector θ = (λ, a, b) contains the parameters of the model, and these parameters need to be learned.

Computing the Probability Mass Function (15 P)
We first would like to implement a function that computes the probability mass function p(x|θ) for every value of x between 0 and m.
Task:
• Implement the function pmf(lambd,a,b,m)
In [2]: import numpy
        from scipy.special import binom

        def pmf(lambd,a,b,m):
            # ----------------------------------
            # TODO: Replace by your code
            # ----------------------------------
            import solution
            ret = solution.pmf(lambd,a,b,m)
            # ----------------------------------
            return ret
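For reference, a minimal sketch of a possible implementation is given below. It assumes that pmf should return an array of length m+1 containing p(x|θ) for x = 0, …, m; the name pmf_sketch is ours and not part of the exercise template.

        import numpy
        from scipy.special import binom

        def pmf_sketch(lambd, a, b, m):
            x = numpy.arange(m + 1)                  # all possible values x = 0,...,m
            coeff = binom(m, x)                      # binomial coefficients (m choose x)
            pH = coeff * a**x * (1 - a)**(m - x)     # p(x | z=H, theta)
            pT = coeff * b**x * (1 - b)**(m - x)     # p(x | z=T, theta)
            return lambd * pH + (1 - lambd) * pT     # marginal p(x | theta)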
Computing the Log-Likelihood (15 P)
Additionally, we would like to measure the quality of our latent variable model, specifically, the log-likelihood of the data according to the model:

log p(D|θ) = ∑_{x∈D} log p(x|θ)

Task:
• Implement the function below that takes the data and the model parameters as input and returns the log-likelihood.
In [3]: def ll(X,lambd,a,b,m):
            # ----------------------------------
            # TODO: Replace by your code
            # ----------------------------------
            import solution
            ret = solution.ll(X,lambd,a,b,m)
            # ----------------------------------
            return ret
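As a hedged reference, the log-likelihood can be computed by looking up p(x|θ) for each observation and summing the logs. The sketch below reuses pmf_sketch from above and assumes the values in X are integers in {0, …, m}.

        def ll_sketch(X, lambd, a, b, m):
            p = pmf_sketch(lambd, a, b, m)                 # p(x|theta) for x = 0,...,m
            return numpy.sum(numpy.log(p[X.astype(int)]))  # sum of log p(x|theta) over the data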
The code below tests the two functions you have implemented by (1) plotting for a particular choice of parameters θ the pmf in superposition to the (normalized) data histogram, and (2) printing the log-likelihood of the data according to our model.
In [4]: import matplotlib
        %matplotlib inline
        from matplotlib import pyplot as plt
        import numpy

        def plot(data,pdf,ll):
            f = plt.figure(figsize=(4,2))
            plt.hist(data,bins=numpy.arange(m+2)-0.5,alpha=0.3,color='g',density=True)
            plt.plot(range(int(m+1)),pdf,'-o',c='r')
            plt.ylabel('p(x)')
            plt.xlabel('x')
            plt.title('log-likelihood: %.3f'%ll)

        lambd,a,b = 0.5,0.25,0.75
        plot(X,pmf(lambd,a,b,m),ll(X,lambd,a,b,m))
We observe that the model is not very accurate and the log-likelihood of the data is quite low.
Implementing and Running the EM Algorithm (20 P)
We now would like to implement the EM algorithm that iteratively searches for the optimal parameters λ, a and b. The function starts with some initial parameters that we set by hand. In each iteration, the E-step and the M-step are applied. According to our derivations in the theory part, the E-step and M-step are given by:
E-Step:
q(z = H|x) = γ · a^x (1 − a)^(m−x) · λ
q(z = T|x) = γ · b^x (1 − b)^(m−x) · (1 − λ)

with γ a normalization constant set to ensure q(z = H|x) + q(z = T|x) = 1.

M-Step:

λ = (1/N) · ∑_{x∈D} q(z = H|x)

a = ∑_{x∈D} q(z = H|x) · x / ( ∑_{x∈D} q(z = H|x) · m )

b = ∑_{x∈D} q(z = T|x) · x / ( ∑_{x∈D} q(z = T|x) · m )
Task:
• Implement the E-Step and the M-Step.
In [5]: def Estep(lambd,a,b,X,m):
            # ----------------------------------
            # TODO: Replace by your code
            # ----------------------------------
            import solution
            ret = solution.Estep(lambd,a,b,X,m)
            # ----------------------------------
            return ret

        def Mstep(qH,qT,X,m):
            # ----------------------------------
            # TODO: Replace by your code
            # ----------------------------------
            import solution
            ret = solution.Mstep(qH,qT,X,m)
            # ----------------------------------
            return ret
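For reference, here is a minimal sketch of what the two steps could look like, assuming Estep returns the posteriors q(z=H|x) and q(z=T|x) as two arrays aligned with X, and Mstep returns the updated triple (lambd, a, b); the names Estep_sketch and Mstep_sketch are ours. Note that the binomial coefficient (m choose x) appears in both unnormalized posteriors and therefore cancels in the normalization, so it can be omitted in the E-step.

        def Estep_sketch(lambd, a, b, X, m):
            x = X.astype(int)
            uH = a**x * (1 - a)**(m - x) * lambd        # unnormalized q(z=H|x)
            uT = b**x * (1 - b)**(m - x) * (1 - lambd)  # unnormalized q(z=T|x)
            Z = uH + uT                                 # normalizer (gamma = 1/Z)
            return uH / Z, uT / Z

        def Mstep_sketch(qH, qT, X, m):
            lambd = qH.mean()                           # (1/N) * sum_x q(z=H|x)
            a = (qH * X).sum() / (qH.sum() * m)         # ratio of weighted sums from the M-step formula
            b = (qT * X).sum() / (qT.sum() * m)
            return lambd, a, b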
The code below runs the EM procedure for 4 iterations, starting from some initial parameters chosen by hand, and shows the pmf and the log-likelihood at each iteration.
In [6]: lambd,a,b = 0.5,0.4,0.6

        for t in range(4):
            plot(X,pmf(lambd,a,b,m),ll(X,lambd,a,b,m))
            qH,qT = Estep(lambd,a,b,X,m)
            lambd,a,b = Mstep(qH,qT,X,m)
Although the initial choice of parameters is not good, the procedure converges very quickly, and the improvement becomes marginal after only two iterations.
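Rather than fixing the number of iterations in advance, one could also stop once the log-likelihood improvement falls below a tolerance. The sketch below illustrates this with the functions defined above; the tolerance 1e-6 and the cap of 100 iterations are arbitrary choices for illustration, not part of the exercise.

        lambd, a, b = 0.5, 0.4, 0.6
        prev = float('-inf')
        for t in range(100):
            cur = ll(X, lambd, a, b, m)
            if cur - prev < 1e-6:                # improvement has become negligible
                break
            prev = cur
            qH, qT = Estep(lambd, a, b, X, m)
            lambd, a, b = Mstep(qH, qT, X, m)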