Exercises for the course
Machine Learning 1
Winter semester 2020/21
Abteilung Maschinelles Lernen, Institut für Softwaretechnik und theoretische Informatik, Fakultät IV, Technische Universität Berlin. Prof. Dr. Klaus-Robert Müller. Email: klaus-robert.mueller@tu-berlin.de
Exercise Sheet 12
Exercise 1: EM Procedure with Discrete Distributions (15 + 20 + 15 P)
Consider a latent variable model composed of one Bernoulli and two Binomial distributions. The first distribution p(z|θ) produces the latent state z ∈ {H, T}, which can be interpreted as the head/tail outcome of a coin toss. The two distributions p(x | z = H, θ) and p(x | z = T, θ) produce the observed data x ∈ {0, 1, …, m} conditioned on the outcome of the coin toss z. The probability distributions are defined as:
$$p(z = H \mid \theta) = \lambda$$
$$p(z = T \mid \theta) = 1 - \lambda$$
$$p(x \mid z = H, \theta) = \binom{m}{x} a^x (1-a)^{m-x}$$
$$p(x \mid z = T, \theta) = \binom{m}{x} b^x (1-b)^{m-x}$$
The variable θ = (λ, a, b) contains the parameters of the model and these parameters need to be learned.
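As an illustration of the generative process described above, the following minimal sketch draws a dataset from the model. The function name `sample_dataset`, the `seed` argument, and the example parameter values are assumptions made for this illustration and are not part of the exercise sheet.

```python
import random


def sample_dataset(n, m, lam, a, b, seed=0):
    """Draw n observations from the coin-toss mixture of two Binomials.

    lam is p(z = H), a and b are the success probabilities of the
    Binomial(m, .) distributions for z = H and z = T respectively.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Latent coin toss: z = H with probability lam, otherwise z = T.
        p = a if rng.random() < lam else b
        # A Binomial(m, p) draw is the number of successes in m Bernoulli(p) trials.
        x = sum(rng.random() < p for _ in range(m))
        data.append(x)
    return data


# Hypothetical parameter values, chosen only for illustration.
data = sample_dataset(n=1000, m=10, lam=0.3, a=0.8, b=0.2)
```

Only the observations x are returned; the latent states z are discarded, which mirrors the setting of the exercise where z is unobserved.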
We draw from the model N times independently, and thus generate a dataset D = (x_1, …, x_N). The goal is now to estimate the parameters θ = (λ, a, b) that best explain the observed data. This can be done using expectation-maximization. Assuming all distributions of the latent variable model are independent, the data likelihood can be written as:
$$p(D \mid \theta) = \prod_{x \in D} \sum_{z \in \{H,T\}} p(z \mid \theta)\, p(x \mid z, \theta)$$
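The likelihood above can be evaluated numerically, most conveniently in log form to avoid underflow of the product. The following is a minimal sketch; the helper names `binom_pmf` and `log_likelihood` are assumptions for this illustration.

```python
import math


def binom_pmf(x, m, p):
    """Binomial probability of x successes in m trials with success rate p."""
    return math.comb(m, x) * p**x * (1 - p) ** (m - x)


def log_likelihood(data, m, lam, a, b):
    """Return log p(D | theta) for theta = (lam, a, b)."""
    total = 0.0
    for x in data:
        # Marginalize the latent coin toss z over its two outcomes H and T.
        px = lam * binom_pmf(x, m, a) + (1 - lam) * binom_pmf(x, m, b)
        # The product over data points becomes a sum of logarithms.
        total += math.log(px)
    return total
```

The sketch assumes that every observation has nonzero probability under θ; otherwise `math.log` raises an error.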
(a) Show that p(D|θ) can be lower-bounded as:
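$$\log p(D \mid \theta) \geq J(\theta) \quad \text{with} \quad J(\theta) = \sum_{x \in D} \Big[ G(x) + \sum_{z \in \{H,T\}} q(z \mid x) \big( \log p(z \mid \theta) + \log p(x \mid z, \theta) \big) \Big]$$
where q(z|x) can be any probability distribution conditioned on x, and where G(x) is a quantity that does not depend on θ.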
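The bound can be checked numerically before proving it. The sketch below assumes that G(x) is the entropy of q(z|x), that is G(x) = −Σ_z q(z|x) log q(z|x), which is the term that arises when Jensen's inequality is applied; this choice is an assumption of the illustration, not a statement from the sheet. The names `binom_pmf` and `log_lik_and_bound` are likewise hypothetical.

```python
import math


def binom_pmf(x, m, p):
    """Binomial probability of x successes in m trials with success rate p."""
    return math.comb(m, x) * p**x * (1 - p) ** (m - x)


def log_lik_and_bound(data, m, lam, a, b, q_heads):
    """Return (log p(D | theta), J(theta)), where q_heads[i] = q(z = H | x_i).

    Assumption: G(x) is taken to be the entropy of q(z | x).
    """
    loglik, bound = 0.0, 0.0
    for x, qh in zip(data, q_heads):
        # Joint probabilities p(z | theta) * p(x | z, theta) for both outcomes.
        joint = {"H": lam * binom_pmf(x, m, a), "T": (1 - lam) * binom_pmf(x, m, b)}
        q = {"H": qh, "T": 1 - qh}
        loglik += math.log(joint["H"] + joint["T"])
        for z in ("H", "T"):
            if q[z] > 0:  # terms with q = 0 contribute nothing
                # q * (log p(z) + log p(x | z)) plus the entropy term -q * log q.
                bound += q[z] * (math.log(joint[z]) - math.log(q[z]))
    return loglik, bound
```

Under this assumption the bound holds for any q(z|x) and becomes tight when q(z|x) equals the posterior p(z|x, θ).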
(b) Assuming some distribution q(z|x), apply the maximization step of the EM procedure; in particular, show that the parameter θ that maximizes the objective J(θ) is given by:
$$\lambda = \frac{1}{N} \sum_{x \in D} q(z = H \mid x) \qquad a = \frac{\sum_{x \in D} q(z = H \mid x) \cdot x}{\sum_{x \in D} q(z = H \mid x) \cdot m} \qquad b = \frac{\sum_{x \in D} q(z = T \mid x) \cdot x}{\sum_{x \in D} q(z = T \mid x) \cdot m}$$
(c) Apply the expectation step of the EM procedure that computes the new distribution q(z|x) = p(z|x, θ) for the current set of parameters θ, and show that it is given by:
$$q(z = H \mid x) = \gamma \cdot a^x (1-a)^{m-x} \cdot \lambda$$
$$q(z = T \mid x) = \gamma \cdot b^x (1-b)^{m-x} \cdot (1-\lambda)$$
with γ a normalization constant set to ensure q(z = H|x) + q(z = T|x) = 1.
Exercise 2: Programming (50 P)
Download the programming files on ISIS and follow the instructions.
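The update rules of Exercise 1 (b) and (c) can be alternated to form an EM loop. The following is a minimal sketch under stated assumptions: the function names `e_step`, `m_step` and `run_em`, the fixed iteration count, and the toy data are all hypothetical and are not taken from the programming files on ISIS. The sketch has no safeguard against degenerate cases such as a component receiving zero total weight.

```python
def e_step(data, m, lam, a, b):
    """Return q(z = H | x) for every x in data, given theta = (lam, a, b)."""
    q_heads = []
    for x in data:
        # Unnormalized responsibilities; the binomial coefficient is shared
        # by both terms and cancels in the normalization.
        h = a**x * (1 - a) ** (m - x) * lam
        t = b**x * (1 - b) ** (m - x) * (1 - lam)
        q_heads.append(h / (h + t))  # dividing by (h + t) applies gamma
    return q_heads


def m_step(data, m, q_heads):
    """Return the theta = (lam, a, b) that maximizes J(theta) for fixed q."""
    n = len(data)
    sum_h = sum(q_heads)  # total weight of z = H
    sum_t = n - sum_h     # total weight of z = T, since q(T|x) = 1 - q(H|x)
    lam = sum_h / n
    a = sum(q * x for q, x in zip(q_heads, data)) / (sum_h * m)
    b = sum((1 - q) * x for q, x in zip(q_heads, data)) / (sum_t * m)
    return lam, a, b


def run_em(data, m, lam, a, b, n_iter=50):
    """Alternate expectation and maximization steps for n_iter rounds."""
    for _ in range(n_iter):
        q_heads = e_step(data, m, lam, a, b)
        lam, a, b = m_step(data, m, q_heads)
    return lam, a, b


# Toy data (hypothetical): four large and six small counts out of m = 10.
toy_data = [8, 9, 7, 8, 2, 1, 3, 2, 2, 3]
lam_hat, a_hat, b_hat = run_em(toy_data, m=10, lam=0.5, a=0.6, b=0.4)
```

The initial values of a and b must differ; if a = b at the start, both components receive identical responsibilities and the updates cannot separate them.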