Exercises for the course
Machine Learning 1
Winter semester 2020/21
Abteilung Maschinelles Lernen, Institut für Softwaretechnik und theoretische Informatik, Fakultät IV, Technische Universität Berlin
Prof. Dr. Klaus-Robert Müller
Email: klaus-robert.mueller@tu-berlin.de
Exercise Sheet 2
Exercise 1: Maximum-Likelihood Estimation (5 + 5 + 5 + 5 P)
We consider the problem of estimating, using the maximum-likelihood approach, the parameters λ, η > 0 of the probability distribution

p(x, y) = λη e^(−λx − ηy)

supported on ℝ₊² (i.e. x, y ≥ 0). We consider a dataset D = ((x1, y1), . . . , (xN, yN)) composed of N independent draws from this distribution.
(a) Show that x and y are independent.
(b) Derive a maximum likelihood estimator of the parameter λ based on D.
(c) Derive a maximum likelihood estimator of the parameter λ based on D under the constraint η = 1/λ.
(d) Derive a maximum likelihood estimator of the parameter λ based on D under the constraint η = 1 − λ.
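The following is a minimal numerical sketch (not part of the required derivation) for sanity-checking an estimator of λ. It exploits that p(x, y) factorizes into two independent exponential distributions and maximizes the log-likelihood in λ over a grid; the parameter values, sample size, and grid range are illustrative assumptions.

```python
import numpy as np

# Illustrative (assumed) parameter values and sample size.
lam_true, eta_true, N = 2.0, 0.5, 100_000
rng = np.random.default_rng(0)

# p(x, y) = lam * eta * exp(-lam*x - eta*y) factorizes, so x and y
# can be sampled as two independent exponential variables.
x = rng.exponential(scale=1.0 / lam_true, size=N)
y = rng.exponential(scale=1.0 / eta_true, size=N)

# Log-likelihood in lambda for the unconstrained case:
# sum_i [log(lam) - lam * x_i]; the terms in eta are constant in lam.
lams = np.linspace(0.1, 5.0, 2000)
loglik = N * np.log(lams) - lams * x.sum()
print("grid-search MLE of lambda:", lams[np.argmax(loglik)])  # close to lam_true
```

A grid search of this kind is also a quick way to check the constrained estimators in (c) and (d): substitute the constraint into the joint log-likelihood before maximizing.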
Exercise 2: Maximum Likelihood vs. Bayes (5 + 10 + 15 P)
An unfair coin is tossed seven times and the outcome (head or tail) is recorded at each toss. The observed sequence of events is

D = (x1, x2, . . . , x7) = (head, head, tail, tail, head, head, head).
We assume that all tosses x1, x2, . . . have been generated independently following the Bernoulli probability distribution

P(x | θ) = θ if x = head,
P(x | θ) = 1 − θ if x = tail,

where θ ∈ [0, 1] is an unknown parameter.
(a) State the likelihood function P(D | θ), which depends on the parameter θ.
(b) Compute the maximum likelihood solution θ̂, and evaluate for this parameter the probability that the next two tosses are “head”, that is, evaluate P(x8 = head, x9 = head | θ̂).
(c) We now adopt a Bayesian view on this problem, where we assume a prior distribution for the parameter θ defined as:

p(θ) = 1 if 0 ≤ θ ≤ 1,
p(θ) = 0 else,

i.e. the uniform prior on [0, 1].
Compute the posterior distribution p(θ|D), and evaluate the probability that the next two tosses are head, that is,
P(x8 = head, x9 = head | D) = ∫₀¹ P(x8 = head, x9 = head | θ) p(θ | D) dθ.
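As a numerical sketch (not part of the exercise), the two predictive probabilities from (b) and (c) can be approximated as follows; the grid resolution is an illustrative assumption, and the posterior is only represented up to its normalization constant.

```python
import numpy as np

# The observed sequence D contains 5 heads and 2 tails.
heads, tails = 5, 2

# (b) Maximum-likelihood plug-in prediction for two further heads.
theta_ml = heads / (heads + tails)
print("ML plug-in prediction:", theta_ml**2)

# (c) Bayesian predictive under the uniform prior: approximate the
# integral of theta^2 * p(theta|D) with a Riemann sum on a fine grid,
# where p(theta|D) is proportional to theta^heads * (1-theta)^tails.
theta = np.linspace(0.0, 1.0, 100_001)
post_unnorm = theta**heads * (1.0 - theta)**tails
pred = (theta**2 * post_unnorm).mean() / post_unnorm.mean()
print("Bayesian predictive:", pred)
```

Comparing the two printed values illustrates how the Bayesian predictive differs from the plug-in prediction for small sample sizes.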
Exercise 3: Convergence of Bayes Parameter Estimation (5 + 5 P)

We consider Section 3.4.1 of Duda et al., where the data is generated according to the univariate probability density p(x | μ) ∼ N(μ, σ²), where σ² is known and where μ is unknown with prior distribution p(μ) ∼ N(μ0, σ0²). Having sampled a dataset D from the data-generating distribution, the posterior probability distribution over the unknown parameter μ becomes p(μ | D) ∼ N(μn, σn²), where
1/σn² = n/σ² + 1/σ0²,
μn/σn² = (n/σ²) μ̂n + μ0/σ0²,
μ̂n = (1/n) ∑_{k=1}^{n} xk.
(a) Show that the variance of the posterior can be upper-bounded as σn² ≤ min(σ²/n, σ0²), that is, the variance of the posterior is bounded both by the variance of the sample mean and by the variance of the prior.
(b) Show that the mean of the posterior can be lower- and upper-bounded as min(μ̂n, μ0) ≤ μn ≤ max(μ̂n, μ0), that is, the mean of the posterior distribution lies somewhere on the segment between the mean of the prior distribution and the sample mean.
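Both claims can be checked empirically with a minimal sketch, assuming illustrative values for the prior parameters μ0, σ0, the known noise level σ, and the true mean:

```python
import numpy as np

# Illustrative (assumed) prior and data-generating parameters.
mu0, sigma0 = 0.0, 1.0      # prior mean and standard deviation
sigma, mu_true = 2.0, 3.0   # known noise level and true mean
rng = np.random.default_rng(1)

for n in (1, 10, 100, 1000):
    x = rng.normal(mu_true, sigma, size=n)
    mu_hat = x.mean()
    # Posterior parameters from the update equations above.
    sigma_n2 = 1.0 / (n / sigma**2 + 1.0 / sigma0**2)
    mu_n = sigma_n2 * (n * mu_hat / sigma**2 + mu0 / sigma0**2)
    # Claim (a): the posterior variance obeys both upper bounds.
    assert sigma_n2 <= min(sigma**2 / n, sigma0**2) + 1e-12
    # Claim (b): the posterior mean lies between mu0 and mu_hat.
    assert min(mu_hat, mu0) - 1e-12 <= mu_n <= max(mu_hat, mu0) + 1e-12
    print(f"n={n:5d}  mu_n={mu_n:.4f}  sigma_n^2={sigma_n2:.6f}")
```

The printed values also illustrate the convergence behavior: as n grows, μn approaches the sample mean and σn² shrinks toward zero.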
Exercise 4: Programming (40 P)
Download the programming files on ISIS and follow the instructions.