Exercises for the course
Machine Learning 1
Winter semester 2020/21
Abteilung Maschinelles Lernen Institut fu ̈r Softwaretechnik und theoretische Informatik Fakult ̈at IV, Technische Universit ̈at Berlin Prof. Dr. Klaus-Robert Mu ̈ller Email: klaus-robert.mueller@tu-berlin.de
Exercise Sheet 1
Exercise 1: Estimating the Bayes Error (10 + 10 + 10 P)
The Bayes decision rule for the two classes classification problem results in the Bayes error
P (error) =
P (error|x) p(x) dx,
whereP(error|x)=min[P(ω1|x),P(ω2|x)]istheprobabilityoferrorforaparticularinputx. Interestingly, while class posteriors P(ω1|x) and P(ω2|x) can often be expressed analytically and are integrable, the error function has discontinuities that prevent its analytical integration, and therefore, direct computation of the Bayes error.
(a) Show that the full error can be upper-bounded as follows:
2
P (error) ≤ 1 + 1 p(x) dx.
P (ω1 |x) P (ω2 |x)
Note that the integrand is now continuous and corresponds to the harmonic mean of class posteriors weighted
by p(x).
(b) Show using this result that for the univariate probability distributions
π−1 π−1 p(x|ω1) = 1+(x−μ)2 and p(x|ω2) = 1+(x+μ)2,
the Bayes error can be upper-bounded by:
P (error) ≤ 1 + 4μ2P (ω1)P (ω2)
2 P (ω1)P (ω2) (Hint:youcanusetheidentity 1 dx=√ 2π forb2<4ac.)
ax2 +bx+c 4ac−b2
(c) Explain how you would estimate the error if there was no upper-bounds that are both tight and analytically
integrable. Discuss following two cases: (1) the data is low-dimensional and (2) the data is high-dimensional.
Exercise 2: Bayes Decision Boundaries (15 + 15 P)
One might speculate that, in some cases, the generated data p(x|ω1) and p(x|ω2) is of no use to improve the accuracy of a classifier, in which case one should only rely on prior class probabilities P(ω1) and P(ω2) assumed here to be strictly positive.
For the first part of this exercise, we assume that the data for each class is generated by the univariate Laplacian probability distributions:
p(x|ω1)= 1 exp−|x−μ| and p(x|ω2)= 1 exp−|x+μ|. 2σ σ 2σ σ
where μ, σ > 0.
(a) Determine for which values of P (ω1 ), P (ω2 ), μ, σ the optimal decision is to always predict the first class (i.e.
underwhichconditionsP(error|x)=P(ω2|x) ∀x∈R).
(b) Repeat the exercise for the case where the data for each class is generated by the univariate Gaussian
probability distributions:
p(x|ω1) = σ√2π exp − 2σ2 and p(x|ω2) = σ√2π exp
where σ > 0.
Exercise 3: Programming (40 P)
Download the programming files on ISIS and follow the instructions.
(x+μ)2 − 2σ2
1 (x−μ)2 1
.