Exercises for the course
Machine Learning 1
Winter semester 2020/21
Abteilung Maschinelles Lernen, Institut für Softwaretechnik und theoretische Informatik, Fakultät IV, Technische Universität Berlin. Prof. Dr. Klaus-Robert Müller. Email: klaus-robert.mueller@tu-berlin.de
Exercise Sheet 11
Exercise 1: Designing a Neural Network (20 P)
We would like to implement a neural network that classifies data points in R^2 according to the decision boundary given in the figure below.
[Figure: decision boundary in R^2 separating a region labeled “class A” from a region labeled “class B”.]
We consider as an elementary computation the threshold neuron, whose relation between inputs (ai)i and output aj is given by
zj = Σi ai·wij + bj,    aj = 1{zj > 0} (i.e., aj = 1 if zj > 0 and 0 otherwise).
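As an illustration (not the solution to the exercise below), a minimal NumPy sketch of this threshold neuron; the weights and bias are arbitrary placeholders:

```python
import numpy as np

def threshold_neuron(a, w, b):
    # zj = sum_i ai * wij + bj;  aj = 1 if zj > 0 else 0
    z = np.dot(a, w) + b
    return float(z > 0)

# Placeholder example: a neuron that fires when x1 + x2 - 1 > 0.
x = np.array([0.8, 0.5])
print(threshold_neuron(x, np.array([1.0, 1.0]), -1.0))  # 1.0
```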
(a) Design by hand a neural network that takes x1 and x2 as input and produces the output “1” if the input belongs to class A, and “0” if it belongs to class B. Draw the neural network model and write down the weights wij and bias bj of each neuron.
Exercise 2: Backward Propagation (5 + 15 P)
We consider a neural network that takes two inputs x1 and x2 and produces an output y based on the following set of computations:
z3 = x1·w13 + x2·w23,    a3 = tanh(z3)
z4 = x1·w14 + x2·w24,    a4 = tanh(z4)
z5 = a3·w35 + a4·w45,    a5 = tanh(z5)
z6 = a3·w36 + a4·w46,    a6 = tanh(z6)
y = a5 + a6
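For concreteness, here is the same set of computations as a small NumPy sketch; the weight values are arbitrary placeholders chosen only for illustration:

```python
import numpy as np

def forward(x1, x2, w):
    # First layer
    z3 = x1 * w["13"] + x2 * w["23"]; a3 = np.tanh(z3)
    z4 = x1 * w["14"] + x2 * w["24"]; a4 = np.tanh(z4)
    # Second layer
    z5 = a3 * w["35"] + a4 * w["45"]; a5 = np.tanh(z5)
    z6 = a3 * w["36"] + a4 * w["46"]; a6 = np.tanh(z6)
    return a5 + a6, (a3, a4, a5, a6)

# Arbitrary placeholder weights.
w = {"13": 0.5, "23": -0.3, "14": 0.8, "24": 0.1,
     "35": 1.0, "45": -0.5, "36": 0.2, "46": 0.7}
y, _ = forward(1.0, 2.0, w)
```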
(a) Draw the neural network graph associated with this set of computations.
(b) Write the set of backward computations that leads to the evaluation of the partial derivative ∂y/∂w13. Your answer should avoid redundant computations. Hint: tanh′(t) = 1 − (tanh(t))².
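As a numerical sanity check (not a substitute for the written derivation), the backward computations can be organized without redundancy as follows, reusing the forward sketch above:

```python
def grad_w13(x1, x2, w):
    _, (a3, a4, a5, a6) = forward(x1, x2, w)
    # dy/da5 = dy/da6 = 1, and tanh'(z) = 1 - tanh(z)^2
    dy_da3 = w["35"] * (1 - a5**2) + w["36"] * (1 - a6**2)
    dy_dz3 = dy_da3 * (1 - a3**2)
    return dy_dz3 * x1  # dz3/dw13 = x1

# Finite-difference check of the analytical gradient.
eps = 1e-6
w_plus = dict(w, **{"13": w["13"] + eps})
approx = (forward(1.0, 2.0, w_plus)[0] - forward(1.0, 2.0, w)[0]) / eps
print(grad_w13(1.0, 2.0, w), approx)  # the two values should agree
```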
Exercise 3: Neural Network Optimization (10 + 10 + 10 P)
Consider the one-layer neural network
f(x) = w⊤x
applied to data points x ∈ R^d, where w ∈ R^d is the parameter of the model. We would like to optimize
the mean square error objective:
J(w) = Ep̂[ ½ (w⊤x − t)² ],
where the expectation is computed over an empirical approximation p̂ of the true joint distribution p(x, t). The ground truth is known to be of the form t|x = v⊤x + ε, where the parameter v is unknown and ε is some small i.i.d. Gaussian noise. The input data follows the distribution x ∼ N(μ, σ²I), where μ and σ² are the mean and variance.
(a) Compute the Hessian of the objective function J at the current location w in parameter space, expressed as a function of the parameters μ and σ of the data.
(b) Show that the condition number of the Hessian is given by: λ1/λd = 1 + ∥μ∥²/σ².
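A quick Monte Carlo sanity check of this formula, under the assumption (to be derived in part (a)) that the Hessian is E[xx⊤] = σ²I + μμ⊤ for x ∼ N(μ, σ²I); all constants below are chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 5, 200_000, 0.5
mu = rng.normal(size=d)

# Sample x ~ N(mu, sigma^2 I) and estimate the Hessian H = E[x x^T].
X = mu + sigma * rng.normal(size=(n, d))
H = X.T @ X / n

lam = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
print("empirical lambda_1 / lambda_d:", lam[-1] / lam[0])
print("predicted 1 + ||mu||^2 / sigma^2:", 1 + mu @ mu / sigma**2)
```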
(c) Explain, for this particular problem, the advantages and disadvantages of centering the data before training. Your answer should address the following aspects: (1) condition number and speed of convergence, (2) ability to reach a low prediction error.
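Continuing the sketch above, one can observe numerically that centering removes the μμ⊤ term and drives the condition number toward 1, which bears on aspect (1); note also that f(x) = w⊤x has no bias term, which is relevant when thinking about aspect (2):

```python
Xc = X - X.mean(axis=0)        # centered inputs
Hc = Xc.T @ Xc / n             # after centering, approximately sigma^2 * I
lam_c = np.linalg.eigvalsh(Hc)
print("condition number after centering:", lam_c[-1] / lam_c[0])  # close to 1
```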
Exercise 4: Programming (30 P)
Download the programming files on ISIS and follow the instructions.