Exercises for the course
Machine Learning 1
Winter semester 2020/21
Abteilung Maschinelles Lernen, Institut für Softwaretechnik und theoretische Informatik, Fakultät IV, Technische Universität Berlin. Prof. Dr. Klaus-Robert Müller. Email: klaus-robert.mueller@tu-berlin.de
Exercise Sheet 10
We consider a class of optimization problems of the type:
min_θ J(θ)   s.t.   ∀i = 1, …, m : gi(θ) = 0   and   ∀i = 1, …, l : hi(θ) ≤ 0
For this class of problems, we can build the Lagrangian:
L(θ, β, λ) = J(θ) + Σ_{i=1}^{m} βi gi(θ) + Σ_{i=1}^{l} λi hi(θ),
where (βi)i and (λi)i are the dual variables. According to the Karush-Kuhn-Tucker (KKT) conditions, any solution of this optimization problem must satisfy the following conditions (in addition to the original constraints of the optimization problem):
∂L/∂θ = 0   (stationarity)
∀i = 1, …, l : λi ≥ 0   (dual feasibility)
∀i = 1, …, l : λi hi(θ) = 0   (complementary slackness)
We will make use of these conditions to derive the dual form of the kernel ridge regression problem.
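To see how these conditions are used in practice, consider a small worked example (an illustration only, not part of the exercise): minimize J(θ) = θ² over θ ∈ R subject to the single inequality constraint h(θ) = 1 − θ ≤ 0. The Lagrangian is L(θ, λ) = θ² + λ(1 − θ). Stationarity gives 2θ − λ = 0, dual feasibility requires λ ≥ 0, and complementary slackness requires λ(1 − θ) = 0. Assuming λ = 0 forces θ = 0, which violates the constraint θ ≥ 1, so the constraint must be active: θ = 1 and, by stationarity, λ = 2 ≥ 0. All KKT conditions are then satisfied, and θ = 1 is the constrained minimum.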
Exercise 1: Kernel Ridge Regression with Lagrange Multipliers (10 + 20 + 10 + 10 P)
Let x1, …, xN ∈ R^d be a dataset with labels y1, …, yN ∈ R. Consider the regression model f(x) = w⊤φ(x), where φ: R^d → R^h is a feature map and w is obtained by solving the constrained optimization problem
min_{ξ,w} Σ_{i=1}^{N} (1/2) ξi²   s.t.   ∀i = 1, …, N : ξi = w⊤φ(xi) − yi   and   (1/2)∥w∥² ≤ C,
where the equality constraints define the errors of the model, the objective function penalizes these errors, and the inequality constraint imposes a regularization on the parameters of the model.
(a) Construct the Lagrangian and state the KKT conditions for this problem. (Hint: rewrite the equality constraint as ξi − w⊤φ(xi) + yi = 0.)
(b) Show that the solution of the kernel ridge regression problem above, expressed in terms of the dual variables (βi)i and λ, is given by
β = (K + λI)⁻¹ λ y,
where K is the kernel Gram matrix.
(c) Express the prediction f(x) = w⊤φ(x) in terms of the parameters of the dual.
(d) Explain how the new parameter λ can be related to the parameter C of the original formulation.
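For part (b), the closed form can also be checked numerically. Below is a minimal numpy sketch (an illustration under assumed settings, not part of the required solution): it uses randomly generated toy data, a linear kernel k(x, x') = x⊤x' as the feature map, and an arbitrary value of λ, and it computes β via a linear solve rather than an explicit matrix inverse.

import numpy as np

np.random.seed(0)
N, d = 20, 3
X = np.random.randn(N, d)              # toy inputs x1, ..., xN (rows of X)
y = np.random.randn(N)                 # toy labels y1, ..., yN

K = X @ X.T                            # Gram matrix K_ij = k(xi, xj) for the linear kernel
lam = 0.1                              # dual variable lambda, fixed here for illustration

# beta = (K + lambda I)^{-1} lambda y, computed with a linear solve
beta = np.linalg.solve(K + lam * np.eye(N), lam * y)
print(beta.shape)                      # (20,)

Solving the linear system is numerically preferable to forming (K + λI)⁻¹ explicitly and yields the same β.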
Exercise 2: Programming (50 P)
Download the programming files on ISIS and follow the instructions.