Exercises for the course
Machine Learning 1
Winter semester 2021/22
Institut für Softwaretechnik und theoretische Informatik, Fakultät IV, Technische Universität Berlin. Prof. Dr. Klaus-Robert Müller. Email:
Exercise Sheet 3
Exercise 1: (10 + 10 P)
Let x_1, . . . , x_N ∈ R^d be a dataset of N data points. We consider the objective function
J(θ) = ∑_{k=1}^N ∥θ − x_k∥²

to be minimized with respect to the parameter θ ∈ R^d. In the absence of constraints, the parameter θ that minimizes this objective is given by the empirical mean m = (1/N) ∑_{k=1}^N x_k. However, this is generally not the case when the parameter θ is constrained.
(a) Using the method of Lagrange multipliers, find the parameter θ that minimizes J(θ) subject to the constraint θ⊤b = 0, where b is some unit vector in R^d. Give a geometrical interpretation of your solution.
(b) Using the same method, find the parameter θ that minimizes J(θ) subject to ∥θ − c∥² = 1, where c is a vector in R^d different from m. Give a geometrical interpretation of your solution.
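A minimal numerical sketch of part (a), assuming the Lagrangian analysis yields θ* = m − (m⊤b)b, i.e. the orthogonal projection of m onto the constraint hyperplane; the dataset, the seed, and the unit vector b below are made up for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # N = 100 made-up points in R^3
m = X.mean(axis=0)                     # empirical mean
b = np.array([1.0, 2.0, 2.0]) / 3.0    # a unit vector in R^3

def J(theta):
    return np.sum((theta - X) ** 2)    # objective J(theta)

theta_star = m - (m @ b) * b           # projection of m onto {theta : theta^T b = 0}

# No perturbation inside the constraint plane should improve the objective.
for _ in range(1000):
    d = rng.normal(size=3)
    d -= (d @ b) * b                   # keep the perturbation feasible
    assert J(theta_star) <= J(theta_star + 0.1 * d) + 1e-9

print(abs(theta_star @ b))             # ~0: the constraint holds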
Exercise 2: Principal Component Analysis (10 + 10 P)
We consider a dataset x_1, . . . , x_N ∈ R^d. Principal component analysis searches for a unit vector u ∈ R^d such that projecting the data on that vector produces a distribution with maximum variance. Such a vector can be found by solving the optimization problem:

arg max_{∥u∥² = 1} (1/N) ∑_{k=1}^N ( u⊤x_k − (1/N) ∑_{l=1}^N u⊤x_l )²
(a) Show that the problem above can be rewritten as
arg max_{∥u∥² = 1} u⊤Su

where S = (1/N) ∑_{k=1}^N (x_k − m)(x_k − m)⊤ is the scatter matrix, and m = (1/N) ∑_{k=1}^N x_k is the empirical mean.
(b) Show using the method of Lagrange multipliers that the problem above can be reformulated as solving the eigenvalue problem

Su = λu

and retaining the eigenvector u associated with the highest eigenvalue λ.
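A minimal NumPy sketch of this exercise's result: the first principal component is the top eigenvector of the scatter matrix S. The dataset is synthetic and the variable names are illustrative:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated synthetic data
m = X.mean(axis=0)                     # empirical mean
Xc = X - m
S = Xc.T @ Xc / len(X)                 # scatter matrix S = (1/N) sum (x_k - m)(x_k - m)^T

eigvals, eigvecs = np.linalg.eigh(S)   # eigh since S is symmetric; eigenvalues ascending
u = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue

# The projected variance u^T S u equals lambda_1 and dominates the variance
# along any other unit direction.
print(u @ S @ u, eigvals[-1])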
Exercise 3: Bounds on Eigenvalues (5 + 5 + 5 + 5 P)
Let λ_1 denote the largest eigenvalue of the matrix S. The eigenvalue λ_1 quantifies the variance of the data when projected on the first principal component. Because computing it exactly can be expensive, we study how λ_1 can be bounded using the diagonal elements of the matrix S.
(a) Show that ∑_{i=1}^d S_ii is an upper bound on the eigenvalue λ_1.
(b) State the conditions on the data for which the upper bound is tight.
(c) Show that max_{i=1,…,d} S_ii is a lower bound on the eigenvalue λ_1.
(d) State the conditions on the data for which the lower bound is tight.
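Both bounds can be checked numerically. The sketch below (synthetic, anisotropic data) prints the lower bound max_i S_ii, the exact λ_1, and the upper bound ∑_i S_ii = tr(S):

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) * np.array([3.0, 1.0, 0.5, 0.1])  # anisotropic data
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / len(X)                 # scatter matrix

lam1 = np.linalg.eigvalsh(S)[-1]       # largest eigenvalue lambda_1
print(S.diagonal().max(), lam1, np.trace(S))  # lower bound <= lambda_1 <= upper bound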
Exercise 4: Iterative PCA (10 P)
When performing principal component analysis, computing the full eigendecomposition of the scatter matrix S is typically slow, and we are often only interested in the first few principal components. An efficient procedure to find the first principal component is power iteration. It starts with a random unit vector w^(0) ∈ R^d and iteratively applies the parameter update
w^(t+1) = S w^(t) / ∥S w^(t)∥
until some convergence criterion is met. Here, we would like to show the exponential convergence of power
iteration. For this, we look at the error terms

E_k(w) = |w⊤u_k| / |w⊤u_1|, with k = 2, …, d,

and observe that they should all converge to zero as w approaches the eigenvector u_1 and becomes orthogonal to the other eigenvectors.
(a) Show that E_k(w^(T)) = |λ_k/λ_1|^T · E_k(w^(0)), i.e. the convergence of the algorithm is exponential in the number of time steps T. (Hint: to show this, it is useful to rewrite the scatter matrix in terms of its eigenvalues and eigenvectors, i.e. S = ∑_{i=1}^d λ_i u_i u_i⊤.)
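A short sketch of power iteration with the error terms E_k tracked against a reference eigendecomposition; the data is synthetic, and the convergence criterion is replaced by a fixed number of iterations for simplicity:

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4)) * np.array([2.0, 1.0, 0.5, 0.2])
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / len(X)                 # scatter matrix

lam, U = np.linalg.eigh(S)             # reference eigendecomposition (ascending order)
U = U[:, ::-1]                         # reorder so that U[:, 0] is u_1

w = rng.normal(size=4)
w /= np.linalg.norm(w)                 # random unit vector w^(0)
for t in range(50):
    w = S @ w
    w /= np.linalg.norm(w)             # w^(t+1) = S w^(t) / ||S w^(t)||

E = np.abs(U[:, 1:].T @ w) / np.abs(U[:, 0] @ w)
print(E)                               # error terms E_k(w), k = 2..d, all near zero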
Exercise 5: Programming (30 P)
Download the programming files on ISIS and follow the instructions.