Exercises for the course
Machine Learning 1
Winter semester 2020/21
Abteilung Maschinelles Lernen
Institut für Softwaretechnik und theoretische Informatik
Fakultät IV, Technische Universität Berlin
Prof. Dr. Klaus-Robert Müller
Email: klaus-robert.mueller@tu-berlin.de
Exercise Sheet 3
Exercise 1: Lagrange Multipliers (10 + 10 P)
Let x1, . . . , xN ∈ Rd be a dataset of N data points. We consider the objective function
J(θ) = ∑_{k=1}^{N} ||θ − xk||²
to be minimized with respect to the parameter θ ∈ Rd. In the absence of constraints, the parameter θ that minimizes this objective is given by the empirical mean m = (1/N) ∑_{k=1}^{N} xk. However, this is generally not the case when the parameter θ is constrained.
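(Optional numerical check, not part of the exercise.) A minimal sketch, assuming Python with NumPy and an arbitrary toy dataset, that minimizes J(θ) by gradient descent and confirms that the unconstrained minimizer coincides with the empirical mean:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # toy dataset: N = 100 points in R^3

theta = np.zeros(3)
for _ in range(500):
    grad = 2 * (len(X) * theta - X.sum(axis=0))    # gradient of J(theta) = sum_k ||theta - x_k||^2
    theta -= 1e-3 * grad                           # small fixed step size

print(np.allclose(theta, X.mean(axis=0)))          # True: the unconstrained minimizer is the mean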
(a) Using the method of Lagrange multipliers, find the parameter θ that minimizes J(θ) subject to the constraint
θ⊤b = 0, with b some unit vector in Rd. Give a geometrical interpretation of your solution.
(b) Using the same method, find the parameter θ that minimizes J(θ) subject to ||θ − c||² = 1, where c is a vector in Rd different from m. Give a geometrical interpretation of your solution.

Exercise 2: Principal Component Analysis (10 + 10 P)
We consider a dataset x1, . . . , xN ∈ Rd. Principal component analysis searches for a unit vector u ∈ Rd such that projecting the data onto that vector produces a distribution with maximum variance. Such a vector can be found by solving the optimization problem:
arg max_u (1/N) ∑_{k=1}^{N} [ u⊤xk − (1/N) ∑_{l=1}^{N} u⊤xl ]²   with ∥u∥² = 1

(a) Show that the problem above can be rewritten as

arg max_u u⊤Su   with ∥u∥² = 1

where S = ∑_{k=1}^{N} (xk − m)(xk − m)⊤ is the scatter matrix, and m = (1/N) ∑_{k=1}^{N} xk is the empirical mean.

(b) Show using the method of Lagrange multipliers that the problem above can be reformulated as solving the eigenvalue problem

Su = λu

and retaining the eigenvector u associated with the highest eigenvalue λ.
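(Optional numerical check, not part of the exercise.) A short sketch, assuming Python with NumPy and an arbitrary toy dataset, that builds the scatter matrix S, takes the eigenvector of its largest eigenvalue, and verifies that no random unit vector attains a larger value of u⊤Su:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])   # anisotropic toy data

m = X.mean(axis=0)
S = (X - m).T @ (X - m)                        # scatter matrix: sum_k (x_k - m)(x_k - m)^T

eigvals, eigvecs = np.linalg.eigh(S)           # eigh returns eigenvalues in ascending order
u = eigvecs[:, -1]                             # eigenvector of the largest eigenvalue

V = rng.normal(size=(1000, 5))
V /= np.linalg.norm(V, axis=1, keepdims=True)  # 1000 random unit vectors
print(u @ S @ u >= max(v @ S @ v for v in V))  # True: u maximizes the projected variance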
Exercise 3: Bounds on Eigenvalues (5 + 5 + 5 + 5 P)
Let λ1 denote the largest eigenvalue of the matrix S. The eigenvalue λ1 quantifies the variance of the data when projected onto the first principal component. Because its computation can be expensive, we study how this eigenvalue can be bounded using the diagonal elements of the matrix S.
(a) Show that ∑_{i=1}^{d} Sii is an upper bound on the eigenvalue λ1.
(b) State the conditions on the data for which the upper bound is tight.
(c) Show that max_{i=1,…,d} Sii is a lower bound on the eigenvalue λ1.
(d) State the conditions on the data for which the lower bound is tight.
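(Optional numerical check, not part of the exercise.) Assuming Python with NumPy and an arbitrary toy dataset, the following verifies that λ1 indeed lies between the largest diagonal entry of S and the sum of all diagonal entries:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc                                      # scatter matrix

lam1 = np.linalg.eigvalsh(S)[-1]                   # largest eigenvalue of S
print(S.diagonal().max() <= lam1 <= np.trace(S))   # True: lower and upper bounds hold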

Exercise 4: Iterative PCA (10 P)
When performing principal component analysis, computing the full eigendecomposition of the scatter matrix S is typically slow, and we are often only interested in the first principal components. An efficient procedure to find the first principal component is power iteration. It starts with a random unit vector w(0) ∈ Rd, and iteratively applies the parameter update
w(t+1) = Sw(t) / ∥Sw(t)∥
until some convergence criterion is met. Here, we would like to show the exponential convergence of power
iteration. For this, we look at the error terms

Ek(w) = |w⊤uk| / |w⊤u1|,   with k = 2, …, d,

and observe that they should all converge to zero as w approaches the eigenvector u1 and becomes orthogonal to the other eigenvectors.
(a) Show that Ek(w(T)) = |λk/λ1|^T · Ek(w(0)), i.e. the convergence of the algorithm is exponential in the number of time steps T. (Hint: to show this, it is useful to rewrite the scatter matrix in terms of its eigenvalues and eigenvectors, i.e. S = ∑_{i=1}^{d} λi ui ui⊤.)
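(Optional numerical illustration, not part of the exercise.) A minimal power iteration sketch, assuming Python with NumPy and an arbitrary toy dataset; the printed error terms Ek(w(t)) should shrink roughly by a factor |λk/λ1| per iteration:

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4)) * np.array([3.0, 1.5, 1.0, 0.2])
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc                                        # scatter matrix

U = np.linalg.eigh(S)[1][:, ::-1]                    # eigenvectors, columns sorted by decreasing eigenvalue

w = rng.normal(size=4)
w /= np.linalg.norm(w)                               # random unit start vector w(0)
for t in range(1, 21):
    w = S @ w / np.linalg.norm(S @ w)                # power iteration update
    E = np.abs(w @ U[:, 1:]) / np.abs(w @ U[:, 0])   # error terms E_k(w), k = 2, ..., d
    print(t, E)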
Exercise 5: Programming (30 P)
Download the programming files on ISIS and follow the instructions.