Machine Learning 1 WS19/20 5 March 2020
Gedächtnisprotokoll (exam reconstructed from memory)
First exam session, duration: 120 minutes
Exercise 1 – multiple choice (20 pts)
Only one answer per question is correct.
1. Given two normal distributions p(x|w1) ∼ N(μ1, Σ1) and p(x|w2) ∼ N(μ2, Σ2), what is a necessary and sufficient condition for the optimal decision boundary to be linear? (5pts)
(a) Σ1 = Σ2
(b) Σ1 = Σ2, P(w1) = P(w2)
(c) … (d) …
2. We have a classifier that decides the class argmax_wi fi(x) for an input x. What is a suitable discriminant function fi? (5pts)
(a) p(x|wi)P(wi)
(b) log(p(x|wi)+P(wi))
(c) … (d) …
3. K-means is (5pts)
(a) a non-convex algorithm used to cluster data
(b) a kernelized version of the means algorithm
(c) … (d) …
4. Error backpropagation gives (5pts)
(a) the gradient of the error function
(b) the optimal direction in parameter space
(c) … (d) …
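Sketches of the reasoning behind the intended answers (my reconstruction, not an official solution):

Question 1: the log-ratio of the class posteriors is

    log [ p(x|w1)P(w1) / (p(x|w2)P(w2)) ] = −(1/2) xᵀ(Σ1⁻¹ − Σ2⁻¹)x + (terms linear in x) + const,

so the boundary is linear iff the quadratic term vanishes, i.e. Σ1 = Σ2 (answer (a)); equal priors only shift the constant, so (b) is sufficient but not necessary.

Question 2: fi(x) = p(x|wi)P(wi) is proportional to the posterior P(wi|x) = p(x|wi)P(wi)/p(x), so taking its argmax implements the Bayes classifier; log(p(x|wi) + P(wi)) is not a monotone transform of the posterior.

Question 3: k-means minimizes the within-cluster sum of squared distances, which is non-convex, so Lloyd's algorithm only finds local optima. A minimal numpy sketch of the algorithm (illustrative; the function and variable names are mine):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    # X: (N, d) data matrix; returns centroids (k, d) and labels (N,)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # update step: centroid = mean of its cluster (kept if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

Question 4: backpropagation is the chain rule applied layer by layer; it yields the gradient of the error function (answer (a)), and the negative gradient is only the locally steepest direction, not an optimal direction in parameter space.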
Exercise 2 – Neural Networks (15pts)
1. Given x ∈ R2, implement the function 1{|x1|+|x2|≥2} using threshold neurons with activation 1{Σi ai wij + bj ≥ 0}, where 1{…} is the indicator function. Draw the NN and provide weights and biases. Use only 5 neurons (excluding the input neurons) (10pts). (One possible construction is sketched after this exercise.)
2. State how many neurons are needed to implement 1{|x1|+···+|xd|≥d} for x ∈ Rd. Provide weights and bias for a neuron of your choice (5pts).
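One possible construction (my reconstruction, not the official solution): since |x1| + |x2| = max over the four sign patterns of (±x1 ± x2), use four hidden threshold units, one per sign pattern (s1, s2) with weights (s1, s2) and bias −2, plus one output unit with all weights 1 and bias −1 that fires when at least one hidden unit is active — 5 neurons in total. A numpy spot check of these weights:

import numpy as np

def step(z):
    # threshold activation 1{z >= 0}
    return (z >= 0).astype(float)

def net(x):
    # hidden layer: four units, one per sign pattern, detecting s1*x1 + s2*x2 >= 2
    W = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
    b = np.full(4, -2.0)
    h = step(W @ x + b)
    # output unit: OR of the hidden units (weights 1, bias -1)
    return step(h.sum() - 1.0)

# compare against the target indicator on random inputs
rng = np.random.default_rng(0)
for x in rng.uniform(-3, 3, size=(1000, 2)):
    assert net(x) == float(abs(x[0]) + abs(x[1]) >= 2)

For part 2, the same idea generalizes with one hidden unit per sign vector s ∈ {−1, +1}^d (weights s, bias −d) plus one output OR unit, i.e. 2^d + 1 neurons, since |x1| + ··· + |xd| = max over s of sᵀx.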
Exercise 3 – Lagrange (25pts)
Let A ∈ Rd×d, B ∈ Rh×h be two positive definite matrices, and consider

    max wᵀAw + vᵀBv   subject to   ∥w∥² + ∥v∥² = 1
1. Write the Lagrangian (5pts)
2. Derive equations that lead to the solution (5pts)
3. Show that the problem is equivalent to an eigenvector problem of a matrix C ∈ R(d+h)×(d+h) (5pts)
4. Show that the solution is the eigenvector corresponding to the largest eigenvalue (5pts)
5. Show how the solution for C can be derived from two subproblems for A and B. Hint: the set of eigenvalues of a block diagonal matrix is the union of the eigenvalues of the matrices on the diagonal (5pts)
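A sketch of the intended derivation (my reconstruction):

    L(w, v, λ) = wᵀAw + vᵀBv − λ(∥w∥² + ∥v∥² − 1)

Setting the gradients to zero gives 2Aw = 2λw and 2Bv = 2λv. Stacking u = (w, v) and letting C = blockdiag(A, B) ∈ R(d+h)×(d+h), this is the eigenvalue problem Cu = λu with ∥u∥ = 1. At any such stationary point the objective equals uᵀCu = λ∥u∥² = λ, so the maximum is attained at the eigenvector of C with the largest eigenvalue. Since the eigenvalues of C are the union of those of A and B, the solution is the top eigenvector of whichever of A, B has the larger leading eigenvalue, padded with zeros in the other block.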
Exercise 4 – Kernels (20pts)
A positive definite kernel satisfies
    Σ(i=1..n) Σ(j=1..n) ci cj k(xi, xj) ≥ 0

for all n, all x1, …, xn ∈ Rd and all c1, …, cn ∈ R
1. Show that k(x, x′) = ⟨x, x′⟩ is a PD kernel (5pts)
2. Show that k(x, x′) = ⟨x, x′ + 2⟩ is not a PD kernel, where x′ + 2 means adding 2 to each component of x′ (5pts)
3. Show that g(x, x′) = k(ξ, x)k(x, x′)k(x′, ξ) is a PD kernel, for any ξ ∈ Rd and a PD kernel k with feature map φ : Rd → Rh, i.e., k(x, x′) = ⟨φ(x), φ(x′)⟩ (5pts)
4. Give a feature map ψ for g (5pts)
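Sketches of the intended arguments (my reconstruction): for part 1,

    Σi Σj ci cj ⟨xi, xj⟩ = ∥Σi ci xi∥² ≥ 0.

For part 2, a single point already violates the definition: with d = 1, n = 1, c1 = 1 and x1 = −1 we get k(x1, x1) = ⟨−1, −1 + 2⟩ = −1 < 0. For parts 3 and 4, using the symmetry of k,

    g(x, x′) = k(ξ, x) ⟨φ(x), φ(x′)⟩ k(x′, ξ) = ⟨k(ξ, x) φ(x), k(ξ, x′) φ(x′)⟩,

which is an inner product under the feature map ψ(x) = k(ξ, x) φ(x), hence g is PD (and this ψ answers part 4).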
Exercise 5 – implementing RR (20pts)
You will implement ridge regression. Assume numpy and scipy are already imported. Fill in the gaps in the following code skeletons; your code must be efficient (e.g., no loops).
1. Implement a function that, given an N × 2 matrix, returns an N × 5 matrix after applying the feature map φ(x1, x2) = [1, x1, x2, x1², x2²] (5pts)
def Phi(X):
    return ...
2. Implement the training part of RR (λ = 0.1) (5pts), that is β = (Φ(X)ᵀ Φ(X) + λI)⁻¹ Φ(X)ᵀ y
def train(self, Xtrain, Ytrain):
    self.beta = ...
3. Implement the prediction part (5pts)
def predict(self, Xtest):
    ...
    return Ftest
4. Compute the fraction of samples for which the prediction satisfies |y−f(x)| < 0.01 (5pts)
def Accuracy(self, Xtest, Ytest):
    ...
    return Acc
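A possible completion (a sketch; the class name RidgeRegression and the use of numpy.linalg.solve are my choices, not prescribed by the exam):

import numpy as np

def Phi(X):
    # quadratic feature map [1, x1, x2, x1^2, x2^2], vectorized over the rows of X
    return np.column_stack([np.ones(len(X)), X, X ** 2])

class RidgeRegression:
    def __init__(self, lam=0.1):
        self.lam = lam

    def train(self, Xtrain, Ytrain):
        # beta = (Phi^T Phi + lambda I)^{-1} Phi^T y, computed via a linear solve
        P = Phi(Xtrain)
        A = P.T @ P + self.lam * np.eye(P.shape[1])
        self.beta = np.linalg.solve(A, P.T @ Ytrain)

    def predict(self, Xtest):
        Ftest = Phi(Xtest) @ self.beta
        return Ftest

    def Accuracy(self, Xtest, Ytest):
        # fraction of samples with |y - f(x)| < 0.01
        Acc = np.mean(np.abs(Ytest - self.predict(Xtest)) < 0.01)
        return Acc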