
Machine Learning I Exam from 24.09.2020
This document is not official; it is not endorsed by the university or the lecturer.
120 minutes, no auxiliary tools allowed, 20 + 15 + 25 + 20 + 20 = 100 points.
1. Multiple choice (4 × 5 = 20 Points)
Answer the following multiple choice questions.


(a) The Bayes error is
[ ] the lowest error of a linear classifier.
[ ] the expected error of a random linear classifier.
[x] the lowest error of any nonlinear classifier.
[ ] the error of a naive Bayes classifier.
(b) The Fisher linear discriminant finds the projection $y = w^\top x$ of the data that maximises
[ ] the margin between the two data generating distributions.
[ ] the within-class variance divided by the between-class variance.
[ ] the margin between the means of the data generating distributions.
[x] the between-class variance divided by the within-class variance.
(c) A biased estimator is used to
[x] make the estimator less affected by the sampling of the data.
[ ] make the estimation procedure more sensitive to the sample data.
[ ] reduce the risk of underfitting the data.
[ ] None of the above, an unbiased estimator is always better.
(d) Let $x_1, \ldots, x_N \in \mathbb{R}^d$ be unlabelled observations. Consider a Gaussian Gram matrix $K \in \mathbb{R}^{N \times N}$. Which is always true?
[x] $\forall u \in \mathbb{R}^N: u^\top K u \ge 0$.
[ ] $\forall u \in \mathbb{R}^N: u^\top K u \le 0$.
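Not part of the exam, but as a quick numerical illustration of (d): the numpy sketch below builds a Gaussian (RBF) Gram matrix on arbitrary random data (the data size and the bandwidth are made-up choices) and checks that $u^\top K u \ge 0$ for random vectors $u$.

import numpy as np

# Illustrative check that a Gaussian (RBF) Gram matrix is positive semidefinite.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                               # N = 50 points in R^3
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)                                # K_ij = exp(-||x_i - x_j||^2 / 2)
for _ in range(1000):
    u = rng.normal(size=50)
    assert u @ K @ u >= -1e-10                             # u^T K u >= 0 up to round-off
print(np.linalg.eigvalsh(K).min())                         # smallest eigenvalue of K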

2. Neural Networks (10 + 5 = 15 Points)
(a) Build a neural network that models the function $f: \mathbb{R}^2 \to \{0,1\}$, $x \mapsto 1_{\min(x_1, x_2) \le -1}(x)$ with at most three neurons of the form $a_j = \mathrm{step}\left(\sum_i w_{ij} a_i + b_j\right)$, where $\mathrm{step}(z) := 1_{\{z \ge 0\}}(z)$. State weights and biases.
(b) State the number of neurons needed to build a neural network that models $f: \mathbb{R}^d \to \{0,1\}$, $x \mapsto 1_{\|x\|_\infty \le 5}(x)$ and describe the weights and bias of one such neuron.
3. Maximum likelihood and Bayes (5 × 5 = 25 Points)
People queue at the post office and their i.i.d. processing times are $D = (x_1, x_2, x_3) = (1, 1, 2)$. The data generating distribution is $P(x_i = k) = (1 - \theta)^{k-1}\theta$, where $k \in \mathbb{N} \cup \{\infty\}$ and $\theta \in [0, 1]$ is unknown.
(a) State the likelihood function $P(D|\theta)$.
(b) Find the maximum likelihood parameter $\hat\theta$.
(c) Evaluate $P(x_4 > 1 \mid \hat\theta)$.
Solution 2(a): Define $a_1 = \mathrm{step}(-x_1 - 1)$ and $a_2 = \mathrm{step}(-x_2 - 1)$ to check whether $x_1 \le -1$ and whether $x_2 \le -1$. If (at least) one of them gives 1, we want the output to be one and zero otherwise. Thus $a_3 = \mathrm{step}(a_1 + a_2 - 1)$.
Solution 2(b): We need $2d + 1$ neurons. We have that $\|x\|_\infty \le 5$ if and only if $-5 \le x_k \le 5$ for all $k \in \{1, \ldots, d\}$. For $k \in \{1, \ldots, d\}$, we thus take $a_{2k-1} = \mathrm{step}(5 - x_k)$ (to check that $x_k \le 5$) and $a_{2k} = \mathrm{step}(x_k + 5)$ (to check that $x_k \ge -5$). The output neuron is $a_{2d+1} = \mathrm{step}\left(\sum_{k=1}^{2d} \frac{1}{2d} a_k - 1\right)$, as we only want to output 1 if all other $a_j$ give 1.
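Not part of the exam, but the following numpy sketch (the helper names step, f_2a, f_2b and the test points are made up for illustration) implements both networks and checks them against the target indicator functions.

import numpy as np

# Illustrative check of the two step-neuron networks above.
step = lambda z: (np.asarray(z) >= 0).astype(float)       # step(z) = 1_{z >= 0}

def f_2a(x):
    a1 = step(-x[0] - 1)                   # fires iff x1 <= -1
    a2 = step(-x[1] - 1)                   # fires iff x2 <= -1
    return step(a1 + a2 - 1)               # fires iff a1 = 1 or a2 = 1

def f_2b(x):
    d = len(x)
    a = np.concatenate([step(5 - x), step(x + 5)])         # 2d hidden neurons
    return step(a.sum() / (2 * d) - 1)                     # fires iff all a_k = 1

for x in [np.array([-2.0, 3.0]), np.array([0.0, 0.0]), np.array([5.0, -1.0])]:
    assert f_2a(x) == float(min(x) <= -1)
for x in [np.array([4.0, -5.0, 1.0]), np.array([6.0, 0.0, 0.0])]:
    assert f_2b(x) == float(np.max(np.abs(x)) <= 5)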
Solution 3(a): $P(D|\theta) = (1 - \theta)^{1-1}\theta \cdot (1 - \theta)^{1-1}\theta \cdot (1 - \theta)^{2-1}\theta = \theta^3(1 - \theta)$.
Solution 3(b): We have $\hat\theta = \arg\max_\theta P(D|\theta)$. We have $\frac{d}{d\theta}\,\theta^3(1 - \theta) = 3\theta^2 - 4\theta^3$, so $\theta = 0$ or $\theta = \frac{3}{4}$.
We also have to check the boundary of the definition domain of $P(D|\theta)$: we have $P(D|0) = 0 = P(D|1) < P(D|\tfrac{3}{4}) = \tfrac{27}{256}$, so $\hat\theta = \tfrac{3}{4}$.
Solution 3(c): Since $x_4$ can be every integer between 2 and $\infty$, we have
$P(x_4 > 1 \mid \hat\theta) = \sum_{k=2}^{\infty} P(x_4 = k \mid \hat\theta) = \sum_{k=2}^{\infty} (1 - \hat\theta)^{k-1}\hat\theta = \frac{3}{4}\sum_{k=1}^{\infty}\left(\frac{1}{4}\right)^{k} = \frac{3}{4} \cdot \frac{1}{3} = \frac{1}{4}.$
The sum is a geometric series, so we can get the finite expression $\frac{1}{3}$ for it. Simpler computation using the complement:
$P(x_4 > 1 \mid \hat\theta) = 1 - P(x_4 = 1 \mid \hat\theta) = 1 - (1 - \hat\theta)^{1-1}\hat\theta = 1 - \hat\theta = 1 - \tfrac{3}{4} = \tfrac{1}{4}.$
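As an optional sanity check (not from the exam), a numpy grid search over $\theta$ reproduces the analytic values from (b) and (c); the grid resolution is an arbitrary choice.

import numpy as np

# Grid search over theta confirming the analytic MLE theta_hat = 3/4.
D = np.array([1, 1, 2])
theta = np.linspace(0.0, 1.0, 100001)
likelihood = np.prod((1 - theta[:, None]) ** (D - 1) * theta[:, None], axis=1)
print(theta[likelihood.argmax()])   # approx. 0.75
print(likelihood.max())             # approx. 27/256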
We now adopt a Bayesian viewpoint on this problem, where we assume a prior distribution for the parameter $\theta$ defined as:
$p(\theta) = \begin{cases} 1, & \theta \in [0, 1], \\ 0, & \text{otherwise}. \end{cases}$
(d) Show that the posterior distribution $p(\theta|D)$ is $20(1 - \theta)\theta^3$ for $\theta \in [0, 1]$ and zero elsewhere.
Solution 3(d): By Bayes' theorem and the law of total probability we have
$p(\theta|D) = \frac{p(D|\theta)p(\theta)}{p(D)} = \frac{p(D|\theta)p(\theta)}{\int_{\mathbb{R}} p(D|\theta)p(\theta)\,d\theta} = \frac{\theta^3(1 - \theta) \cdot 1_{[0,1]}(\theta)}{\int_0^1 \theta^3(1 - \theta)\,d\theta} = \frac{\theta^3(1 - \theta) \cdot 1_{[0,1]}(\theta)}{1/20} = 20(1 - \theta)\theta^3 \cdot 1_{[0,1]}(\theta).$
(e) Evaluate $P(x_4 > 1 \mid D) = \int P(x_4 > 1 \mid \theta)\, p(\theta|D)\, d\theta$.
Solution 3(e): Using the complement again,
$P(x_4 > 1 \mid D) = 1 - P(x_4 = 1 \mid D) = 1 - \int_0^1 20(1 - \theta)\theta^3 \cdot \theta(1 - \theta)^{1-1}\, d\theta = 1 - 20\int_0^1 \theta^4(1 - \theta)\, d\theta = 1 - 20\left(\frac{1}{5} - \frac{1}{6}\right) = 1 - \frac{2}{3} = \frac{1}{3}.$
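Again as an optional numerical cross-check (not part of the exam; the midpoint grid is an arbitrary discretisation), the normalising constant from (d) comes out as 20 and the posterior predictive from (e) as 1/3.

import numpy as np

# Midpoint-rule check of the normalising constant (20) and of P(x4 > 1 | D) = 1/3.
dtheta = 1e-5
theta = np.arange(0.0, 1.0, dtheta) + dtheta / 2      # midpoints of a grid on [0, 1]
unnormalised = theta ** 3 * (1 - theta)
Z = (unnormalised * dtheta).sum()                     # approx. 1/20
posterior = unnormalised / Z                          # approx. 20 * theta^3 * (1 - theta)
print(1 / Z)                                          # approx. 20
# Under the geometric model P(x4 > 1 | theta) = 1 - theta, so integrate it
# against the posterior to obtain the predictive probability.
print((posterior * (1 - theta) * dtheta).sum())       # approx. 1/3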
4. Lagrange multipliers (4 × 5 = 20 Points)
Let $\Sigma \in \mathbb{R}^{d \times d}$ be a positive semidefinite matrix. Consider the constrained maximisation problem:
$\max_{w \in \mathbb{R}^d} \|w\|^2 \quad \text{subject to} \quad w^\top \Sigma^{-1} w = 1.$
(a) State the Lagrange function.
(b) Show that the problem is an eigenvalue problem of Σ.
Solution 4(a): $L(w, \lambda) := \|w\|^2 + \lambda(1 - w^\top \Sigma^{-1} w).$

Solution 4(b): For $w$ to be optimal, we need
$\frac{\partial L(w, \lambda)}{\partial w} = 2w - 2\lambda \Sigma^{-1} w \stackrel{!}{=} 0 \iff w = \lambda \Sigma^{-1} w \iff \Sigma w = \lambda w,$
so $w$ has to be an eigenvector of $\Sigma$ with eigenvalue $\lambda$.
(c) Show that the solution is the eigenvector associated with the largest eigenvalue of $\Sigma$.
(d) Let $w_1, \ldots, w_T$ be a sequence of vectors where $w_t$ is obtained from $w_{t-1}$ as the solution of the constrained problem
$\max_{z \in \mathbb{R}^d} z^\top w_{t-1} \quad \text{subject to} \quad z^\top \Sigma^{-1} z = 1.$
Find a closed-form solution of $w_t$ as a function of $w_{t-1}$.
Solution 4(c): From the constraint $w^\top \Sigma^{-1} w = 1$ and $w = \lambda \Sigma^{-1} w$ we get (as $\Sigma$ is symmetric) $\|w\|^2 = w^\top w = \lambda w^\top \Sigma^{-1} w = \lambda$.
Thus the value of the eigenvalue coincides with the quantity we want to maximise, so the solution is the eigenvector associated with the largest eigenvalue of $\Sigma$ (scaled to satisfy the constraint).
Solution 4(d): The Lagrangian is
$L(z, \lambda) := z^\top w_{t-1} + \lambda(1 - z^\top \Sigma^{-1} z).$
In order for $z$ to be optimal, we require
$\frac{\partial L(z, \lambda)}{\partial z} = w_{t-1} - 2\lambda \Sigma^{-1} z \stackrel{!}{=} 0 \iff w_{t-1} = 2\lambda \Sigma^{-1} z \iff z = \frac{1}{2\lambda} \Sigma w_{t-1}.$
Plugging the second-to-last equality into the constraint $z^\top \Sigma^{-1} z = 1$, we get $z^\top w_{t-1} = 2\lambda z^\top \Sigma^{-1} z = 2\lambda$, and using the last equality we get
$2\lambda = z^\top w_{t-1} = \frac{1}{2\lambda} w_{t-1}^\top \Sigma w_{t-1},$
so
$2\lambda = \sqrt{w_{t-1}^\top \Sigma w_{t-1}},$
as $\Sigma$ is positive semidefinite (so we do not have to consider $-\sqrt{\cdots}$). We thus get
$w_t = z = \frac{\Sigma w_{t-1}}{\sqrt{w_{t-1}^\top \Sigma w_{t-1}}} = \frac{\Sigma w_{t-1}}{\|\Sigma w_{t-1}\|_{\Sigma^{-1}}}$
with $\|x\|_{\Sigma^{-1}}^2 := x^\top \Sigma^{-1} x$.
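This update is essentially a power iteration in disguise. The numpy sketch below (matrix and start vector are arbitrary, not from the exam) iterates the closed form and confirms that the direction of $w_t$ converges to the leading eigenvector of $\Sigma$, which connects (d) back to (c).

import numpy as np

# Iterate w_t = Sigma w_{t-1} / sqrt(w_{t-1}^T Sigma w_{t-1}) on an arbitrary
# positive semidefinite Sigma; the direction converges to the top eigenvector.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T
w = rng.normal(size=4)
for _ in range(100):
    w = Sigma @ w / np.sqrt(w @ Sigma @ w)
leading = np.linalg.eigh(Sigma)[1][:, -1]          # eigenvector of the largest eigenvalue
print(abs(w / np.linalg.norm(w) @ leading))        # approx. 1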

5. Ridge regression (10 + 10 = 20 Points)
Consider the problem
$\min_{w \in \mathbb{R}^d} \|y - Xw\|^2 \quad \text{subject to} \quad \|w\|_\infty \le C,$
where $C > 0$ is a constant, $y \in \mathbb{R}^N$ and $X \in \mathbb{R}^{N \times d}$ is the data matrix.
(a) Show that the problem is equivalent to
$\min_{w \in \mathbb{R}^d} w^\top X^\top X w - 2 y^\top X w \quad \text{subject to} \quad -C \le w_i \le C \ \forall i \in \{1, \ldots, d\}.$
Solution 5(a):
$\|y - Xw\|^2 = (y - Xw)^\top (y - Xw) = y^\top y - y^\top X w - (Xw)^\top y + (Xw)^\top X w = y^\top y - 2 y^\top X w + w^\top X^\top X w.$
Since $y^\top y$ is independent of $w$, we can neglect it when minimising over $w$. We have $y^\top X w = (Xw)^\top y$, as it is a scalar and so it is equal to its transpose.
Furthermore, $\|w\|_\infty = \max\{|w_1|, \ldots, |w_d|\}$, so $\|w\|_\infty \le C$ is equivalent to $|w_k| \le C$ for all $k \in \{1, \ldots, d\}$, i.e. $-C \le w_k \le C$ for all $k \in \{1, \ldots, d\}$.
(b) At our disposal we have a quadratic solver QP(Q, l, A, b), which solves the generic quadratic problem
$\min_v v^\top Q v + l^\top v \quad \text{subject to} \quad A v \le b.$
Write the numpy code constructing the arrays Q, l, A and b from X, y and C.
Solution 5(b):
import numpy as np

def Reg(X, y, C):
    Q = X.T.dot(X)                 # quadratic term: Q = X^T X
    l = -2 * y.T.dot(X).T          # linear term: l = -2 X^T y
    d = Q.shape[0]
    # Stack I and -I so that A w <= b encodes w_i <= C and -w_i <= C.
    A = np.concatenate([np.identity(d), -1 * np.identity(d)], axis=0)
    b = C * np.ones(2 * d)
    t = QP(Q, l, A, b)
    return t

The grey code was given.
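To try Reg outside the exam setting, one needs something standing in for the provided solver. The sketch below defines a hypothetical QP replacement based on scipy.optimize.minimize (an assumption for illustration, not the solver given in the exam) and checks that with a large C the result matches ordinary least squares on random, purely illustrative data.

import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for the exam's QP black box: solves
# min_v v^T Q v + l^T v subject to A v <= b with a general-purpose solver.
def QP(Q, l, A, b):
    cons = {"type": "ineq", "fun": lambda v: b - A.dot(v)}   # A v <= b
    res = minimize(lambda v: v @ Q @ v + l @ v, np.zeros(Q.shape[0]), constraints=cons)
    return res.x

rng = np.random.default_rng(0)
X, y = rng.normal(size=(30, 3)), rng.normal(size=30)
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]      # unconstrained least squares
print(Reg(X, y, C=100.0))                        # constraint inactive: close to w_ls
print(w_ls)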
Thanks to everyone contributing to this account of the exam and its solutions 🙂
