
COMP90051 StatML 2020S2 Q&A 02 – Solutions

August 6, 2021

Exercise 1: A dataset has two instances with one feature,

$$X = \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.$$

We want to fit a linear regression model on this dataset. Please use the normal equation of linear
regression to find w and b.

This question showcases some basic operations on matrices and vectors, such as matrix-matrix and
matrix-vector multiplication, and transposes. It also makes more concrete what one does once they
have the normal equations: these come from the analytical solution of linear regression training
(formulated either as MLE or as decision-theoretically optimising the sum of squared errors). To get
some more intuition, you might like to plot the x’s and y’s here; visually, you might expect the
answer w = 0.5, b = 0, although in general, even for 1D data, it isn’t possible to read the solution
off a plot.

First, we can add a dummy feature into X to absorb the bias term in the weights vector:

$$X = \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} b \\ w \end{bmatrix}.$$

Then, we can directly find out the optimal value of ŵ by solving the normal equation.

$$\begin{aligned}
\hat{\mathbf{w}} &= (X^T X)^{-1} X^T y \\
&= \left( \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix}^T \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix} \right)^{-1} \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix} \\
&= \left( \begin{bmatrix} 1 & 1 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix} \right)^{-1} \begin{bmatrix} 1 & 1 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} \\
&= \begin{bmatrix} 2 & 6 \\ 6 & 20 \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ 10 \end{bmatrix} \\
&= \frac{1}{2 \times 20 - 6 \times 6} \begin{bmatrix} 20 & -6 \\ -6 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 10 \end{bmatrix} \\
&= \begin{bmatrix} 5 & -3/2 \\ -3/2 & 1/2 \end{bmatrix} \begin{bmatrix} 3 \\ 10 \end{bmatrix} \\
&= \begin{bmatrix} 0 \\ 1/2 \end{bmatrix}
\end{aligned}$$
where we have used a formula for inverting 2 × 2 matrices which we don’t expect students to know
or remember. Reading off the components, b = 0 and w = 1/2, matching the visual intuition above.
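
For reference, the 2 × 2 inverse formula used above is the standard identity

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}, \qquad ad - bc \neq 0.$$

As a numerical sanity check (not part of the original solution), here is a minimal NumPy sketch
solving the same normal equation:

```python
import numpy as np

# Design matrix with a dummy column of ones to absorb the bias b.
X = np.array([[1.0, 2.0],
              [1.0, 4.0]])
y = np.array([1.0, 2.0])

# Normal equation: w_hat = (X^T X)^{-1} X^T y.
# Solving the linear system is preferable to forming the inverse explicitly.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # expected [0.  0.5], i.e. b = 0, w = 0.5
```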


Exercise 2: Show that Newton-Raphson for linear regression gives you the normal equations.

Linear regression training has us minimise the sum of squared errors. We can include a 0.5 factor
to make the derivative more convenient (cancelling the 2 that we’d otherwise obtain):

$$L(w) = \frac{1}{2} \| Xw - y \|_2^2.$$

Newton-Raphson makes use of both the gradient and Hessian of this objective function, so let’s
start by calculating those. The first-order derivatives (gradient):

$$\nabla L(w) = X^T X w - X^T y$$

The second-order derivatives (Hessian):

$$\nabla^2 L(w) = X^T X$$
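
As an aside (not in the original solution), a quick finite-difference sketch can confirm the gradient
formula; the data are taken from Exercise 1 and the test point w is arbitrary:

```python
import numpy as np

# Toy data from Exercise 1, with the dummy column of ones for the bias.
X = np.array([[1.0, 2.0],
              [1.0, 4.0]])
y = np.array([1.0, 2.0])

def L(w):
    return 0.5 * np.sum((X @ w - y) ** 2)      # the objective above

w = np.array([0.3, -0.7])                      # arbitrary test point
analytic = X.T @ X @ w - X.T @ y               # gradient formula

eps = 1e-6
numeric = np.array([(L(w + eps * e) - L(w - eps * e)) / (2 * eps)
                    for e in np.eye(2)])       # central differences
print(np.allclose(analytic, numeric))          # True
```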

To apply the Newton-Raphson method, we need to set an initial value $w_0$ and then iteratively
update it. Let’s not worry for now about what initial value we choose (but in practice it could be a
random value). The update $w_{t+1}$ is given by:

$$w_{t+1} = w_t - \left( \nabla^2 L(w_t) \right)^{-1} \nabla L(w_t)$$

Plug in the expressions for $\nabla^2 L(w)$ and $\nabla L(w)$, and simplify the equation.

$$\begin{aligned}
w_{t+1} &= w_t - (X^T X)^{-1} (X^T X w_t - X^T y) \\
&= w_t - (X^T X)^{-1} X^T X w_t + (X^T X)^{-1} X^T y \\
&= w_t - I w_t + (X^T X)^{-1} X^T y \\
&= w_t - w_t + (X^T X)^{-1} X^T y \\
&= (X^T X)^{-1} X^T y
\end{aligned}$$

That is, the first and all future iterates are given by the normal equation of linear regression!
Moreover, $w_{t+1}$ does not depend on $w_t$: no matter what the initial value $w_0$ is, the
Newton-Raphson method converges in one step to the usual linear regression solution.
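
To see this numerically, here is a minimal sketch (not part of the original solution) that takes a
single Newton-Raphson step from a random $w_0$ on the Exercise 1 data and compares it against the
normal-equation solution:

```python
import numpy as np

# Dataset from Exercise 1, with a dummy column of ones for the bias.
X = np.array([[1.0, 2.0],
              [1.0, 4.0]])
y = np.array([1.0, 2.0])

rng = np.random.default_rng(0)
w = rng.normal(size=2)                      # arbitrary initial value w_0

# One Newton-Raphson step: w_1 = w_0 - (X^T X)^{-1} (X^T X w_0 - X^T y).
gradient = X.T @ X @ w - X.T @ y
hessian = X.T @ X
w = w - np.linalg.solve(hessian, gradient)

# A single step lands on the normal-equation solution, regardless of w_0.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)
print(w, w_normal)                          # both approximately [0.  0.5]
```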

Hat tip to Cameron for also outlining this solution on Piazza!
