The Australian National University Semester 2, 2021
School of Computing Tutorial 5
COMP3670/6670: Introduction to Machine Learning
These exercises will concentrate on vector calculus, and how to compute derivatives of functions that
live in higher dimensions.
Preliminaries
The formal definition of the derivative of a function $f : \mathbb{R} \to \mathbb{R}$ is given by
$$\frac{df}{dx} := \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
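The limit can be approximated numerically by picking a small $h$. The sketch below (our own helper, with an arbitrary step size, not part of the tutorial) checks the derivative of $f(x) = x^2$ this way:

```python
# Numerical check of the limit definition (a sketch; the step size
# h = 1e-6 is an arbitrary choice, not part of the definition).
def derivative(f, x, h=1e-6):
    # Forward-difference approximation of (f(x + h) - f(x)) / h.
    return (f(x + h) - f(x)) / h

print(derivative(lambda x: x**2, 3.0))  # close to 2 * 3.0 = 6.0
```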
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function of a vector $\mathbf{x}$. The derivative of $f(\mathbf{x})$ with respect to $\mathbf{x}$ is defined as
$$\nabla_{\mathbf{x}} f = \operatorname{grad} f = \frac{df}{d\mathbf{x}} := \begin{bmatrix} \frac{\partial f(\mathbf{x})}{\partial x_1} & \frac{\partial f(\mathbf{x})}{\partial x_2} & \dots & \frac{\partial f(\mathbf{x})}{\partial x_n} \end{bmatrix} \in (\mathbb{R}^n \to \mathbb{R})^{1 \times n}$$
Note that $\frac{df}{d\mathbf{x}}$ is a row vector, where each element is a function of the form $\mathbb{R}^n \to \mathbb{R}$. We write $\nabla_{\mathbf{x}} f \in (\mathbb{R}^n \to \mathbb{R})^{1 \times n}$. Some authors write $\nabla_{\mathbf{x}} f \in \mathbb{R}^{1 \times n}$ as an abuse of notation for the sake of brevity and ease of matching dimensions. Keep in mind that each element of the row vector isn't a real number, but is itself a function.
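Evaluated at a particular point, the gradient becomes an ordinary row vector of numbers. The sketch below (our own helper and example function, not from the tutorial) approximates it by perturbing one coordinate at a time:

```python
import numpy as np

# Sketch: approximate the gradient row vector of f : R^n -> R by
# perturbing each coordinate in turn (the function f here is an
# arbitrary example, not one from the tutorial).
def grad(f, x, h=1e-6):
    g = np.zeros((1, x.size))            # row vector, shape (1, n)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[0, i] = (f(x + e) - f(x)) / h  # partial w.r.t. x_i
    return g

f = lambda x: x[0] ** 2 + 3 * x[1]       # f(x) = x1^2 + 3 x2
print(grad(f, np.array([1.0, 2.0])))     # approximately [[2.0, 3.0]]
```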
Let $\mathbf{g} : \mathbb{R} \to \mathbb{R}^n$ be a function of a scalar $t$. The derivative of $\mathbf{g}(t)$ with respect to $t$ is defined as
$$\frac{d\mathbf{g}}{dt} := \begin{bmatrix} \frac{dg_1(t)}{dt} \\ \frac{dg_2(t)}{dt} \\ \vdots \\ \frac{dg_n(t)}{dt} \end{bmatrix} \in (\mathbb{R} \to \mathbb{R})^{n \times 1}$$
Note that $\frac{d\mathbf{g}}{dt}$ is a column vector, where each element is itself a function of the form $\mathbb{R} \to \mathbb{R}$. As before, we abuse notation and write $\frac{d\mathbf{g}}{dt} \in \mathbb{R}^{n \times 1}$.
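Again, at a fixed $t$ this is just a column of numbers. A minimal numeric sketch (our own helper; the example $\mathbf{g}$ is the one from Question 3):

```python
import numpy as np

# Sketch: dg/dt for g : R -> R^n as a column vector of componentwise
# derivatives, approximated by a forward difference.
def dgdt(g, t, h=1e-6):
    return ((g(t + h) - g(t)) / h).reshape(-1, 1)  # shape (n, 1)

g = lambda t: np.array([t ** 2, np.exp(t)])  # the g from Question 3
print(dgdt(g, 1.0))  # approximately [[2.0], [e]]
```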
The derivatives are defined this way so that the dimensions match when we apply the chain rule.
Given $f : \mathbb{R}^n \to \mathbb{R}$ and $\mathbf{g} : \mathbb{R} \to \mathbb{R}^n$, we can define two new functions
$$h : \mathbb{R} \to \mathbb{R}, \quad h(t) = f(\mathbf{g}(t))$$
$$\mathbf{k} : \mathbb{R}^n \to \mathbb{R}^n, \quad \mathbf{k}(\mathbf{x}) = \mathbf{g}(f(\mathbf{x}))$$
and we can compute their derivatives as
$$\frac{dh}{dt} = \frac{df}{d\mathbf{g}} \frac{d\mathbf{g}}{dt} = \begin{bmatrix} \frac{\partial f(\mathbf{g})}{\partial g_1} & \dots & \frac{\partial f(\mathbf{g})}{\partial g_n} \end{bmatrix} \begin{bmatrix} \frac{\partial g_1}{\partial t} \\ \vdots \\ \frac{\partial g_n}{\partial t} \end{bmatrix} = \sum_{i=1}^{n} \frac{\partial f(\mathbf{g})}{\partial g_i} \frac{\partial g_i}{\partial t}$$
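The chain-rule sum can be cross-checked against a direct finite difference of the composite. A sketch (the particular $f$ and $\mathbf{g}$ below are our own illustrations, not the tutorial's):

```python
import numpy as np

# Sketch: check the chain-rule sum dh/dt = sum_i (df/dg_i)(dg_i/dt)
# against a direct finite difference of h(t) = f(g(t)).
f = lambda v: v[0] * v[1]                    # f : R^2 -> R, f(g) = g1 g2
g = lambda t: np.array([t ** 2, np.sin(t)])  # g : R -> R^2

t, h = 0.7, 1e-6
dg = (g(t + h) - g(t)) / h                   # dg/dt, approximated
v = g(t)
df = np.array([v[1], v[0]])                  # exact df/dg = [g2, g1]
chain = df @ dg                              # the chain-rule sum
direct = (f(g(t + h)) - f(g(t))) / h         # differentiate h(t) directly
print(chain, direct)                         # the two should agree closely
```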
and
$$\frac{d\mathbf{k}}{d\mathbf{x}} = \frac{d\mathbf{g}}{df} \frac{df}{d\mathbf{x}} = \begin{bmatrix} \frac{\partial g_1}{\partial f} \\ \vdots \\ \frac{\partial g_n}{\partial f} \end{bmatrix} \begin{bmatrix} \frac{\partial f(\mathbf{x})}{\partial x_1} & \dots & \frac{\partial f(\mathbf{x})}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \frac{\partial g_1}{\partial f}\frac{\partial f(\mathbf{x})}{\partial x_1} & \dots & \frac{\partial g_1}{\partial f}\frac{\partial f(\mathbf{x})}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_n}{\partial f}\frac{\partial f(\mathbf{x})}{\partial x_1} & \dots & \frac{\partial g_n}{\partial f}\frac{\partial f(\mathbf{x})}{\partial x_n} \end{bmatrix} = A$$
where $A_{ij} = \frac{\partial g_i}{\partial f} \frac{\partial f(\mathbf{x})}{\partial x_j}$.
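Note that this matrix is a column-times-row (outer) product. A numeric sketch with exact derivatives of illustrative functions (our own choices, not the tutorial's):

```python
import numpy as np

# Sketch of dk/dx = (dg/df)(df/dx) as a column-times-row product.
# Illustration: f(x) = x1 + 2 x2, so df/dx = [1, 2];
# g(s) = [s^2, s^3], so dg/ds = [2 s, 3 s^2]^T evaluated at s = f(x).
x = np.array([1.0, 1.0])
s = x[0] + 2 * x[1]                      # f(x) = 3
dg_df = np.array([[2 * s], [3 * s**2]])  # column vector, shape (2, 1)
df_dx = np.array([[1.0, 2.0]])           # row vector, shape (1, 2)
A = dg_df @ df_dx                        # shape (2, 2): A_ij = dg_i/df * df/dx_j
print(A)
```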
(Here, the term $\frac{\partial f(\mathbf{g})}{\partial g_i}$ means to substitute each output component of $\mathbf{g}$ into the inputs of $f$, and take the partial derivative with respect to $g_i$, the $i$th component of $\mathbf{g}$.)
For a vector-valued function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$, we define the matrix of all first-order derivatives as the Jacobian, which is given by
$$J = \nabla_{\mathbf{x}} \mathbf{f} = \frac{d\mathbf{f}(\mathbf{x})}{d\mathbf{x}} = \begin{bmatrix} \frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_1} & \dots & \frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1(\mathbf{x})}{\partial x_1} & \dots & \frac{\partial f_1(\mathbf{x})}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m(\mathbf{x})}{\partial x_1} & \dots & \frac{\partial f_m(\mathbf{x})}{\partial x_n} \end{bmatrix}, \quad J_{ij} = \frac{\partial f_i(\mathbf{x})}{\partial x_j}.$$
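A Jacobian can likewise be approximated column by column, since column $j$ holds the partials with respect to $x_j$. A sketch (our own helper and example function):

```python
import numpy as np

# Sketch: numerical Jacobian J_ij = df_i/dx_j for f : R^n -> R^m,
# built one column at a time (f below is an arbitrary example).
def jacobian(f, x, h=1e-6):
    m = f(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - f(x)) / h  # column j: partials w.r.t. x_j
    return J

f = lambda x: np.array([x[0] * x[1], x[0] + x[1]])  # f : R^2 -> R^2
print(jacobian(f, np.array([2.0, 3.0])))
# approximately [[3, 2], [1, 1]]
```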
You may also need the definition of matrix multiplication.
If $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{m \times p}$, the product $C = AB$ is a matrix in $\mathbb{R}^{n \times p}$ satisfying
$$C_{ij} = \sum_{k=1}^{m} A_{ik} B_{kj}$$
If $A \in \mathbb{R}^{n \times m}$, $\mathbf{b} \in \mathbb{R}^{m \times 1}$ and $\mathbf{c} \in \mathbb{R}^{n \times 1}$, then the matrix-vector products $A\mathbf{b}$ and $\mathbf{c}^T A$ satisfy
$$(A\mathbf{b})_k = \sum_{j=1}^{m} A_{kj} b_j \quad \text{and} \quad (\mathbf{c}^T A)_k = \sum_{i=1}^{n} A_{ik} c_i$$
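These index formulas can be checked directly against NumPy's matrix products. A sketch (shapes and seed chosen arbitrarily):

```python
import numpy as np

# Sketch: check the index formulas for C = AB, Ab and c^T A against
# NumPy's built-in matrix products.
n, m, p = 2, 3, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((n, m))
B = rng.standard_normal((m, p))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

# C_ij = sum_k A_ik B_kj, written out explicitly.
C = np.array([[sum(A[i, k] * B[k, j] for k in range(m))
               for j in range(p)] for i in range(n)])
print(np.allclose(C, A @ B))                          # C = AB
print(np.allclose(A @ b, [sum(A[k, j] * b[j] for j in range(m))
                          for k in range(n)]))        # (Ab)_k
print(np.allclose(c @ A, [sum(A[i, k] * c[i] for i in range(n))
                          for k in range(m)]))        # (c^T A)_k
```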
For $\mathbf{x} \in \mathbb{R}^n$, the Euclidean norm $\|\cdot\|_2$ is given by
$$\|\mathbf{x}\|_2 := \sqrt{\mathbf{x}^T \mathbf{x}}$$
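In code, $\sqrt{\mathbf{x}^T\mathbf{x}}$ agrees with NumPy's built-in norm. A minimal sketch with one example vector:

```python
import numpy as np

# Sketch: the Euclidean norm as sqrt(x^T x), compared with NumPy's
# built-in norm for one example vector.
x = np.array([3.0, 4.0])
print(np.sqrt(x @ x))       # 5.0
print(np.linalg.norm(x))    # 5.0
```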
For all problems below, state the dimension of the answer where appropriate.
Question 1 Formal definition of derivative
Compute the derivative of f : R→ R, f(x) = x2 from the formal limit definition of the derivative.
Question 2 Vector Derivative of Scalar Function
Given $f : \mathbb{R}^2 \to \mathbb{R}$, $f(\mathbf{x}) = 2x_1 x_2 + x_1 + 3x_2 + 5$, compute $\frac{df}{d\mathbf{x}}$.
Question 3 Scalar Derivative of Vector Function
Given $\mathbf{g} : \mathbb{R} \to \mathbb{R}^2$, $\mathbf{g}(t) = \begin{bmatrix} t^2 \\ e^t \end{bmatrix}$, compute $\frac{d\mathbf{g}}{dt}$.
Question 4 Derivative of the L2 Norm
Let $\mathbf{x} \in \mathbb{R}^n$, and define $k : \mathbb{R}^n \to \mathbb{R}$, $k(\mathbf{x}) = \|\mathbf{x}\|_2^2 := \mathbf{x}^T \mathbf{x}$. Compute $\frac{dk}{d\mathbf{x}}$.
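Once you have derived an expression by hand, you can sanity-check it against finite differences without being told the answer. A sketch (our own helper names, arbitrary test point):

```python
import numpy as np

# Sketch: finite-difference gradient of k(x) = x^T x at one test point.
# Compare the printed row of numbers with your hand-derived dk/dx
# evaluated at the same x.
k = lambda x: x @ x
x = np.array([1.0, -2.0, 0.5])
h = 1e-6
fd = np.array([(k(x + h * np.eye(3)[i]) - k(x)) / h for i in range(3)])
print(fd)
```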
Question 5 Chain Rule, Scalar Derivative
Let $h : \mathbb{R} \to \mathbb{R}$, $h(t) = f(\mathbf{g}(t))$, where $f$ and $\mathbf{g}$ are defined in Question 2 and Question 3 respectively.
1. Compute $\frac{dh}{dt}$ by using the chain rule.
2. Compute $\frac{dh}{dt}$ by evaluating $f(\mathbf{g}(t))$ first, and then differentiating the entire expression with respect to $t$. Compare your answer to the above and check that they match.
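A numeric cross-check for this question can be set up as follows (our own helper names; $f$ and $\mathbf{g}$ are the ones from Questions 2 and 3):

```python
import numpy as np

# Sketch: finite-difference value of dh/dt at one test point, to
# compare against whichever symbolic expression you derive.
f = lambda v: 2 * v[0] * v[1] + v[0] + 3 * v[1] + 5  # f from Question 2
g = lambda t: np.array([t ** 2, np.exp(t)])          # g from Question 3
h_fn = lambda t: f(g(t))                             # h(t) = f(g(t))

t, eps = 0.3, 1e-6
print((h_fn(t + eps) - h_fn(t)) / eps)  # compare with your dh/dt at t = 0.3
```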
Question 6 Chain Rule, Vector Derivative
Let $\mathbf{k} : \mathbb{R}^2 \to \mathbb{R}^2$, $\mathbf{k}(\mathbf{x}) = \mathbf{g}(f(\mathbf{x}))$, where $f$ and $\mathbf{g}$ are defined in Question 2 and Question 3 respectively.
1. Compute $\frac{d\mathbf{k}}{d\mathbf{x}}$ using the chain rule.
2. Compute $\frac{d\mathbf{k}}{d\mathbf{x}}$ directly by using the Jacobian to differentiate $\mathbf{g}(f(\mathbf{x}))$. Check that your answer matches the one obtained using the chain rule.
Question 7 More Derivatives
1. Let $f : \mathbb{R}^n \to \mathbb{R}$, $f(\mathbf{x}) = (\mathbf{x}^T\mathbf{x} + 1)^2$. Compute $\frac{d}{d\mathbf{x}} f(\mathbf{x})$ using the chain rule. (You can use the previous questions to help you.)
2. Directly compute $\frac{d}{d\mathbf{x}} f(\mathbf{x})$ by expanding out $(\mathbf{x}^T\mathbf{x} + 1)^2$ first. Your result should match the above.
Question 8 Derivative of a Matrix-Vector product
Let $A \in \mathbb{R}^{m \times n}$ and $\mathbf{x} \in \mathbb{R}^{n \times 1}$. Show that $\frac{d}{d\mathbf{x}}(A\mathbf{x}) = A$.
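Since the question states the result, it is safe to confirm it numerically for one random instance before proving it (shapes and seed chosen arbitrarily):

```python
import numpy as np

# Sketch: the Jacobian of Ax, built by finite differences, should
# recover A itself for any A and x.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
x = rng.standard_normal(2)
h = 1e-6
J = np.column_stack([(A @ (x + h * np.eye(2)[j]) - A @ x) / h
                     for j in range(2)])
print(np.allclose(J, A, atol=1e-4))  # True: the Jacobian of Ax is A
```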
Question 9 Linear Regression
Let $\Phi \in \mathbb{R}^{n \times m}$, $\mathbf{w} \in \mathbb{R}^{n \times 1}$, $\mathbf{t} \in \mathbb{R}^{m \times 1}$.
Let $f : \mathbb{R}^n \to \mathbb{R}$, $f(\mathbf{w}) = \frac{1}{2}\left\|(\mathbf{w}^T\Phi)^T - \mathbf{t}\right\|_2^2$
1. Verify that $f$ is well defined (the dimensions of all the components match up).
2. Compute $\frac{d}{d\mathbf{w}} f(\mathbf{w})$.
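One way to gain confidence in a dimension check is to instantiate the shapes concretely ($n = 4$, $m = 3$ below are our own arbitrary choices):

```python
import numpy as np

# Sketch: instantiating the shapes in f(w) with concrete n and m.
n, m = 4, 3
Phi = np.ones((n, m))
w = np.ones((n, 1))
t = np.ones((m, 1))
r = (w.T @ Phi).T - t       # shape (m, 1): the subtraction is well defined
f = 0.5 * float(r.T @ r)    # squared 2-norm of r, a scalar
print(r.shape, f)
```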
Question 10 Matrix Gradient
Given $X \in \mathbb{R}^{n \times m}$ and some vectors $\mathbf{a} \in \mathbb{R}^{? \times ?}$, $\mathbf{b} \in \mathbb{R}^{? \times ?}$.
1. What are the dimensions of $\mathbf{a}$ and $\mathbf{b}$ such that $\mathbf{a}^T X \mathbf{b}$ is well defined?¹ What is the dimension of the result?
2. Compute the matrix gradient $\frac{d}{dX} \mathbf{a}^T X \mathbf{b}$.

¹Note that if $X$ is square, symmetric and positive definite, then defining $\langle \mathbf{a}, \mathbf{b} \rangle := \mathbf{a}^T X \mathbf{b}$ gives an inner product.