The Australian National University Semester 2, 2021
School of Computing Tutorial 5

COMP3670/6670: Introduction to Machine Learning

These exercises will concentrate on vector calculus, and how to compute derivatives of functions that
live in higher dimensions.

Preliminaries

The formal definition of the derivative of a function $f : \mathbb{R} \to \mathbb{R}$ is given by

\[
\frac{df}{dx} := \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
\]
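
As a rough numerical illustration of the limit definition, the sketch below evaluates the difference quotient for shrinking values of $h$. The helper name `forward_difference` and the choice of $\sin$ as the test function are assumptions made only for this example.

```python
import math

def forward_difference(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h from the limit definition."""
    return (f(x + h) - f(x)) / h

# Illustrative choice (not one of the exercises): f = sin, whose derivative is cos.
x0 = 1.0
for h in (1e-1, 1e-3, 1e-5):
    approx = forward_difference(math.sin, x0, h)
    error = abs(approx - math.cos(x0))
    print(f"h={h:g}  quotient={approx:.6f}  error={error:.2e}")
```

The error should shrink roughly in proportion to $h$, until floating-point round-off starts to dominate for very small $h$.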

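Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function of a vector $x$. The derivative of $f(x)$ with respect to $x$ is defined as

\[
\nabla_x f = \operatorname{grad} f = \frac{df}{dx} :=
\begin{bmatrix}
\frac{\partial f(x)}{\partial x_1} & \frac{\partial f(x)}{\partial x_2} & \cdots & \frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
\in (\mathbb{R}^n \to \mathbb{R})^{1 \times n}
\]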

Note that $\frac{df}{dx}$ is a row vector, where each element is a function of the form $\mathbb{R}^n \to \mathbb{R}$. We write
$\nabla_x f \in (\mathbb{R}^n \to \mathbb{R})^{1 \times n}$. Some authors write $\nabla_x f \in \mathbb{R}^{1 \times n}$ as an abuse of notation for the sake of brevity
and ease of matching dimensions. Keep in mind that each element of the row vector isn't a real number,
but itself a function.
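
To make the row-vector convention concrete, here is a minimal sketch that estimates the gradient at a single point with central differences and returns it with shape `(1, n)`. The helper `numerical_gradient` and the function `f_example` are hypothetical and chosen only for illustration. Note that the code evaluates the gradient at one point, so it produces real numbers rather than the functions described above.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx at x, returned as a 1 x n row vector."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros((1, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        # Perturb only the i-th input to approximate the i-th partial derivative.
        grad[0, i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

# Hypothetical example function f : R^3 -> R (an assumption for this sketch).
def f_example(x):
    return x[0] * x[1] ** 2 + np.sin(x[2])

x0 = np.array([1.0, 2.0, 0.5])
g = numerical_gradient(f_example, x0)
print(g.shape)  # (1, 3)
print(g)        # compare with the analytic row [x2^2, 2*x1*x2, cos(x3)] at x0
```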

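Let $g : \mathbb{R} \to \mathbb{R}^n$ be a function of a scalar $t$. The derivative of $g(t)$ with respect to $t$ is defined as

\[
\frac{dg}{dt} :=
\begin{bmatrix}
\frac{dg_1(t)}{dt} \\
\frac{dg_2(t)}{dt} \\
\vdots \\
\frac{dg_n(t)}{dt}
\end{bmatrix}
\in (\mathbb{R} \to \mathbb{R})^{n \times 1}
\]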

Note that $\frac{dg}{dt}$ is a column vector, where each element is itself a function of the form $\mathbb{R} \to \mathbb{R}$. As before,
we abuse notation and write

\[
\frac{dg}{dt} \in \mathbb{R}^{n \times 1}.
\]

The derivatives are defined this way so that the dimensions match when we define
the chain rule.
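
The shape bookkeeping can be checked numerically. The sketch below estimates $\frac{dg}{dt}$ at one point as an `(n, 1)` column and multiplies it by a `(1, n)` row to show that the product is `(1, 1)`. The names `numerical_derivative_column` and `g_example`, and the row of ones standing in for a gradient, are assumptions for illustration only.

```python
import numpy as np

def numerical_derivative_column(g, t, h=1e-6):
    """Central-difference estimate of dg/dt at t, returned as an n x 1 column."""
    forward = np.asarray(g(t + h), dtype=float)
    backward = np.asarray(g(t - h), dtype=float)
    return ((forward - backward) / (2 * h)).reshape(-1, 1)

# Hypothetical curve g : R -> R^3 (an assumption for this sketch).
def g_example(t):
    return np.array([np.cos(t), np.sin(t), t])

dg_dt = numerical_derivative_column(g_example, 0.3)
row = np.ones((1, 3))        # stand-in for a 1 x n gradient row

print(dg_dt.shape)           # (3, 1)
print((row @ dg_dt).shape)   # (1, 1): a row times a column is a scalar
```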

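Given $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}^n$, we can define two new functions

\[
h : \mathbb{R} \to \mathbb{R}, \quad h(t) = f(g(t))
\]

\[
k : \mathbb{R}^n \to \mathbb{R}^n, \quad k(x) = g(f(x))
\]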

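and we can define their derivatives as

\[
\frac{dh}{dt} = \frac{df}{dg} \frac{dg}{dt} =
\begin{bmatrix}
\frac{\partial f(g)}{\partial g_1} & \cdots & \frac{\partial f(g)}{\partial g_n}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial g_1}{\partial t} \\
\vdots \\
\frac{\partial g_n}{\partial t}
\end{bmatrix}
= \sum_{i=1}^{n} \frac{\partial f(g)}{\partial g_i} \frac{\partial g_i}{\partial t}
\]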

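and

\[
\frac{dk}{dx} = \frac{dg}{df} \frac{df}{dx} =
\begin{bmatrix}
\frac{\partial g_1}{\partial f} \\
\vdots \\
\frac{\partial g_n}{\partial f}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial f(x)}{\partial x_1} & \cdots & \frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial g_1}{\partial f} \frac{\partial f(x)}{\partial x_1} & \cdots & \frac{\partial g_1}{\partial f} \frac{\partial f(x)}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial g_n}{\partial f} \frac{\partial f(x)}{\partial x_1} & \cdots & \frac{\partial g_n}{\partial f} \frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
= A
\]

where $A_{ij} = \frac{\partial g_i}{\partial f} \frac{\partial f(x)}{\partial x_j}$.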

(Here, the term $\frac{\partial f(g)}{\partial g_i}$ means to substitute each output component of $g$ into the inputs of $f$, and take
the partial derivative with respect to $g_i$, the $i$th component of $g$.)
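
The two chain rules for $h$ and $k$ can be sanity-checked numerically. The sketch below uses hypothetical functions `f` and `g` with $n = 2$ (deliberately not the ones from the exercises), hand-written derivatives `df_dx` and `dg_dt`, and central differences as the reference. All names and tolerances are assumptions for illustration.

```python
import numpy as np

# Hypothetical functions (not the ones used in the exercises).
def f(x):                      # f : R^2 -> R
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def df_dx(x):                  # gradient as a 1 x 2 row vector
    return np.array([[2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]])

def g(t):                      # g : R -> R^2
    return np.array([np.sin(t), t ** 3])

def dg_dt(t):                  # derivative as a 2 x 1 column vector
    return np.array([[np.cos(t)], [3.0 * t ** 2]])

eps = 1e-6

# h(t) = f(g(t)): dh/dt = (df/dg)(dg/dt), a (1 x 2)(2 x 1) product.
t0 = 0.7
dh_chain = (df_dx(g(t0)) @ dg_dt(t0)).item()
dh_numeric = (f(g(t0 + eps)) - f(g(t0 - eps))) / (2 * eps)

# k(x) = g(f(x)): dk/dx = (dg/df)(df/dx), a (2 x 1)(1 x 2) product.
x0 = np.array([0.5, -1.2])
dk_chain = dg_dt(f(x0)) @ df_dx(x0)
dk_numeric = np.zeros((2, 2))
for j in range(2):
    step = np.zeros(2)
    step[j] = eps
    dk_numeric[:, j] = (g(f(x0 + step)) - g(f(x0 - step))) / (2 * eps)

print(dh_chain, dh_numeric)    # scalar chain rule vs finite difference
print(dk_chain.shape)          # (2, 2)
print(np.allclose(dk_chain, dk_numeric, atol=1e-6))
```

The same pair of vectors gives a scalar when multiplied as row times column, and an $n \times n$ matrix when multiplied as column times row, which is why the row and column conventions matter.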

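For a vector-valued function $f : \mathbb{R}^n \to \mathbb{R}^m$, we define the matrix of all first-order derivatives as the
Jacobian, which is given by

\[
J = \nabla_x f = \frac{df(x)}{dx} =
\begin{bmatrix}
\frac{\partial f(x)}{\partial x_1} & \cdots & \frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial f_1(x)}{\partial x_1} & \cdots & \frac{\partial f_1(x)}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m(x)}{\partial x_1} & \cdots & \frac{\partial f_m(x)}{\partial x_n}
\end{bmatrix},
\quad J_{ij} = \frac{\partial f_i(x)}{\partial x_j}.
\]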
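
A finite-difference Jacobian is a convenient way to check hand-computed answers. The sketch below builds the $m \times n$ matrix column by column, where column $j$ holds the partial derivatives with respect to $x_j$. The helper `numerical_jacobian` and the function `f_example` are hypothetical names introduced for this example.

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f : R^n -> R^m at x, with shape (m, n)."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(f(x)).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        # Column j approximates the partial derivatives of every f_i w.r.t. x_j.
        J[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * h)
    return J

# Hypothetical f : R^2 -> R^3 (an assumption for this sketch).
def f_example(x):
    return np.array([x[0] * x[1], x[0] + x[1], x[0] ** 2])

J = numerical_jacobian(f_example, [2.0, 3.0])
print(J.shape)  # (3, 2)
print(J)        # compare with the analytic rows [x2, x1], [1, 1], [2*x1, 0]
```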

You may also need the definition of matrix multiplication.

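If $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{m \times p}$, the product $C = AB$ is a matrix in $\mathbb{R}^{n \times p}$ satisfying

\[
C_{ij} = \sum_{k=1}^{m} A_{ik} B_{kj}
\]

If $A \in \mathbb{R}^{n \times m}$, $b \in \mathbb{R}^{m \times 1}$ and $c \in \mathbb{R}^{n \times 1}$, then the matrix-vector products $Ab$ and $c^T A$ satisfy the
properties

\[
(Ab)_k = \sum_{j=1}^{m} A_{kj} b_j
\]

and

\[
(c^T A)_k = \sum_{i=1}^{n} A_{ik} c_i
\]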
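
The index formulas can be compared against NumPy's `@` operator. The sketch below spells out each sum explicitly. The function name `matmul_by_definition` and the random test matrices are assumptions for illustration.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Compute C = AB entry by entry using C_ij = sum_k A_ik B_kj."""
    n, m = A.shape
    m2, p = B.shape
    assert m == m2, "inner dimensions must agree"
    C = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(m))
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # n = 3, m = 4
B = rng.standard_normal((4, 2))   # p = 2
b = rng.standard_normal((4, 1))
c = rng.standard_normal((3, 1))

# (Ab)_k = sum_j A_kj b_j   and   (c^T A)_k = sum_i A_ik c_i
Ab = np.array([sum(A[k, j] * b[j, 0] for j in range(4)) for k in range(3)])
cTA = np.array([sum(A[i, k] * c[i, 0] for i in range(3)) for k in range(4)])

print(np.allclose(matmul_by_definition(A, B), A @ B))
print(np.allclose(Ab, (A @ b).ravel()))
print(np.allclose(cTA, (c.T @ A).ravel()))
```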

For $x \in \mathbb{R}^n$, the Euclidean norm $\|\cdot\|_2$ is given by

\[
\|x\|_2 := \sqrt{x^T x}
\]
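
As a quick check, assuming the usual definition $\|x\|_2 = \sqrt{x^T x}$, the sketch below compares the formula with `np.linalg.norm`, whose default for a 1-D array is the 2-norm. The vector `x` is an arbitrary illustrative choice.

```python
import numpy as np

x = np.array([3.0, 4.0])

norm_from_definition = np.sqrt(x @ x)   # sqrt(x^T x)
print(norm_from_definition)             # 5.0
print(np.linalg.norm(x))                # NumPy's default vector norm is the 2-norm
```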

For all problems below, state the dimension of the answer where appropriate.

Question 1 Formal definition of derivative

Compute the derivative of $f : \mathbb{R} \to \mathbb{R}$, $f(x) = x^2$, from the formal limit definition of the derivative.

Question 2 Vector Derivative of Scalar Function

Given $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x) = 2x_1 x_2 + x_1 + 3x_2 + 5$, compute $\frac{df}{dx}$.

Question 3 Scalar Derivative of Vector Function

Given $g : \mathbb{R} \to \mathbb{R}^2$, $g(t) = \begin{bmatrix} t^2 \\ e^t \end{bmatrix}$, compute $\frac{dg}{dt}$.

Question 4 Derivative of the L2 Norm

Let $x \in \mathbb{R}^n$, and define $k : \mathbb{R}^n \to \mathbb{R}$, $k(x) = \|x\|_2^2 := x^T x$. Compute $\frac{dk}{dx}$.

Question 5 Chain Rule, Scalar Derivative

Let $h : \mathbb{R} \to \mathbb{R}$, $h(t) = f(g(t))$, where $f$ and $g$ are defined in Question 2 and Question 3 respectively.

1. Compute $\frac{dh}{dt}$ by using the chain rule.

2. Compute $\frac{dh}{dt}$ by evaluating $f(g(t))$ first, and then differentiating the entire expression with respect to $t$.
Compare your answer to the above and check that they match.

Question 6 Chain Rule, Vector Derivative

Let $k : \mathbb{R}^n \to \mathbb{R}^n$ (with $n = 2$ here), $k(x) = g(f(x))$, where $f$ and $g$ are defined in Question 2 and Question 3 respectively.

1. Compute $\frac{dk}{dx}$ using the chain rule.

2. Compute $\frac{dk}{dx}$ directly by using the Jacobian to differentiate $g(f(x))$. Check that your answer matches
the one obtained above using the chain rule.

Question 7 More Derivatives

1. Let $f : \mathbb{R}^n \to \mathbb{R}$, $f(x) = (x^T x + 1)^2$.
Compute $\frac{d}{dx} f(x)$ using the chain rule. (You can use the previous questions to help you.)

2. Directly compute $\frac{d}{dx} f(x)$ by expanding out $(x^T x + 1)^2$ first. Your result should match the above.

Question 8 Derivative of a Matrix-Vector product

Let $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^{n \times 1}$. Show that $\frac{d}{dx}(Ax) = A$.

Question 9 Linear Regression

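Let $\Phi \in \mathbb{R}^{n \times m}$, $w \in \mathbb{R}^{n \times 1}$, $t \in \mathbb{R}^{m \times 1}$.
Let $f : \mathbb{R}^n \to \mathbb{R}$, $f(w) = \frac{1}{2} \left\| (w^T \Phi)^T - t \right\|_2^2$.

1. Verify that f is well defined (the dimensions of all the components match up).

2. Compute $\frac{d}{dw} f(w)$.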

Question 10 Matrix Gradient

Given $X \in \mathbb{R}^{n \times m}$ and some vectors $a \in \mathbb{R}^{? \times ?}$, $b \in \mathbb{R}^{? \times ?}$.

1. What are the dimensions of $a$ and $b$ such that $a^T X b$ is well defined?$^1$ What is the dimension of
the result?

2. Compute the matrix gradient $\frac{d}{dX} a^T X b$.

$^1$ Note that if $X$ is square, symmetric and positive definite, then defining $\langle a, b \rangle := a^T X b$ gives an inner product.
