COMP3223: Solutions to Calculus Exercises
October 23, 2020
1 Partial derivatives and matrix calculus
1. Using the symbol $\delta_{ab}$, the Kronecker delta
$$\delta_{ab} = \begin{cases} 1, & a = b, \\ 0, & a \neq b, \end{cases}$$
show the following:
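(a) for vector $\mathbf{v}$ with components $v_i$, $\sum_i v_i \delta_{ij} = v_j$;
(b) for matrix $A$ with elements $(A)_{ij} = a_{ij}$, $\sum_j a_{ij}\delta_{jk} = a_{ik}$;
(c) for matrices $A$, $B$, the $i$th diagonal element of $C = AB$ is expressed as $\sum_{jk} a_{ij} b_{jk} \delta_{ki}$;
(d) the trace of a matrix is $\mathrm{tr}(A) = \sum_{ij} a_{ij}\delta_{ij}$;
(e) $\dfrac{\partial w_a}{\partial w_b} = \delta_{ab}$.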
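2. For $p \times p$ matrix $A$ with matrix elements $(A)_{ij} = a_{ij}$, $1 \leqslant i, j \leqslant p$, and vector $\mathbf{x} = (x_1, \ldots, x_p)^T$, show that:
(a) the $i$-th element of the vector $A\mathbf{x}$ is $(A\mathbf{x})_i = \sum_j a_{ij} x_j$;
(b) $\nabla_{\mathbf{x}}(A\mathbf{x}) := \dfrac{\partial}{\partial \mathbf{x}}(A\mathbf{x}) = A^T$. Write out the indices explicitly:
$$\left(\frac{\partial}{\partial \mathbf{x}}(A\mathbf{x})\right)_{ij} = \frac{\partial}{\partial x_i}(A\mathbf{x})_j = \frac{\partial}{\partial x_i}\sum_k a_{jk} x_k;$$
(c) the gradient of the scalar quadratic form $\mathbf{x}^T A \mathbf{x}$ is
$$\nabla_{\mathbf{x}}(\mathbf{x}^T A \mathbf{x}) = (A + A^T)\mathbf{x};$$
hint: the $i$-th element of the gradient is
$$\frac{\partial}{\partial x_i}\sum_{pq} x_p a_{pq} x_q;$$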
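(d) the partial derivative of the quadratic form $\mathbf{x}^T A \mathbf{x}$ with respect to $A$ can be evaluated for each matrix element $a_{ij}$, $1 \leqslant i, j \leqslant p$:
$$\frac{\partial}{\partial a_{ij}}\left(\sum_{rs} x_r a_{rs} x_s\right)$$
with the result
$$\nabla_A(\mathbf{x}^T A \mathbf{x}) = \mathbf{x}\mathbf{x}^T;$$
with $\mathbf{x}\mathbf{x}^T$ a $p \times p$ matrix.
2 Solutions
1. (a) The column vector $\mathbf{v}$ has components $v_i$,
$$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_n \end{pmatrix}, \tag{1}$$
and the Kronecker delta is
$$\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases} \tag{2}$$
Therefore,
$$\sum_{i=1}^{n} v_i\delta_{ij} = v_1\delta_{1j} + v_2\delta_{2j} + v_3\delta_{3j} + \cdots + v_i\delta_{ij} + \cdots + v_n\delta_{nj} = v_j, \tag{3}$$
because the sum has only one non-zero term, the one with $\delta_{ij} = 1$, which occurs when $i = j$.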
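The sifting property in Equation (3) can be checked numerically. The following is a minimal sketch in plain Python; the helper name `delta` and the sample components of `v` are illustrative assumptions, not part of the exercise.

```python
def delta(a, b):
    """Kronecker delta: 1 if the two indices agree, 0 otherwise."""
    return 1 if a == b else 0

v = [2.0, -1.0, 5.0, 7.0]   # arbitrary sample components (assumed values)
n = len(v)

# For each j, the sum over i of v_i * delta_ij keeps only the term with i == j.
sifted = [sum(v[i] * delta(i, j) for i in range(n)) for j in range(n)]

print(sifted == v)
```

Indices run from 0 here rather than from 1, which does not affect the argument.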
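(b) For $m \times n$ matrix $A$ with components $a_{ij}$,
$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{m1} & \cdots & \cdots & a_{mj} & \cdots & a_{mn}
\end{pmatrix}, \tag{4}$$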
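$$\sum_{j=1}^{n} a_{ij}\delta_{jk} = a_{i1}\delta_{1k} + a_{i2}\delta_{2k} + a_{i3}\delta_{3k} + \cdots + a_{in}\delta_{nk} = a_{ik}, \tag{5}$$
because the only non-zero term, the one with $\delta_{jk} = 1$, occurs when $j = k$. This can also be seen by viewing the Kronecker delta as the matrix elements of the identity matrix, $(I)_{jk} = \delta_{jk}$: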
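$$a_{ik} = (A)_{ik} = (AI)_{ik} = \sum_{j=1}^{n} (A)_{ij}(I)_{jk} = \sum_{j=1}^{n} a_{ij}\delta_{jk}.$$
(c) Using Equation (5) we obtain
$$\sum_{jk} a_{ij}b_{jk}\delta_{ik} = \sum_j a_{ij}\sum_k b_{jk}\delta_{ik} = \sum_j a_{ij}b_{ji} = a_{i1}b_{1i} + a_{i2}b_{2i} + \cdots + a_{in}b_{ni}, \tag{6}$$
which is the $i$th diagonal element $(C)_{ii} = \sum_j (C)_{ij}\delta_{ij}$ of $C = AB$. For example, for the $3 \times 3$ matrices
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix},$$
the matrix $C$ is
$$C = \begin{pmatrix}
a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} & a_{11}b_{13} + a_{12}b_{23} + a_{13}b_{33} \\
a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} & a_{21}b_{13} + a_{22}b_{23} + a_{23}b_{33} \\
a_{31}b_{11} + a_{32}b_{21} + a_{33}b_{31} & a_{31}b_{12} + a_{32}b_{22} + a_{33}b_{32} & a_{31}b_{13} + a_{32}b_{23} + a_{33}b_{33}
\end{pmatrix},$$
and you should notice that the $(i, j)$-th element of $C$ is $C_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j}$. Thus, the $i$th diagonal element (for $i = 1, 2, 3$) of $C$ can be obtained from Equation (6) by setting $n = 3$.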
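Parts (b) and (c) can be verified on a small example. This is a sketch only: the entries of `A` and `B` are made-up sample values, and `delta` is an illustrative helper name.

```python
def delta(a, b):
    """Kronecker delta: 1 if the two indices agree, 0 otherwise."""
    return 1 if a == b else 0

# Sample 3 x 3 matrices (assumed values, chosen only for illustration).
A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
B = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
idx = range(3)

# (b): sum_j a_ij delta_jk reproduces a_ik, i.e. A I = A.
AI = [[sum(A[i][j] * delta(j, k) for j in idx) for k in idx] for i in idx]

# Ordinary matrix product C = AB with C_ij = sum_k a_ik b_kj.
C = [[sum(A[i][k] * B[k][j] for k in idx) for j in idx] for i in idx]

# (c): sum_jk a_ij b_jk delta_ki is the i-th diagonal element of C.
diag = [sum(A[i][j] * B[j][k] * delta(k, i) for j in idx for k in idx)
        for i in idx]

print(AI == A)
print(diag, [C[i][i] for i in idx])
```

The double sum in (c) visits all nine $(j, k)$ pairs, but the delta discards every pair with $k \neq i$, which is why it collapses to the single sum $\sum_j a_{ij}b_{ji}$.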
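(d) Following Equation (5) for a square ($m = n$) matrix of the form (4),
$$\sum_{ij} a_{ij}\delta_{ij} = \sum_i a_{ii} = a_{11} + a_{22} + a_{33} + \cdots + a_{ii} + \cdots + a_{nn} = \mathrm{tr}(A). \tag{7}$$
(e) For a function $f(x_1, x_2, \ldots, x_n)$ with $n$ independent variables $x_i$, $i = 1, \ldots, n$, the partial derivative $\dfrac{\partial}{\partial x_i} f(x_1, \ldots, x_i, \ldots, x_n)$ is defined as
$$\lim_{h_i \to 0} \frac{f(x_1, \ldots, x_{i-1}, x_i + h_i, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)}{h_i}, \tag{8}$$
making the derivatives of $f(w_1, \ldots, w_n) = w_a$ with respect to $w_a$ and $w_b$ (with $b \neq a$)
$$\lim_{h_a \to 0} \frac{(w_a + h_a) - w_a}{h_a} = 1, \qquad \lim_{h_b \to 0} \frac{w_a - w_a}{h_b} = 0 \implies \frac{\partial w_a}{\partial w_b} = \delta_{ab}. \tag{9}$$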
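Equations (7) and (9) can both be checked numerically. The sketch below uses a forward difference with a small but finite step `h` in place of the limit in Equation (8); the helper names `delta` and `partial`, the step size and the sample values are all illustrative assumptions.

```python
def delta(a, b):
    """Kronecker delta: 1 if the two indices agree, 0 otherwise."""
    return 1 if a == b else 0

def partial(f, w, b, h=1e-6):
    """Forward-difference estimate of df/dw_b at the point w (finite h, no limit)."""
    shifted = list(w)
    shifted[b] += h
    return (f(shifted) - f(w)) / h

# (d): the trace as a double sum filtered by delta_ij.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]   # sample matrix (assumed values)
n = len(A)
trace = sum(A[i][j] * delta(i, j) for i in range(n) for j in range(n))

# (e): f(w) = w_a is linear in w, so the estimate of dw_a/dw_b
# agrees with delta_ab up to floating-point rounding.
w = [0.5, -1.5, 2.0]                      # sample point (assumed values)
dw = [[partial(lambda u, a=a: u[a], w, b) for b in range(n)] for a in range(n)]

print(trace)
for row in dw:
    print([round(entry, 6) for entry in row])
```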
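2. (a) For matrix $A$ shown explicitly in Equation (4), in the $p \times p$ case, and input vector $\mathbf{x}$,
$$\mathbf{x} = (x_1, \ldots, x_p)^T = \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix}, \tag{10}$$
$$A\mathbf{x} = \begin{pmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & \ddots & \vdots \\ a_{p1} & \cdots & a_{pp} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + \cdots + a_{1p}x_p \\ \vdots \\ a_{p1}x_1 + \cdots + a_{pp}x_p \end{pmatrix},$$
where the $i$th element of the output vector $\mathbf{y} = A\mathbf{x}$ is $y_i$:
$$(\mathbf{y})_i = y_i = (A\mathbf{x})_i = a_{i1}x_1 + \cdots + a_{ip}x_p = \sum_{j=1}^{p} a_{ij}x_j. \tag{12}$$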
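The index form in Equation (12) translates directly into code. This is a short sketch with made-up sample values for `A` and `x`.

```python
# Sample p x p matrix and input vector (assumed values, p = 3).
A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
x = [1, -1, 2]
p = len(x)

# y_i = sum_j a_ij x_j: one sum over the column index j for each row i.
y = [sum(A[i][j] * x[j] for j in range(p)) for i in range(p)]

print(y)  # [5, 11, 19]
```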
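(b) The derivative of the output $y_i$ with respect to $x_j$ measures how rapidly the output varies when the input is changed. Convince yourself that the answer is a matrix and pay attention to its row ($i$ in $y_i$) and column ($j$ in $x_j$) indices. Using Equation (9),
$$\left(\frac{\partial}{\partial \mathbf{x}}(A\mathbf{x})\right)_{ij} = \frac{\partial}{\partial x_j}(A\mathbf{x})_i = \frac{\partial}{\partial x_j}\sum_k a_{ik}x_k = \sum_k a_{ik}\frac{\partial x_k}{\partial x_j} = \sum_k a_{ik}\delta_{kj},$$
and using (5) we get
$$\sum_k a_{ik}\delta_{kj} = a_{ij} = (A)_{ij}.$$
With rows indexed by the output $y_i$ and columns by the input $x_j$, the matrix of derivatives is therefore $A$. In the convention of the question, where the row index belongs to $x_i$ and the column index to $(A\mathbf{x})_j$, the same entries form the transpose $A^T$.
For a $3 \times 3$ example,
$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} (A\mathbf{x})_1 \\ (A\mathbf{x})_2 \\ (A\mathbf{x})_3 \end{pmatrix} = \begin{pmatrix} x_1a_{11} + x_2a_{12} + x_3a_{13} \\ x_1a_{21} + x_2a_{22} + x_3a_{23} \\ x_1a_{31} + x_2a_{32} + x_3a_{33} \end{pmatrix} \tag{13}$$
and so (keeping track of row and column indices),
$$\frac{\partial}{\partial \mathbf{x}}(\mathbf{y}) = \begin{pmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_1}{\partial x_3} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \dfrac{\partial y_2}{\partial x_3} \\
\dfrac{\partial y_3}{\partial x_1} & \dfrac{\partial y_3}{\partial x_2} & \dfrac{\partial y_3}{\partial x_3}
\end{pmatrix} \tag{14}$$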
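and writing out the terms explicitly we get
$$\frac{\partial}{\partial \mathbf{x}}(A\mathbf{x}) = \begin{pmatrix}
\dfrac{\partial}{\partial x_1}(A\mathbf{x})_1 & \dfrac{\partial}{\partial x_2}(A\mathbf{x})_1 & \dfrac{\partial}{\partial x_3}(A\mathbf{x})_1 \\
\dfrac{\partial}{\partial x_1}(A\mathbf{x})_2 & \dfrac{\partial}{\partial x_2}(A\mathbf{x})_2 & \dfrac{\partial}{\partial x_3}(A\mathbf{x})_2 \\
\dfrac{\partial}{\partial x_1}(A\mathbf{x})_3 & \dfrac{\partial}{\partial x_2}(A\mathbf{x})_3 & \dfrac{\partial}{\partial x_3}(A\mathbf{x})_3
\end{pmatrix}.$$
Using the explicit forms in Equation (13) you can verify that
$$\frac{\partial}{\partial \mathbf{x}}(A\mathbf{x}) = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = A. \tag{15}$$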
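The result in Equation (15) can be checked numerically with finite differences. The sketch below builds the matrix of derivatives with rows indexed by the output $y_i$ and columns by the input $x_j$, as in Equation (14); the helper names `matvec` and `jacobian`, the step size and the sample values are illustrative assumptions.

```python
def matvec(A, x):
    """Return A x, with (A x)_i = sum_k a_ik x_k."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]

def jacobian(f, x, h=1e-6):
    """Forward-difference estimate J[i][j] ~ d f_i / d x_j at the point x."""
    fx = f(x)
    J = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += h          # perturb only the j-th input
        fj = f(shifted)
        for i in range(len(fx)):
            J[i][j] = (fj[i] - fx[i]) / h
    return J

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]  # sample values
x = [0.3, -1.2, 2.5]                                       # sample point

J = jacobian(lambda u: matvec(A, u), x)
for row in J:
    print([round(entry, 4) for entry in row])
```

Because $A\mathbf{x}$ is linear in $\mathbf{x}$, the estimate does not depend on the point `x` and matches $A$ up to rounding. Swapping the roles of the two indices (storing `J[j][i]` instead) would give $A^T$, the layout used in the statement of the exercise.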
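(c) The expression $Q \triangleq \sum_{pq} x_p a_{pq} x_q$ is a number. Differentiating it with respect to $\mathbf{x}$ gives one term for each component of the vector $\mathbf{x}$, so the answer should be a vector as well. For a $(3 \times 3)$ case,
$$Q = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},$$
which is written out as
$$Q = a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 + (a_{12} + a_{21})x_1x_2 + (a_{13} + a_{31})x_1x_3 + (a_{23} + a_{32})x_2x_3.$$
The gradient $\nabla_{\mathbf{x}}Q$ has components
$$\begin{pmatrix} \dfrac{\partial Q}{\partial x_1} \\ \dfrac{\partial Q}{\partial x_2} \\ \dfrac{\partial Q}{\partial x_3} \end{pmatrix} = \begin{pmatrix}
(a_{11} + a_{11})x_1 + (a_{12} + a_{21})x_2 + (a_{13} + a_{31})x_3 \\
(a_{12} + a_{21})x_1 + (a_{22} + a_{22})x_2 + (a_{23} + a_{32})x_3 \\
(a_{13} + a_{31})x_1 + (a_{23} + a_{32})x_2 + (a_{33} + a_{33})x_3
\end{pmatrix} = (A + A^T)\mathbf{x}.$$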
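For the general case,
$$\begin{aligned}
\frac{\partial}{\partial x_i}\sum_{pq} x_p a_{pq} x_q
&= \sum_{pq} a_{pq}\left(\frac{\partial x_p}{\partial x_i}x_q + x_p\frac{\partial x_q}{\partial x_i}\right) \\
&= \sum_{pq} a_{pq}\left(\delta_{pi}x_q + x_p\delta_{qi}\right) \\
&= \sum_q a_{iq}x_q + \sum_p x_p a_{pi} \\
&= \sum_p a_{ip}x_p + \sum_p a_{pi}x_p \\
&= \sum_p \left((A)_{ip} + (A^T)_{ip}\right)x_p \\
&= \left((A + A^T)\mathbf{x}\right)_i.
\end{aligned}$$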
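The formula $\nabla_{\mathbf{x}}Q = (A + A^T)\mathbf{x}$ can be compared against a numerical gradient. The sketch below uses central differences, which are exact for a quadratic up to rounding; the function names, the step size and the sample values (with a deliberately non-symmetric `A`) are illustrative assumptions.

```python
def quad(A, x):
    """Quadratic form Q = sum_pq x_p a_pq x_q."""
    n = len(x)
    return sum(x[p] * A[p][q] * x[q] for p in range(n) for q in range(n))

def grad_formula(A, x):
    """i-th component: sum_p ((A)_ip + (A^T)_ip) x_p."""
    n = len(x)
    return [sum((A[i][p] + A[p][i]) * x[p] for p in range(n)) for i in range(n)]

def grad_numeric(A, x, h=1e-6):
    """Central-difference estimate of dQ/dx_i for each i."""
    g = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        g.append((quad(A, up) - quad(A, down)) / (2 * h))
    return g

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]  # non-symmetric sample
x = [0.3, -1.2, 2.5]                                       # sample point

exact = grad_formula(A, x)
approx = grad_numeric(A, x)
print([round(e, 6) for e in exact])
print([round(a, 6) for a in approx])
```

A non-symmetric `A` is used on purpose: for a symmetric matrix the result reduces to $2A\mathbf{x}$, which would hide the role of the transpose.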
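(d) The expression $Q \triangleq \sum_{pq} x_p a_{pq} x_q$ is a number. Differentiating it with respect to $A$ gives one term for each element of the matrix $A$, so the answer should be a matrix as well. For a $(3 \times 3)$ case, $Q$ is (as before)
$$Q = a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 + (a_{12} + a_{21})x_1x_2 + (a_{13} + a_{31})x_1x_3 + (a_{23} + a_{32})x_2x_3$$
and the matrix of partial derivatives $\nabla_A Q$ is
$$\nabla_A Q = \begin{pmatrix}
\dfrac{\partial Q}{\partial a_{11}} & \dfrac{\partial Q}{\partial a_{12}} & \dfrac{\partial Q}{\partial a_{13}} \\
\dfrac{\partial Q}{\partial a_{21}} & \dfrac{\partial Q}{\partial a_{22}} & \dfrac{\partial Q}{\partial a_{23}} \\
\dfrac{\partial Q}{\partial a_{31}} & \dfrac{\partial Q}{\partial a_{32}} & \dfrac{\partial Q}{\partial a_{33}}
\end{pmatrix} = \begin{pmatrix}
x_1x_1 & x_1x_2 & x_1x_3 \\
x_2x_1 & x_2x_2 & x_2x_3 \\
x_3x_1 & x_3x_2 & x_3x_3
\end{pmatrix} = \mathbf{x}\mathbf{x}^T. \tag{16}$$
For the general case,
$$\begin{aligned}
\frac{\partial}{\partial a_{ij}}\left(\sum_{rs} x_r a_{rs} x_s\right)
&= \sum_{rs} x_r x_s \frac{\partial a_{rs}}{\partial a_{ij}} \\
&= \sum_{rs} x_r x_s \delta_{ri}\delta_{sj} \quad \text{(row and column indices have to match)} \\
&= x_i x_j = (\mathbf{x}\mathbf{x}^T)_{ij}.
\end{aligned}$$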
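The result $\nabla_A Q = \mathbf{x}\mathbf{x}^T$ can also be checked numerically by perturbing one matrix element at a time. This is a sketch under assumed sample values; the names `quad` and `grad_A_numeric` and the step size are illustrative.

```python
def quad(A, x):
    """Quadratic form Q = sum_rs x_r a_rs x_s."""
    n = len(x)
    return sum(x[r] * A[r][s] * x[s] for r in range(n) for s in range(n))

def grad_A_numeric(A, x, h=1e-6):
    """Forward-difference estimate G[i][j] ~ dQ/da_ij."""
    n = len(x)
    base = quad(A, x)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            bumped = [row[:] for row in A]   # copy, then perturb one element
            bumped[i][j] += h
            G[i][j] = (quad(bumped, x) - base) / h
    return G

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]  # sample values
x = [0.3, -1.2, 2.5]                                       # sample point
n = len(x)

outer = [[x[i] * x[j] for j in range(n)] for i in range(n)]  # (x x^T)_ij
G = grad_A_numeric(A, x)

for g_row, o_row in zip(G, outer):
    print([round(g, 4) for g in g_row], [round(o, 4) for o in o_row])
```

Since $Q$ is linear in each $a_{ij}$, the forward difference is exact up to rounding, and the result does not depend on the entries of `A` at all, only on `x`.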