Review of Linear Algebra, Matrix Computations, Derivatives and Convexity
MIE 1624H February 1, 2022
Review of derivatives, gradients and Hessians:
Given a function f of n variables x1, x2, . . . , xn, we use the following notation to represent the vector of variables and the function: x = (x1, x2, . . . , xn)^T, f(x) = f(x1, x2, . . . , xn).
The gradient extends the notion of the derivative to functions of several variables; the Hessian matrix extends that of the second derivative.
We define the partial derivative with respect to variable xi, written as ∂f/∂xi, to be the derivative of f with respect to xi, treating all variables except xi as constants.
The gradient of f at x, written as ∇f(x), is

\[
\nabla f(x) = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix}
\]

The gradient of f is a multivariate (vector-valued) function of x: f(x) ∈ R, ∇f(x) ∈ R^n.
The gradient vector ∇f(x) gives the direction of steepest ascent of the function f at point x. The gradient acts like the derivative in that small changes around a given point x∗ can be estimated using the gradient (see first-order Taylor series expansion and finite difference method).
Second partial derivatives ∂²f/∂xi∂xj are obtained from f(x) by taking the derivative with respect to xi (this yields the first partial derivative ∂f/∂xi) and then taking the derivative of ∂f/∂xi with respect to xj. So, we can compute ∂²f/∂x1∂x1 = ∂²f/∂x1², ∂²f/∂x1∂x2, and so on. These values are arranged into the Hessian matrix:

\[
\nabla^2 f(x) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}
\]

The Hessian matrix is a symmetric matrix, that is, ∂²f/∂xi∂xj = ∂²f/∂xj∂xi.
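Gradients and Hessians of simple functions can also be checked symbolically in Python. A minimal sketch using the sympy library (an assumption: sympy is installed; the function used is the one from the example below):

import sympy as sp

# symbolic variables and an example function of two variables
x1, x2 = sp.symbols('x1 x2')
f = x1**2 - 3*x1*x2 + x2**2

# gradient: column vector of first partial derivatives
grad = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2)])

# Hessian: matrix of second partial derivatives (symmetric)
H = sp.hessian(f, (x1, x2))

print(grad)  # Matrix([[2*x1 - 3*x2], [-3*x1 + 2*x2]])
print(H)     # Matrix([[2, -3], [-3, 2]])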
Computing gradients and Hessians:
Compute the gradient and the Hessian of the function f(x1, x2) = x1² − 3x1x2 + x2² at the point x = (x1, x2)^T = (1, 1)^T.
\[
\nabla f(x) = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \end{pmatrix} = \begin{pmatrix} 2x_1 - 3x_2 \\ -3x_1 + 2x_2 \end{pmatrix}, \qquad \nabla f(1, 1) = \begin{pmatrix} -1 \\ -1 \end{pmatrix}
\]

\[
\nabla^2 f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{pmatrix} = \begin{pmatrix} 2 & -3 \\ -3 & 2 \end{pmatrix}
\]

Taylor series expansion:
First-order Taylor series expansion:
f(x) ≈ f(x0) + ∇f(x0)^T (x − x0)
Second-order Taylor series expansion:
f(x) ≈ f(x0) + ∇f(x0)^T (x − x0) + ½ (x − x0)^T ∇²f(x0) (x − x0)

Example: for f(x1, x2) = x1² − 3x1x2 + x2², compute f(1.01, 1.01) using first- and second-order Taylor series expansions at the point x0 = (1, 1)^T.
First-order Taylor series expansion:
\[
f(1.01, 1.01) \approx f(1, 1) + \nabla f(1, 1)^T \begin{pmatrix} 0.01 \\ 0.01 \end{pmatrix} = -1 + (-1, -1) \begin{pmatrix} 0.01 \\ 0.01 \end{pmatrix} = -1.02
\]
Second-order Taylor series expansion:
\[
f(1.01, 1.01) \approx f(1, 1) + \nabla f(1, 1)^T \begin{pmatrix} 0.01 \\ 0.01 \end{pmatrix} + \frac{1}{2} (0.01, 0.01) \begin{pmatrix} 2 & -3 \\ -3 & 2 \end{pmatrix} \begin{pmatrix} 0.01 \\ 0.01 \end{pmatrix} = -1.0201
\]
Since f is quadratic, the second-order expansion is exact.
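These values are easy to verify numerically in Python. A minimal numpy sketch (the variable names are illustrative):

import numpy as np

# f(x1, x2) = x1^2 - 3*x1*x2 + x2^2, its gradient and Hessian at x0 = (1, 1)
f = lambda x: x[0]**2 - 3*x[0]*x[1] + x[1]**2
x0 = np.array([1.0, 1.0])
g = np.array([2*x0[0] - 3*x0[1], -3*x0[0] + 2*x0[1]])  # gradient at x0: (-1, -1)
H = np.array([[2.0, -3.0], [-3.0, 2.0]])               # Hessian (constant here)

d = np.array([0.01, 0.01])                             # displacement x - x0
first = f(x0) + g @ d                                  # -1.02
second = first + 0.5 * d @ H @ d                       # -1.0201
print(first, second, f(x0 + d))                        # second order is exact for a quadratic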
Convex functions:
Definition. A function f is convex on a convex set C if for any x1, x2 ∈ C and 0 ≤ λ ≤ 1: f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2).
A square matrix A is said to be positive definite (PD) if x^T Ax > 0 for all x ≠ 0.
A square matrix A is said to be positive semidefinite (PSD) if x^T Ax ≥ 0 for all x.
Hessian ∇²f(x) PD for all x =⇒ f is a strictly convex function.
Hessian ∇²f(x) PSD for all x =⇒ f is a convex function.
Gradient ∇f(x̄) = 0 and Hessian ∇²f(x) PSD for all x =⇒ x̄ is a (global) minimum of the function f. Gradient ∇f(x̄) = 0 and Hessian ∇²f(x̄) PD =⇒ x̄ is a strict local minimum of the function f.
Checking a matrix for PD and PSD by computing principal minors:
Leading principal minors Dk, k = 1, 2, . . . , n, of a matrix A = (aij)_{n×n} are defined as
\[
D_k = \det \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix}
\]
A square matrix A is PD ⇔ Dk > 0 for all k = 1, 2, . . . , n.
Consider the function f(x) = 3x1² + 3x2² + 5x3² − 2x1x2. The corresponding Hessian matrix is
\[
\nabla^2 f(x) = 2 \cdot \begin{pmatrix} 3 & -1 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}
\]
Leading principal minors of ∇²f(x) are
\[
D_1 = 2 \cdot 3 = 6 > 0, \qquad D_2 = 2^2 \cdot \det \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix} = 2^2 \cdot [3 \cdot 3 - (-1)(-1)] = 4 \cdot 8 = 32 > 0,
\]
\[
D_3 = 2^3 \cdot \det \begin{pmatrix} 3 & -1 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix} = 2^3 \cdot ([3 \cdot 3 \cdot 5 + 0 + 0] - [0 + 0 + (-1)(-1) \cdot 5]) = 8 \cdot 40 = 320 > 0
\]
So, the Hessian is positive definite (PD) and the function is strictly convex.
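The leading-principal-minor test is straightforward to automate in Python; a minimal numpy sketch (the helper name leading_minors is hypothetical):

import numpy as np

def leading_minors(A):
    # determinants of the k-by-k upper-left submatrices, k = 1, ..., n
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

H = 2 * np.array([[3.0, -1.0, 0.0], [-1.0, 3.0, 0.0], [0.0, 0.0, 5.0]])
print(leading_minors(H))                      # approximately [6.0, 32.0, 320.0]
print(all(d > 0 for d in leading_minors(H)))  # True, so H is PD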
A square matrix A is PSD ⇔ all the principal minors of A are ≥ 0. A principal minor is
\[
A(i_1 i_2 \ldots i_p) = \det \begin{pmatrix} a_{i_1 i_1} & \cdots & a_{i_1 i_p} \\ \vdots & \ddots & \vdots \\ a_{i_p i_1} & \cdots & a_{i_p i_p} \end{pmatrix}, \quad \text{where } 1 \le i_1 < i_2 < \cdots < i_p \le n.
\]
How to calculate eigenvalues: Ax = λx ⇒ (A − λI)x = 0. Since x is non-zero, the determinant of (A − λI) must vanish. Therefore all eigenvalues can be calculated as roots of the equation (which is often called the characteristic equation of A):
det(A − λI) = 0.
For a symmetric matrix, A is PD ⇔ all its eigenvalues are strictly positive, and A is PSD ⇔ all its eigenvalues are ≥ 0.
Example
Consider the Hessian matrix
\[
\nabla^2 f(x) = \begin{pmatrix} 3 & -1 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}
\]
Computing eigenvalues:
\[
\det(\nabla^2 f(x) - \lambda I) = \det \begin{pmatrix} 3 - \lambda & -1 & 0 \\ -1 & 3 - \lambda & 0 \\ 0 & 0 & 5 - \lambda \end{pmatrix} = (5 - \lambda)(\lambda^2 - 6\lambda + 8) = (5 - \lambda)(\lambda - 2)(\lambda - 4) = 0.
\]
Therefore, the eigenvalues are λ = 2, λ = 4 and λ = 5. As all of them are strictly positive, the
Hessian is positive definite (PD).
Try computing eigenvalues and determinants in Python:
import numpy as np

H = np.array([[3, -1, 0], [-1, 3, 0], [0, 0, 5]])  # Hessian matrix (np.array is preferred over the deprecated np.matrix)
eigenvalues, eigenvectors = np.linalg.eig(H)       # eigenvalues 2, 4, 5 (in some order)
determinant = np.linalg.det(H)                     # 40.0
print(eigenvalues, determinant)
Properties of convex functions:
if f is a convex function, its sublevel set {x : f(x) ≤ α} is convex;
positive multiple of convex function is convex:
f convex, α ≥ 0 =⇒ αf convex
sum of convex functions is convex:
f1, f2 convex =⇒ f1 + f2 convex
pointwise maximum of convex functions is convex: f1, f2 convex =⇒ max{f1(x), f2(x)} convex (corresponds to intersection of epigraphs; see the numeric check after this list)
affine transformation of domain: f convex =⇒ f(Ax + b) convex
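As a numeric sanity check of the pointwise-maximum property, the following sketch (illustrative only; f1 and f2 are arbitrary convex choices) tests the convexity inequality at random points:

import numpy as np

f1 = lambda x: (x - 1.0)**2       # convex
f2 = lambda x: abs(x)             # convex
h = lambda x: max(f1(x), f2(x))   # pointwise maximum

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    lam = rng.uniform(0, 1)
    # convexity: h(lam*x + (1-lam)*y) <= lam*h(x) + (1-lam)*h(y)
    assert h(lam*x + (1 - lam)*y) <= lam*h(x) + (1 - lam)*h(y) + 1e-12
print('no violations found')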
Composition rules:
Composite function f(x) = h(g(x)) is convex if:
g convex, h convex nondecreasing;
g concave, h convex nonincreasing.
Proof (differentiable functions, x ∈ R):
f′′ = h′′(g′)² + g′′h′
Examples:
f(x) = e^{g(x)} is convex if g is convex;
f(x) = 1/g(x) is convex if g is concave and positive;
f(x) = g(x)^p, p ≥ 1, is convex if g(x) is convex and positive.
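The composition rule can be checked symbolically for a concrete case. A minimal sympy sketch (g = x² is an arbitrary convex choice) verifying f′′ = h′′(g′)² + g′′h′ ≥ 0 for f = e^{g(x)}:

import sympy as sp

x = sp.symbols('x')
g = x**2                  # convex inner function
f = sp.exp(g)             # h(u) = e^u is convex and nondecreasing
fpp = sp.simplify(sp.diff(f, x, 2))
print(fpp)                # (4*x**2 + 2)*exp(x**2), positive for all x, so f is convex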
Show that the function e^x + ½x² is convex and solve min e^x + ½x².
First derivative: a function is increasing if f′ > 0, decreasing if f′ < 0, and neither if f′ = 0. Second derivative: a function is convex if f′′ > 0 and concave if f′′ < 0. Answer: f′(x) = e^x + x and f′′(x) = e^x + 1 > 0. So, f is convex.
Thus, we can find a solution to an optimization problem by solving f′(x) = 0, given f is convex. Here f′(x) = e^x + x = 0 has no closed-form solution, but it can be solved numerically (x ≈ −0.567), as sketched below.
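A minimal sketch of Newton's method for this root-finding step (the starting point and tolerance are arbitrary choices):

import math

# Newton iteration for f'(x) = e^x + x = 0, using f''(x) = e^x + 1
x = 0.0
for _ in range(20):
    step = (math.exp(x) + x) / (math.exp(x) + 1.0)  # f'(x) / f''(x)
    x -= step
    if abs(step) < 1e-12:
        break
print(x)  # approximately -0.5671, the minimizer of e^x + x^2/2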
Find the local/global minimum of each of the following functions, if it exists:
f(x) = −ln x: f′(x) = −1/x, f′′(x) = 1/x² > 0 ⇒ strictly convex function. f′(x) = −1/x = 0 has no solution; f′ → 0 only as x → ∞, so no minimum exists.
f(x) = x ln x: f′(x) = 1 + ln x, f′′(x) = 1/x > 0 on the domain of ln x ⇒ strictly convex function. f′(x) = 1 + ln x = 0 =⇒ x = e⁻¹ ≈ 0.37 (global minimum).
f(x) = −√x when x ≥ 0: f′(x) = −0.5x^(−1/2), f′′(x) = 0.25x^(−3/2) ≥ 0 when x > 0 ⇒ convex function. f′(x) = −0.5x^(−1/2) = 0 has no solution; f′ → 0 only as x → ∞, so no minimum exists.
f(x1, x2) = (x1 − 2)² + (x2 + 1)² − 2:
\[
\nabla f(x) = \begin{pmatrix} 2(x_1 - 2) \\ 2(x_2 + 1) \end{pmatrix}, \qquad \nabla^2 f(x) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \succ 0
\]
As ∇²f(x) is PD, f(x) is a strictly convex function. ∇f(x) = 0 =⇒ x = (2, −1)^T (global minimum).
f(x) = (x − 2)⁴ − 10(x − 2)²: f′(x) = 4(x − 2)³ − 20(x − 2), f′′(x) = 12(x − 2)² − 20 changes sign ⇒ non-convex, non-concave function. f′(x) = 0 =⇒ x = 2 or x = 2 ± √5; the points x = 2 ± √5 are global minima (f = −25), while x = 2 is a local maximum.
Example of Newton Method
Consider minimizing the function f(x1, x2) = ex1+x2−2 + (x1 − x2)2. Given x0 = (1, 1)T , apply a full Newton step and compute x1.
\[
\nabla f(x) = \begin{pmatrix} e^{x_1 + x_2 - 2} + 2(x_1 - x_2) \\ e^{x_1 + x_2 - 2} - 2(x_1 - x_2) \end{pmatrix}, \qquad \nabla^2 f(x) = \begin{pmatrix} e^{x_1 + x_2 - 2} + 2 & e^{x_1 + x_2 - 2} - 2 \\ e^{x_1 + x_2 - 2} - 2 & e^{x_1 + x_2 - 2} + 2 \end{pmatrix}
\]
At x0 = (1, 1)^T:
\[
\nabla f(x_0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \nabla^2 f(x_0) = \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix}
\]
The full Newton step is x1 = x0 − ∇²f(x0)⁻¹ ∇f(x0).
Instead of inverting the matrix ∇²f(x0), which is costly, we can solve a system of equations. Please note that if we want to compute y = A⁻¹b, we can solve the system of equations Ay = b to find y. So we can solve ∇²f(x0) · y = ∇f(x0) to get y = (∇²f(x0))⁻¹ ∇f(x0):
\[
\begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \;\Rightarrow\; y_1 = 0.5, \; y_2 = 0.5
\]
\[
x_1 = x_0 - \nabla^2 f(x_0)^{-1} \nabla f(x_0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} - \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}
\]
f(x1) = f(0.5, 0.5) = e⁻¹ ≈ 0.3679 < f(x0) = f(1, 1) = 1, so the Newton step decreases the objective.
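This single Newton step is easy to reproduce in Python; a minimal numpy sketch using np.linalg.solve instead of an explicit inverse:

import numpy as np

f = lambda x: np.exp(x[0] + x[1] - 2) + (x[0] - x[1])**2

def grad(x):
    e = np.exp(x[0] + x[1] - 2)
    return np.array([e + 2*(x[0] - x[1]), e - 2*(x[0] - x[1])])

def hess(x):
    e = np.exp(x[0] + x[1] - 2)
    return np.array([[e + 2, e - 2], [e - 2, e + 2]])

x0 = np.array([1.0, 1.0])
y = np.linalg.solve(hess(x0), grad(x0))  # solve H y = g rather than inverting H
x1 = x0 - y                              # full Newton step
print(x1, f(x0), f(x1))                  # [0.5 0.5] 1.0 0.3678...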