CSC338 Numerical Methods
Lecture 11
March 25, 2020
About these slides
These slides are meant to be a presentation aid, not a source of information.
Please use these slides in conjunction with the notes and the slides that come with the textbook.
Unconstrained Optimization
Find a local minimum of f : IR → IR. Approach: Golden Section Search
If f is unimodal on [a, b], then we can iteratively shrink the interval in which the minimum x⋆ lies.
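The slides show no code; here is a minimal Python sketch of golden section search (the tolerance and the test function are illustrative assumptions, not from the slides):

import math

def golden_section_search(f, a, b, tol=1e-6):
    """Iteratively shrink [a, b] around a minimum of a unimodal f."""
    tau = (math.sqrt(5) - 1) / 2          # golden ratio constant, about 0.618
    x1 = a + (1 - tau) * (b - a)
    x2 = a + tau * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a)
            f2 = f(x2)
        else:                              # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + (1 - tau) * (b - a)
            f1 = f(x1)
    return (a + b) / 2

# Example: the minimum of (x - 2)^2 on [0, 5] is at x = 2.
print(golden_section_search(lambda x: (x - 2)**2, 0.0, 5.0))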
Unconstrained Optimization
Find a local minimum of f : IR → IR. Approach: Newton’s Method
Approximate f(x) using a quadratic function, and find a critical point of the approximation.
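A minimal Python sketch of the 1D Newton idea, under the assumption that we can evaluate f′ and f″ (the example function and iteration count are illustrative, not from the slides):

def newton_1d(df, d2f, x0, n_iters=10):
    """Newton's method for minimizing f: jump to the critical point of the
    local quadratic model, i.e. x_{k+1} = x_k - f'(x_k) / f''(x_k)."""
    x = x0
    for _ in range(n_iters):
        x = x - df(x) / d2f(x)
    return x

# Illustrative example: f(x) = x**4 - 3*x, so f'(x) = 4x^3 - 3 and f''(x) = 12x^2.
print(newton_1d(lambda x: 4*x**3 - 3, lambda x: 12*x**2, x0=1.0))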
Today
Find a local minimum of f : IRⁿ → IR. We’ll talk about:
Newton’s Method
Gradient Descent
Reading Contour Plots
Newton’s Method for f : IRⁿ → IR
When f : IR → IR, we have the Taylor series expansion:
f(x + h) ≈ f(x) + f′(x)h + ½ f″(x)h²
The result extends to f : IRⁿ → IR:
f(x + s) ≈ f(x) + ∇f(x)ᵀs + ½ sᵀHf(x)s
Newton’s Method Idea
In each iteration, we have an estimate xk of a minimum of f. So we approximate f near xk with the quadratic model
f(xk + s) ≈ f(xk) + ∇f(xk)ᵀs + ½ sᵀHf(xk)s
Newton’s Method Update Rule
Setting the gradient of the quadratic model to zero gives a linear system: solve Hf(xk)sk = −∇f(xk) for sk, then set xk+1 = xk + sk (equivalently, xk+1 = xk − Hf(xk)⁻¹∇f(xk)).
One major disadvantage of Newton’s Method is that computing the Hessian Hf(x) is very expensive! (Recall Hf(x) is an n × n matrix.)
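A hedged Python/NumPy sketch of the n-dimensional update, solving Hf(xk)sk = −∇f(xk) each step rather than forming the inverse (the iteration count and the quadratic test function are illustrative choices, not from the slides):

import numpy as np

def newton_method(grad, hess, x0, n_iters=10):
    """Newton's method for minimizing f: IR^n -> IR.
    Each step solves Hf(xk) sk = -grad f(xk), then updates xk+1 = xk + sk."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        s = np.linalg.solve(hess(x), -grad(x))   # avoid computing the inverse explicitly
        x = x + s
    return x

# Quick check on f(x) = x1^2 + x2^2 (gradient 2x, Hessian 2I, minimum at the origin):
print(newton_method(lambda x: 2*x, lambda x: 2*np.eye(2), [3.0, -1.0]))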
Newton’s Method Example
We wish to find a local minimum of f(x) = x₁⁴ + x₁²x₂ + x₁² + 2x₂² + x₂, starting with x0 = (1, 1)ᵀ.
First, compute ∇xf(x) and Hf(x).
Newton’s Method Example
We wish to find a local minimum of f(x) = x₁⁴ + x₁²x₂ + x₁² + 2x₂² + x₂, starting with x0 = (1, 1)ᵀ.
First, compute ∇xf(x) and Hf(x):
∇xf(x) = ( 4x₁³ + 2x₁x₂ + 2x₁ ,  x₁² + 4x₂ + 1 )ᵀ
Hf(x) = [ 12x₁² + 2x₂ + 2 ,  2x₁ ;  2x₁ ,  4 ]
Newton’s Method Example
Plug in x0 = (1, 1)ᵀ. What are the values of ∇xf(x0) and Hf(x0)?
∇xf(x0) = ( 4x₁³ + 2x₁x₂ + 2x₁ ,  x₁² + 4x₂ + 1 )ᵀ evaluated at x₁ = 1, x₂ = 1
Hf(x0) = [ 12x₁² + 2x₂ + 2 ,  2x₁ ;  2x₁ ,  4 ] evaluated at x₁ = 1, x₂ = 1
Newton’s Method Example
Plug in x0 = (1, 1)ᵀ. What are the values of ∇xf(x0) and Hf(x0)?
∇xf(x0) = ( 8 , 6 )ᵀ
Hf(x0) = [ 16 ,  2 ;  2 ,  4 ]
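As a sanity check, a small NumPy sketch that evaluates the gradient and Hessian formulas above at x0 (Python/NumPy is an assumption; the slides show no code):

import numpy as np

def grad_f(x):
    x1, x2 = x
    return np.array([4*x1**3 + 2*x1*x2 + 2*x1,
                     x1**2 + 4*x2 + 1])

def hess_f(x):
    x1, x2 = x
    return np.array([[12*x1**2 + 2*x2 + 2, 2*x1],
                     [2*x1, 4.0]])

x0 = np.array([1.0, 1.0])
print(grad_f(x0))   # [8. 6.]
print(hess_f(x0))   # [[16.  2.]
                    #  [ 2.  4.]]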
Newton’s Method Example
We need s0 so that Hf (x0)s0 = −∇xf (x0). Solve for s0 in:
Use Gaussian elimination!
[ 16 ,  2 ;  2 ,  4 ] s0 = ( −8 , −6 )ᵀ
Newton’s Method Example
We need s0 so that Hf (x0)s0 = −∇xf (x0). Solve for s0 in:
Use Gaussian elimination on
[ 16 ,  2 ;  2 ,  4 ] s0 = ( −8 , −6 )ᵀ
We get s0 = ( −1/3 , −4/3 )ᵀ
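A NumPy sketch of this step, solving the 2×2 system numerically instead of by hand (the use of np.linalg.solve is my choice, not the slides’); the printed value agrees with s0 = (−1/3, −4/3)ᵀ above:

import numpy as np

H0 = np.array([[16.0, 2.0], [2.0, 4.0]])
g0 = np.array([8.0, 6.0])

s0 = np.linalg.solve(H0, -g0)    # Newton step: Hf(x0) s0 = -grad f(x0)
print(s0)                        # approximately [-0.3333 -1.3333], i.e. (-1/3, -4/3)

x1 = np.array([1.0, 1.0]) + s0   # Newton update: x1 = x0 + s0
print(x1)                        # approximately [ 0.6667 -0.3333]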
Newton’s Method Update
How do we compute x1 given x0 = (1, 1)ᵀ and s0 = ( −1/3 , −4/3 )ᵀ?
x1 = x0 + s0 = ( 2/3 , −1/3 )ᵀ
Function 3D Plot
Contour Plot
How to read contour plots
Steepest Descent
Steepest Descent / Gradient Descent
Key idea:
The gradient of a differentiable function points uphill
The negative gradient of a differentiable function points downhill
Example: f : IR → IR. Take f(x) = x². Then f′(x) = 2x points away from the minimum at x = 0, and −f′(x) points toward it.
Example: f : IR2 → IR
Take f(x) = x₁⁴ + x₁²x₂ + x₁² + 2x₂² + x₂. At:
x = (0, 0)ᵀ: ∇xf(x) = ( 0 , 1 )ᵀ
x = (1, 1)ᵀ: ∇xf(x) = ( 8 , 6 )ᵀ
Contour Plot
Note: The gradient is always perpendicular to the contour!
Why does this work?
Intuition: a function f : IRⁿ → IR locally looks like a plane. In other words, locally we can approximate f using the first-order approximation f(x + s) ≈ f(x) + ∇f(x)ᵀs.
It turns out that −∇f(x) is, locally, the direction of steepest descent.
Steepest descent
Algorithm to find a local minimum of f : IRⁿ → IR. Start from an initial guess x0 and update:
xk+1 = xk − αk∇f(xk), where αk > 0 is the step size.
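A minimal Python/NumPy sketch of the algorithm, assuming a fixed step size alpha and a simple stopping test (both are illustrative choices; the slides leave αk and the stopping rule open):

import numpy as np

def steepest_descent(grad, x0, alpha=0.1, tol=1e-8, max_iters=10_000):
    """Steepest (gradient) descent: step in the direction of -grad f(xk)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        x_new = x - alpha * g
        if np.linalg.norm(x_new - x) < tol:   # stop when the iterates stop moving
            return x_new
        x = x_new
    return x

# Example on f(x) = x1^2 + x2^2, whose gradient is 2x (minimum at the origin):
print(steepest_descent(lambda x: 2*x, [3.0, -1.0]))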
Steepest descent pros & cons
Steepest descent example
Show slide 29 and 30
Steepest Descent vs Newton’s Method
Steepest Descent: xk+1 = xk − αk∇f(xk)
Newton: xk+1 = xk − Hf(xk)⁻¹∇f(xk)
Quasi-Newton Methods
Steepest Descent: xk+1 = xk − αk∇f(xk)
Newton: xk+1 = xk − Hf(xk)⁻¹∇f(xk)
Quasi-Newton: xk+1 = xk − αkBk⁻¹∇f(xk),
where Bk is an approximation of the Hessian matrix.
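In practice one rarely codes Bk by hand; a common quasi-Newton method is BFGS, which builds up an approximation of the Hessian from gradient evaluations. A hedged usage sketch with SciPy, applied to the example function from the Newton slides (the use of scipy.optimize is an assumption; the course may use different tools):

import numpy as np
from scipy.optimize import minimize

def f(x):
    x1, x2 = x
    return x1**4 + x1**2*x2 + x1**2 + 2*x2**2 + x2

def grad_f(x):
    x1, x2 = x
    return np.array([4*x1**3 + 2*x1*x2 + 2*x1, x1**2 + 4*x2 + 1])

# BFGS maintains an internal approximation Bk of the Hessian, so we only supply
# the function and its gradient.
res = minimize(f, x0=np.array([1.0, 1.0]), jac=grad_f, method='BFGS')
print(res.x)   # approximate local minimizer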
Homework Grade Prediction Revisited
Recall the problem of predicting a student’s hw3 grade given their hw1 and hw2 grades.
A = [ a₁⁽¹⁾  a₂⁽¹⁾ ;  a₁⁽²⁾  a₂⁽²⁾ ;  ⋮ ;  a₁⁽⁷³⁾  a₂⁽⁷³⁾ ],    b = ( b⁽¹⁾ , b⁽²⁾ , … , b⁽⁷³⁾ )ᵀ
Problem: Find x = ( x₁ , x₂ )ᵀ to minimize ||Ax − b||²
We can treat this as a non-linear optimization problem!
Grade Prediction as Non-Linear Optimization
Define
f(x) = ||Ax − b||²
     = (Ax − b)ᵀ(Ax − b)
     = xᵀAᵀAx − 2bᵀAx + bᵀb
Computing Gradient
Now, given that
f(x) = Σ_{j=1}^{73} ( a₁⁽ʲ⁾x₁ + a₂⁽ʲ⁾x₂ − b⁽ʲ⁾ )²,
let's compute ∇xf(x):
∂f/∂x₁ = Σ_{j=1}^{73} 2( a₁⁽ʲ⁾x₁ + a₂⁽ʲ⁾x₂ − b⁽ʲ⁾ ) a₁⁽ʲ⁾
∂f/∂x₂ = Σ_{j=1}^{73} 2( a₁⁽ʲ⁾x₁ + a₂⁽ʲ⁾x₂ − b⁽ʲ⁾ ) a₂⁽ʲ⁾
In matrix form: ∇xf(x) = 2Aᵀ(Ax − b).
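A NumPy sketch of this gradient, using randomly generated stand-ins for the grade data A and b (the real data isn't shown on these slides), with a finite-difference sanity check:

import numpy as np

# Illustrative stand-ins for the real grade data (73 students, 2 features):
rng = np.random.default_rng(0)
A = rng.random((73, 2))
b = rng.random(73)

def f(x):
    r = A @ x - b
    return r @ r                      # ||Ax - b||^2

def grad_f(x):
    return 2 * A.T @ (A @ x - b)      # matches the component-wise sums above

# Sanity check against a finite-difference estimate of the first partial derivative:
x = np.array([0.5, 0.5])
h = 1e-6
fd = (f(x + np.array([h, 0.0])) - f(x)) / h
print(grad_f(x)[0], fd)               # the two values should be close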
Gradient Descent
Start with some x0, e.g. x0 = ( 0 , 0 )ᵀ or x0 = ( 0.5 , 0.5 )ᵀ. (Why?)
Then take gradient descent steps: xk+1 = xk − αk∇xf(xk),
until xk+1 is sufficiently close to xk, or until f(xk+1) is sufficiently close to f(xk).
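A hedged sketch of the full loop for this problem in Python/NumPy, again with random stand-in data and a fixed step size (the step size, iteration cap, and tolerance are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((73, 2))              # stand-in hw1/hw2 grades
b = rng.random(73)                   # stand-in hw3 grades

def f(x):
    return np.sum((A @ x - b)**2)

def grad_f(x):
    return 2 * A.T @ (A @ x - b)

x = np.zeros(2)                      # x0 = (0, 0)
alpha = 0.001                        # small fixed step size (illustrative)
for _ in range(100_000):
    x_new = x - alpha * grad_f(x)
    if abs(f(x_new) - f(x)) < 1e-12: # stop when f barely changes between iterates
        x = x_new
        break
    x = x_new

print(x)
print(np.linalg.solve(A.T @ A, A.T @ b))   # least-squares solution, for comparison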
Why Gradient Descent?
Instead of computing ∇xf(x) exactly, we can estimate the gradient using a small subset of our data (a subset of the 73 students).
Gradient descent works for more complicated functions, like neural networks!