
CSC338 Numerical Methods
Lecture 11
March 25, 2020

About these slides
These slides are meant to be a presentation aid, not a source of information.
Please use these slides in conjunction with the notes and the slides that come with the textbook.

Unconstrained Optimization
Find a local minimum of f : ℝ → ℝ. Approach: Golden Section Search.
If f is unimodal on [a, b], then we can iteratively shrink the interval in which the minimum x⋆ lies.
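As a refresher, here is a minimal sketch of golden section search in Python; the function name, tolerance, and test function are illustrative and not from the slides.

```python
import math

def golden_section_search(f, a, b, tol=1e-6):
    """Shrink [a, b] until its width is below tol, assuming f is unimodal on [a, b]."""
    tau = (math.sqrt(5) - 1) / 2              # golden ratio constant, about 0.618
    x1 = a + (1 - tau) * (b - a)
    x2 = a + tau * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                           # the minimum must lie in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a)
            f2 = f(x2)
        else:                                 # the minimum must lie in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + (1 - tau) * (b - a)
            f1 = f(x1)
    return (a + b) / 2

# Example: the minimum of (x - 2)^2 on [0, 5] is at x = 2
print(golden_section_search(lambda x: (x - 2)**2, 0, 5))
```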

Unconstrained Optimization
Find a local minimum of f : ℝ → ℝ. Approach: Newton’s Method.
Approximate f(x) using a quadratic function, and find a critical point of the approximation.

Today
Find a local minimum of f : ℝⁿ → ℝ. We’ll talk about:
• Newton’s Method
• Gradient Descent
• Reading Contour Plots

Newton’s Method for f : ℝⁿ → ℝ
When f : ℝ → ℝ, we have the Taylor Series Expansion:
f(x + h) ≈ f(x) + f′(x)h + ½ f″(x)h²
The result extends to f : ℝⁿ → ℝ.
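For reference, here is how the expansion reads in n dimensions, with the gradient ∇f and the Hessian Hf taking the place of f′ and f″ (reconstructed here; the exact form on the original slide was not preserved):

```latex
f(x + s) \approx f(x) + \nabla f(x)^{T} s + \tfrac{1}{2}\, s^{T} H_f(x)\, s
```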

Newton’s Method Idea
In each iteration, we have an estimate xₖ of a minimum of f. So we approximate f near xₖ with the quadratic model
q(s) = f(xₖ) + ∇f(xₖ)ᵀs + ½ sᵀHf(xₖ)s
and minimize q instead.

Newton’s Method Update Rule
Solve Hf(xₖ)sₖ = −∇f(xₖ) for the step sₖ, then set xₖ₊₁ = xₖ + sₖ (equivalently, xₖ₊₁ = xₖ − Hf(xₖ)⁻¹∇f(xₖ)).
One major disadvantage of Newton’s Method is that computing the Hessian Hf(x) is very expensive! (Recall that Hf(x) is an n × n matrix.)

Newton’s Method Example
We wish to find a local minimum of f(x) = x₁⁴ + x₁²x₂ + x₁² + 2x₂² + x₂, starting with x₀ = [1, 1]ᵀ.
First, compute ∇ₓf(x) and Hf(x).

Newton’s Method Example
We wish to find a local minimum of f(x) = x₁⁴ + x₁²x₂ + x₁² + 2x₂² + x₂, starting with x₀ = [1, 1]ᵀ.
First, compute ∇ₓf(x) and Hf(x):
∇ₓf(x) = [4x₁³ + 2x₁x₂ + 2x₁, x₁² + 4x₂ + 1]ᵀ
Hf(x) = [12x₁² + 2x₂ + 2, 2x₁; 2x₁, 4]

Newton’s Method Example
Plug in x₀ = [1, 1]ᵀ. What are the values of ∇ₓf(x₀) and Hf(x₀)?
∇ₓf(x₀) = [4x₁³ + 2x₁x₂ + 2x₁, x₁² + 4x₂ + 1]ᵀ = ?
Hf(x₀) = [12x₁² + 2x₂ + 2, 2x₁; 2x₁, 4] = ?

Newton’s Method Example
Plug in x₀ = [1, 1]ᵀ. What are the values of ∇ₓf(x₀) and Hf(x₀)?
∇ₓf(x₀) = [8, 6]ᵀ
Hf(x₀) = [16, 2; 2, 4]

Newton’s Method Example
We need s₀ so that Hf(x₀)s₀ = −∇ₓf(x₀). Solve for s₀ in:
[16, 2; 2, 4] s₀ = −[8, 6]ᵀ
Use Gauss Elimination!
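As a quick check (not part of the original slides), the same system can be solved with NumPy:

```python
import numpy as np

H = np.array([[16.0, 2.0],
              [2.0, 4.0]])       # Hf(x0)
g = np.array([8.0, 6.0])         # gradient of f at x0

s0 = np.linalg.solve(H, -g)      # Newton step: solve Hf(x0) s0 = -grad f(x0)
print(s0)                        # [-0.3333..., -1.3333...], i.e. s0 = [-1/3, -4/3]
```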

Newton’s Method Example
We need s₀ so that Hf(x₀)s₀ = −∇ₓf(x₀). Solve for s₀ in:
[16, 2; 2, 4] s₀ = −[8, 6]ᵀ
Use Gauss Elimination! We get
s₀ = [−1/3, −4/3]ᵀ

Newton’s Method Update
How do we compute x₁ given x₀ = [1, 1]ᵀ and s₀ = [−1/3, −4/3]ᵀ?
x₁ = x₀ + s₀ = [2/3, −1/3]ᵀ
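A minimal sketch of the full iteration for this example in Python, assuming NumPy is available; the gradient and Hessian are the ones computed on the previous slides.

```python
import numpy as np

def grad(x):
    x1, x2 = x
    return np.array([4*x1**3 + 2*x1*x2 + 2*x1,
                     x1**2 + 4*x2 + 1])

def hessian(x):
    x1, x2 = x
    return np.array([[12*x1**2 + 2*x2 + 2, 2*x1],
                     [2*x1,                4.0]])

x = np.array([1.0, 1.0])                          # x0
for k in range(10):
    s = np.linalg.solve(hessian(x), -grad(x))     # solve Hf(xk) sk = -grad f(xk)
    x = x + s                                     # xk+1 = xk + sk
print(x, grad(x))   # converges to the local minimum near (0, -0.25), where the gradient is ~0
```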

Function 3D Plot

Contour Plot

How to read contour plots

Steepest Descent

Steepest Descent / Gradient Descent
Key idea:
• The gradient of a differentiable function points uphill
• The negative gradient of a differentiable function points downhill

Example: f : ℝ → ℝ. Take f(x) = x².
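A worked line for this example (not on the original slide): the negative gradient always points toward the minimizer.

```latex
f'(x) = 2x \;\Rightarrow\; -f'(x) = -2x,
\quad\text{which is negative for } x > 0 \text{ and positive for } x < 0,
\text{ i.e. it always points toward } x^{\star} = 0 .
```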

Example: f : ℝ² → ℝ
Take f(x) = x₁⁴ + x₁²x₂ + x₁² + 2x₂² + x₂. At:
x = [0, 0]ᵀ: ∇ₓf(x) = [0, 1]ᵀ
x = [1, 1]ᵀ: ∇ₓf(x) = [8, 6]ᵀ

Contour Plot
Note: The gradient is always perpendicular to the contour!

Why does this work?
Intuition: locally, a function f : ℝⁿ → ℝ looks like a plane. In other words, locally we can approximate f using
f(x + s) ≈ f(x) + ∇f(x)ᵀs
It turns out that −∇f(x) is, locally, the direction of steepest descent.

Steepest descent
An algorithm to find a local minimum of f : ℝⁿ → ℝ. Start from an initial guess x₀ and update:
xₖ₊₁ = xₖ − αₖ∇f(xₖ)
where αₖ > 0 is the step size.
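A minimal sketch of this loop in Python, assuming the gradient is available as a function; the fixed step size and iteration budget are illustrative choices, not from the slides.

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.1, num_iters=100):
    """Steepest descent with a fixed step size alpha."""
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        x = x - alpha * grad(x)          # xk+1 = xk - alpha * grad f(xk)
    return x

# Example: the function from the Newton example above
grad_f = lambda x: np.array([4*x[0]**3 + 2*x[0]*x[1] + 2*x[0],
                             x[0]**2 + 4*x[1] + 1])
print(gradient_descent(grad_f, [1.0, 1.0]))   # approaches the minimum near (0, -0.25)
```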

Steepest descent pros & cons

Steepest descent example
Show slides 29 and 30.

Steepest Descent vs Newton’s Method
Steepest Descent: xₖ₊₁ = xₖ − αₖ∇f(xₖ)
Newton: xₖ₊₁ = xₖ − Hf(xₖ)⁻¹∇f(xₖ)

Quasi-Newton Methods
Steepest Descent: xₖ₊₁ = xₖ − αₖ∇f(xₖ)
Newton: xₖ₊₁ = xₖ − Hf(xₖ)⁻¹∇f(xₖ)
Quasi-Newton: xₖ₊₁ = xₖ − Bₖ⁻¹∇f(xₖ)
where Bₖ is an approximation of the Hessian matrix.
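BFGS is one widely used quasi-Newton method: it builds its Hessian approximation from gradient evaluations only. A sketch using SciPy, assuming it is available (not part of the original slides):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**4 + x[0]**2*x[1] + x[0]**2 + 2*x[1]**2 + x[1]
grad_f = lambda x: np.array([4*x[0]**3 + 2*x[0]*x[1] + 2*x[0],
                             x[0]**2 + 4*x[1] + 1])

# method="BFGS" never forms the exact Hessian; it updates an approximation each step.
result = minimize(f, x0=[1.0, 1.0], jac=grad_f, method="BFGS")
print(result.x)    # close to the minimum near (0, -0.25)
```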

Homework Grade Prediction Revisited
Recall the problem of predicting a student’s hw3 grade given their hw1 and hw2 grades.
A = [ a₁⁽¹⁾   a₂⁽¹⁾ ]
    [ a₁⁽²⁾   a₂⁽²⁾ ]
    [   ⋮       ⋮   ]
    [ a₁⁽⁷³⁾  a₂⁽⁷³⁾ ]

b = [ b⁽¹⁾ ]
    [ b⁽²⁾ ]
    [  ⋮   ]
    [ b⁽⁷³⁾ ]

Problem: Find x = [x₁, x₂]ᵀ to minimize ||Ax − b||²
We can treat this as a non-linear optimization problem!

Grade Prediction as Non-Linear Optimization
Define
f(x) = ||Ax − b||²
     = (Ax − b)ᵀ(Ax − b)
     = Σⱼ (a₁⁽ʲ⁾x₁ + a₂⁽ʲ⁾x₂ − b⁽ʲ⁾)²,  summing over the j = 1, …, 73 students

Computing Gradient
Now, given that
f(x) = Σⱼ (a₁⁽ʲ⁾x₁ + a₂⁽ʲ⁾x₂ − b⁽ʲ⁾)²  (sum over j = 1, …, 73),
let’s compute ∇ₓf(x):
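Differentiating term by term gives the standard least-squares gradient (worked out here for reference; the result is not shown on the slide):

```latex
\frac{\partial f}{\partial x_1} = \sum_{j=1}^{73} 2\left(a_1^{(j)} x_1 + a_2^{(j)} x_2 - b^{(j)}\right) a_1^{(j)},
\qquad
\frac{\partial f}{\partial x_2} = \sum_{j=1}^{73} 2\left(a_1^{(j)} x_1 + a_2^{(j)} x_2 - b^{(j)}\right) a_2^{(j)}
```

or, in matrix form, ∇ₓf(x) = 2Aᵀ(Ax − b).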

Gradient Descent
Start with some x₀, e.g. x₀ = [0, 0]ᵀ or x₀ = [0.5, 0.5]ᵀ. (Why?)
Then take gradient descent steps: xₖ₊₁ = xₖ − αₖ∇ₓf(xₖ)
until xₖ₊₁ is sufficiently close to xₖ, or until f(xₖ₊₁) is sufficiently close to f(xₖ).
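A sketch of this procedure with NumPy. The grade data below is randomly generated as a stand-in (the actual hw grades are not part of the slides), and the step size and iteration budget are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0.5, 1.0, size=(73, 2))        # hypothetical hw1, hw2 grades (as fractions)
b = 0.4 * A[:, 0] + 0.5 * A[:, 1] + rng.normal(0, 0.05, size=73)   # hypothetical hw3 grades

def grad_f(x):
    return 2 * A.T @ (A @ x - b)               # gradient of ||Ax - b||^2

x = np.array([0.5, 0.5])                       # x0
alpha = 0.01                                   # fixed step size
for k in range(10000):
    x_new = x - alpha * grad_f(x)              # xk+1 = xk - alpha * grad f(xk)
    if np.linalg.norm(x_new - x) < 1e-8:       # stop when the iterates stop changing
        break
    x = x_new

print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])    # least-squares solution, for comparison
```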

Why Gradient Descent?
Instead of computing ∇ₓf(x) exactly, we can estimate the gradient using a small subset of our data (a subset of the 73 students).
Gradient descent works for more complicated functions, like neural networks!
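A sketch of this idea (often called stochastic or minibatch gradient descent), with a hypothetical batch size; it reuses the synthetic A and b from the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def minibatch_grad(A, b, x, batch_size=8):
    """Estimate the gradient of ||Ax - b||^2 from a random subset of the rows (students)."""
    idx = rng.choice(len(b), size=batch_size, replace=False)
    A_batch, b_batch = A[idx], b[idx]
    # Rescale so the estimate matches the full gradient in expectation.
    return 2 * (len(b) / batch_size) * A_batch.T @ (A_batch @ x - b_batch)
```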