Design optimization algorithms and tools
Intro to gradient-based
optimization
ME 564/SYS 564
Wed Sep 26, 2018
Steven Hoffenson
Goal of Week 5: To learn the optimality conditions for unconstrained problems,
be able to solve problems with them, and know two derivative-based algorithms
1
Recap: How to optimize
1. Formulate the problem
a) Define system boundaries
b) Develop analytical models
c) Explore/reduce the problem space
d) Formalize optimization problem
2. Solve the problem
a) Choose the right approach/algorithm
b) Solve (by hand, code, or software)
c) Interpret the results
d) Iterate if needed
2
x_{k+1} = x_k − H(x_k)^(−1) ∇f(x_k)
(Weeks 1-2, 4, 9-12)
(Weeks 3, 5-8, 12)
TODAY
Recap: Week 3
• Linear programs are special cases
• All functions monotonic
• Solutions must lie on boundary of design space
• Simplex algorithm is efficient
• Derivative-free algorithms for nonlinear problems
are straightforward and robust, but may take
longer and often converge on local optima
• Coordinate search
• Nelder-Mead
• Space-filling DIRECT
• Simulated Annealing
3
Recap: DOEs and surrogate modeling
• Surrogate modeling is fitting a mathematical
function to your data to speed up evaluations and
optimization
• This involves four general steps:
• Gather data (e.g., using a DOE)
• Choose a function structure (e.g., linear, polynomial,
kriging, ANN)
• Fit a function to the data
• Assess fitness
• Watch out for outliers, underfitting, and overfitting
4
Unconstrained gradient-
based methods
“Unconstrained” means we are talking about
problems that have interior optima, not optima that
lie on a constraint (like what monotonicity analysis
can help with)
5
Basic nonlinear problem
How do we solve this?
6
How do we know it’s a minimizer?
1-variable optimality conditions
1. First-order necessary condition
If f(x) is differentiable and x* is a local minimum, then ∂f/∂x (x*) = 0. (zero slope)
Note: this is not sufficient – x* could be a local maximum or saddle point (rather than a minimum)
2. Second-order sufficient condition
If ∂f/∂x (x*) = 0 and ∂²f/∂x² (x*) > 0, then x* is a local minimum. (zero slope and increasing slope)
7
Example
min f(x) = x + e^(−x)
First-order necessary condition:
df/dx = 1 − e^(−x)
df/dx (x*) = 1 − e^(−x*) = 0
1 = e^(−x*)
ln(1) = −x*
x* = 0
This is a stationary point.
Second-order sufficient condition:
d²f/dx² = e^(−x)
d²f/dx² (x*) = e^(−x*)
d²f/dx² (0) = e^0 = 1
Since this is > 0, it is a minimum.
Confirm with plot: [plot of f(x) = x + e^(−x), showing the minimum at x* = 0]
8
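Alongside the plot, a quick numerical check is possible. Here is a minimal Python sketch (not from the slides) that verifies the FONC and SOSC for f(x) = x + e^(−x) at x* = 0; the finite-difference step h is an assumed illustrative value.

```python
import math

def f(x):
    return x + math.exp(-x)

# Central finite differences approximate the first and second derivatives.
h = 1e-5

def dfdx(x):
    return (f(x + h) - f(x - h)) / (2 * h)

def d2fdx2(x):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

x_star = 0.0
print("f'(x*)  =", dfdx(x_star))    # ~0     -> FONC satisfied (stationary point)
print("f''(x*) =", d2fdx2(x_star))  # ~1 > 0 -> SOSC satisfied (local minimum)
```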
Global vs. local optima
Stationary points, where ∂f/∂x (x*) = 0, can be minima, maxima, or saddle points
[Plot annotations: global maximum on interval [0, 3.25]; global minimum on interval [0, 3.25]; local minimum; saddle point]
9
Multiple variables
Gradient: ∇f(x) ≜ (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)
Hessian: H(x) ≜ the n×n matrix of second derivatives,
H(x) = [∂²f/∂x1²  ⋯  ∂²f/(∂x1∂xn); ⋮  ⋱  ⋮; ∂²f/(∂xn∂x1)  ⋯  ∂²f/∂xn²]
10
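These definitions are easy to approximate numerically. Below is a small Python sketch (my own illustration, not from the slides) that builds the gradient vector and Hessian matrix of an arbitrary f by central finite differences; the example function and step sizes are assumptions for demonstration.

```python
import numpy as np

def gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient vector of f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian matrix of f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = h
        for j in range(n):
            ej = np.zeros(n)
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

# Illustrative function: f(x) = x1^2 + 3*x1*x2 + 2*x2^2
f = lambda x: x[0]**2 + 3*x[0]*x[1] + 2*x[1]**2
print(gradient(f, [1.0, 1.0]))  # ~[5, 7]
print(hessian(f, [1.0, 1.0]))   # ~[[2, 3], [3, 4]]
```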
Multi-variable optimality conditions
1. First-order necessary condition
If 𝑓(𝐱) is differentiable and 𝐱∗ is
a local minimum, then 𝛁𝑓(𝐱∗) = 𝟎.
2. Second-order sufficient condition
If 𝛁𝑓(𝐱∗) = 𝟎 and 𝐇 𝐱∗ is positive-definite,
then 𝐱∗ is a local minimum
11
Recall: Points that satisfy the
necessary condition are called
“stationary points,” and they
are not all minima!
Hessian properties
A Hessian is positive-definite if ∂x^T H(x*) ∂x > 0 for all ∂x ≠ 0.
A matrix is positive-definite if and only if any of these hold:
1. All of its eigenvalues are positive.
2. All determinants of its leading principal minors are positive.
3. All the pivots are positive when the matrix is reduced to
row-echelon form.
Other matrix classification definitions:

Replace "positive-definite" with | Replace > with | Nature of x*
negative-definite | < | local maximum
positive-semidefinite | ≥ | probable valley
negative-semidefinite | ≤ | probable ridge
indefinite | both + and − | saddle point

Note: ∂x = x − x*
Note: The 3 "tricks" above only apply to positive-definiteness and cannot all be used to prove negative- or semi-definiteness
12

Sometimes we can solve problems using the optimality conditions
1. Apply the First-Order Necessary Condition (FONC)
Find stationary points x* where ∇f(x*) ≜ (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn) = (0, 0, …, 0)
2. Apply the Second-Order Sufficient Condition (SOSC)
Test each stationary point x* for a positive-definite Hessian: ∂x^T H(x*) ∂x > 0 for all ∂x ≠ 0
If a point 𝐱∗ meets both conditions, it is a local minimum!
13
Example (4.6 in book)
Find the minimum of the following function:
f(x) = 2x1 + x1^(−2) + 2x2 + x2^(−2)

FONC:
∇f(x) = (2 − 2x1^(−3), 2 − 2x2^(−3)) = (0, 0)
Stationary point: x* = (x1, x2) = (1, 1)

SOSC:
H(x) = [6x1^(−4)  0; 0  6x2^(−4)]
H(x*) = [6  0; 0  6]
∂x = (∂x1, ∂x2) = x − x* = (x1 − 1, x2 − 1)
∂x^T H(x*) ∂x = 6(∂x1)² + 6(∂x2)² > 0, so H(x*) is positive definite

Therefore, [1; 1] is a minimum!
14
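As a cross-check of this example (my own sketch, not part of the slides), SymPy can apply the FONC and SOSC symbolically: solve ∇f = 0 for the stationary point, then test the Hessian there.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)  # positive assumption keeps the real root x1 = x2 = 1
f = 2*x1 + x1**-2 + 2*x2 + x2**-2

grad = [sp.diff(f, v) for v in (x1, x2)]
stationary = sp.solve(grad, (x1, x2), dict=True)   # FONC: set the gradient to zero
print(stationary)                                  # [{x1: 1, x2: 1}]

H = sp.hessian(f, (x1, x2))
H_star = H.subs(stationary[0])
print(H_star)                        # Matrix([[6, 0], [0, 6]])
print(H_star.is_positive_definite)   # True -> SOSC holds, so (1, 1) is a local minimum
```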
Eigenvalues to test SOSC
Rather than checking the quadratic term
∂x^T H(x*) ∂x > 0, we can check the eigenvalues of H(x*):

Eigenvalues of H(x*) | Hessian matrix | Nature of x*
All positive (> 0) | Positive definite | Local min
All negative (< 0) | Negative definite | Local max
All nonnegative (≥ 0) | Positive semidefinite | Probable valley
All nonpositive (≤ 0) | Negative semidefinite | Probable ridge
Mixed signs (both + and −) | Indefinite | Saddle point
15

Example
Recall our example: H(x) = [6x1^(−4)  0; 0  6x2^(−4)] and H(x*) = [6  0; 0  6]
Eigenvalues at x*: λ1 = 6 and λ2 = 6 → positive definite at x* → local minimum
Eigenvalues of the general Hessian: λ1 = 6x1^(−4) and λ2 = 6x2^(−4) → positive definite everywhere
16

Determinants to test SOSC
If your matrix is 2x2, you can simply check the determinant of H(x*):

Determinant of H(x*) | Hessian matrix | Nature of x*
Positive and h11 > 0 | Positive definite | Local min
Positive and h11 < 0 | Negative definite | Local max
Zero and h11 > 0 | Positive semidefinite | Probable valley
Zero and h11 < 0 | Negative semidefinite | Probable ridge
Negative | Indefinite | Saddle point

Note: h11 is the first element (upper-left value) of H
Another note: h11 is the first leading principal minor of H, and the determinant of the full matrix is the second leading principal minor. This follows option 2 from Slide 12.
17

Example
Recall our example: H(x*) = [6  0; 0  6]
det H(x*) = 36 > 0
h11 = 6 > 0
Positive definite at x* → Local minimum
18
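The eigenvalue table is easy to encode. Here is a small helper (my own illustration, not from the slides) using NumPy; the tolerance tol for treating an eigenvalue as zero is an assumed value.

```python
import numpy as np

def classify_stationary_point(H, tol=1e-10):
    """Classify a stationary point from the eigenvalues of its (symmetric) Hessian."""
    eig = np.linalg.eigvalsh(H)   # symmetric matrix -> real eigenvalues
    if np.all(eig > tol):
        return "positive definite -> local min"
    if np.all(eig < -tol):
        return "negative definite -> local max"
    if np.all(eig >= -tol):
        return "positive semidefinite -> probable valley"
    if np.all(eig <= tol):
        return "negative semidefinite -> probable ridge"
    return "indefinite -> saddle point"

print(classify_stationary_point(np.array([[6.0, 0.0], [0.0, 6.0]])))   # local min
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -3.0]])))  # saddle point
```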
Interior vs. boundary optima
Interior optimum
19
Boundary optimum
Note that the necessary and sufficient conditions
usually do not apply to boundary optima
Local approximation
A review of Taylor series expansion for approximating
function behavior (in a local neighborhood)
20
Taylor Series Approximation
Taylor series approximation for a single-variable function:
f(x) = f(x0) + [f′(x0)/1!](x − x0) + [f″(x0)/2!](x − x0)² + [f⁽³⁾(x0)/3!](x − x0)³ + ⋯ (higher-order terms)
The approximation is valid only in the neighborhood of x0
21
Taylor Series Approximation
f(x) = e^x
Linear approximation: f(x) ≈ f(x0) + f′(x0)(x − x0)
For x0 = 0: f(x) ≈ 1 + x
22
Taylor Series Approximation
f(x) = e^x
Quadratic approximation: f(x) ≈ f(x0) + f′(x0)(x − x0) + ½ f″(x0)(x − x0)²
For x0 = 0: f(x) ≈ 1 + x + ½ x²
23
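To see how these approximations behave away from x0 = 0, here is a short Python sketch (the sample points are my own choice, not from the slides):

```python
import math

f = lambda x: math.exp(x)
linear = lambda x: 1 + x                 # 1st-order Taylor approximation about x0 = 0
quadratic = lambda x: 1 + x + x**2 / 2   # 2nd-order Taylor approximation about x0 = 0

for x in (0.1, 0.5, 1.0, 2.0):
    print(f"x = {x:4.1f}   e^x = {f(x):7.4f}   linear = {linear(x):7.4f}   quadratic = {quadratic(x):7.4f}")
```

Both approximations are accurate near x0 and degrade as x moves away, with the quadratic one holding up longer.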
Taylor Series Approximation
In higher dimensions:
f(x) ≈ f(x0) + ∇f(x0)(x − x0) + ½ (x − x0)^T H(x0) (x − x0) + ⋯
Recall:
Gradient vector: ∇f(x) = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)
Hessian matrix: H(x) = [∂²f/∂x1²  …  ∂²f/(∂x1∂xn); ⋮  ⋱  ⋮; ∂²f/(∂xn∂x1)  …  ∂²f/∂xn²]
24
Taylor Series Approximation
f(x) ≈ f(x0) + ∇f(x0)(x − x0) + ½ (x − x0)^T H(x0)(x − x0)
f(x) − f(x0) ≈ ∇f(x0)(x − x0) + ½ (x − x0)^T H(x0)(x − x0)
Writing ∂f = f(x) − f(x0) (the perturbation in f) and ∂x = x − x0 (the perturbation in x):
∂f ≈ ∇f(x0) ∂x + ½ ∂x^T H(x0) ∂x
We will use this to develop our first two algorithms!
25
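A quick numerical check of this perturbation relation (my own sketch, not from the slides), using the Example 4.6 function with its analytical gradient and Hessian; the base point x0 and perturbation ∂x are assumed values.

```python
import numpy as np

f = lambda x: 2*x[0] + x[0]**-2 + 2*x[1] + x[1]**-2
grad = lambda x: np.array([2 - 2*x[0]**-3, 2 - 2*x[1]**-3])
hess = lambda x: np.diag([6*x[0]**-4, 6*x[1]**-4])

x0 = np.array([1.5, 2.0])
dx = np.array([0.05, -0.05])   # small perturbation in x

true_df = f(x0 + dx) - f(x0)                            # actual perturbation in f
approx_df = grad(x0) @ dx + 0.5 * dx @ hess(x0) @ dx    # 2nd-order Taylor estimate
print(true_df, approx_df)      # the two values nearly agree for small perturbations
```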
Basic gradient-based
algorithms
1st-order: Gradient descent
2nd-order: Newton’s method
26
1st-order algorithm: Gradient descent
Starting at a point 𝐱0, we want to find a direction
that will lower the objective value
1. Using 1st-order terms only from the Taylor approximation: ∂f ≈ ∇f(x0) ∂x
2. We want ∂f < 0. If we choose ∂x = −∇f(x0), then we know ∂f ≈ −‖∇f(x0)‖² < 0
This is our direction of descent!
27

Gradient Method
∂x_k = −(∇f(x_k))^T = −g_k
∂x_k = x_{k+1} − x_k
x_{k+1} = x_k − g_k
[Plot annotations: f(x) contours with the descent direction; the arrow shown is a gradient (not a constraint)]
28

Gradient descent algorithm
Local optimization algorithm for interior optima (see the Python sketch at the end of these notes):
1. Begin with a feasible point x0
2. Find the gradient at that point, ∇f(x0)
3. Move in the direction of the negative gradient to find an improved x1: x1 = x0 − ∇f(x0)
4. Continue to iterate until you stop improving: x_k = x_{k−1} − ∇f(x_{k−1})
29

Gradient descent
Slight modification: add a scale factor to avoid jumping past the minimum:
x_k = x_{k−1} − α ∇f(x_{k−1})
Optimizing the step size α gives us:
x_k = x_{k−1} − [∇f^T(x_{k−1}) ∇f(x_{k−1}) / (∇f^T(x_{k−1}) H(x_{k−1}) ∇f(x_{k−1}))] ∇f(x_{k−1})
provided the Hessian is positive-definite
(p. 155 of the 2nd edition or p. 189 of the 3rd edition explains step size optimization)
30

Gradient descent example
[Worked example: problem statement, gradient & Hessian, and algorithm iterations shown on the slide]
31

Summary
• The optimality conditions can be used to prove an interior optimum
• The First-Order Necessary Condition identifies stationary points
• The Second-Order Sufficiency Condition identifies the nature (minima, maxima, saddle) of stationary points
• Taylor series approximation is used to generate derivative-based local optimization directions
• The gradient descent algorithm uses 1st-order info
• Newton’s method (algorithm) uses 2nd-order info, which we didn’t get to today…
32

Acknowledgements
• Much of this material came from Chapter 4 of the textbook, Principles of Optimal Design
• Some of the slides and examples came from Drs. Emrah Bayrak, Alex Burnap, and Namwoo Kang at the University of Michigan
33

Question
Who would be interested in a HW1 review session with Amineh sometime on Friday (9/28) or Monday (10/1)?
ANSWER: There is interest! Amineh will host a session to review the HW1 solutions on Monday at 11am. Location is TBD, and an announcement will go out on Canvas soon.
34

Another problem set-up and monotonicity example
Designing an air tank
[Figure: air tank with dimensions r, h, l, s]
35

Solution
Air tank problem set-up: minimize the volume of metal
36

Example: Air tank
Add constraints:
• minimum inner volume
• ASME code ratio limit
• ASME code ratio limit
• allow space to attach nozzles
• space limitation
Okay, what’s next?
37

Example: Air tank
Monotonicity analysis:
MP1: Every increasing variable (in the objective) is bounded below by at least one non-increasing active constraint
Okay, what’s next?
38

Example: Air tank
Remove critical constraints:
• Solve each for a variable
• Plug into objective and constraints
Infimum at V* = 130.1(10³)π, where l* = ∞: not well bounded
39
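As a companion to the gradient descent algorithm and the optimal step size formula above (Slides 29-30), here is a minimal Python sketch applied to the Example 4.6 objective from Slide 14; the starting point, tolerance, and iteration cap are assumed values for illustration, not from the slides.

```python
import numpy as np

# Example 4.6 objective with its analytical gradient and Hessian (Slide 14)
f = lambda x: 2*x[0] + x[0]**-2 + 2*x[1] + x[1]**-2
grad = lambda x: np.array([2 - 2*x[0]**-3, 2 - 2*x[1]**-3])
hess = lambda x: np.diag([6*x[0]**-4, 6*x[1]**-4])

def gradient_descent(x0, tol=1e-8, max_iter=200):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # stop once the FONC is (nearly) satisfied
            break
        H = hess(x)
        alpha = (g @ g) / (g @ H @ g)    # optimal step size (Slide 30); needs H positive-definite
        x = x - alpha * g                # move in the negative gradient direction
    return x, k

x_star, iters = gradient_descent([0.5, 0.7])
print(x_star, iters)   # converges to approximately [1, 1], the minimum found earlier
```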