
Design optimization algorithms and tools

Constrained gradient-
based optimization

ME 564/SYS 564
Wed Oct 10, 2018
Steven Hoffenson

Goal of Week 7: To learn the optimality conditions for constrained problems, be able
to solve problems with them, and understand how some common algorithms work

1

Recap: How to optimize

1. Formulate the problem
a) Define system boundaries

b) Develop analytical models

c) Explore/reduce the problem space

d) Formalize optimization problem

2. Solve the problem
a) Choose the right approach/algorithm

b) Solve (by hand, code, or software)

c) Interpret the results

d) Iterate if needed

2

𝐱𝑘+1 = 𝐱𝑘 − 𝐇(𝐱𝑘)⁻¹𝛁𝑓(𝐱𝑘)

(Step 1: Weeks 1-3, 9-12; Step 2: Weeks 4-8, 13 ← TODAY)

Discuss: HW2 P3

3

[Figure: the Hooke-Jeeves algorithm applied to the Rosenbrock function]

Recap: Weeks 5 & 6 (and HW3)

• The optimality conditions can be used to solve for
or prove an interior optimum
• The First-Order Necessary Condition identifies

stationary points

• The Second-Order Sufficiency Condition identifies the
nature (minima, maxima, saddle) of stationary points

• Taylor series approximation is used to generate
derivative-based local optimization directions
• The gradient descent algorithm uses 1st-order info

• Newton’s method (algorithm) uses 2nd-order info

• Convexity can be used to prove a local optimum is
global

4

Recap: Gradient descent algorithm

Local optimization algorithm for interior optima

1. Begin with a feasible point 𝐱0
2. Find the gradient at that point 𝛁𝑓 𝐱0
3. Move in the direction of the negative gradient to

find an improved 𝐱1
𝐱1 = 𝐱0 − 𝛁𝑓 𝐱0

4. Continue to iterate until you stop improving

𝐱𝑘 = 𝐱𝑘−1 − 𝛁𝑓 𝐱𝑘−1

We can also add a scale factor α.
5
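The four steps above can be sketched in code. This is a minimal Python illustration (the course examples use MATLAB); the quadratic test function is made up for the demo:

```python
# Gradient descent sketch on f(x) = (x1-3)^2 + (x2+1)^2 (a hypothetical
# test function), following steps 1-4 with a scale factor alpha.
def grad_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        # stop once the step is tiny (no more improvement)
        if sum((a - b) ** 2 for a, b in zip(x, x_new)) < tol ** 2:
            return x_new
        x = x_new
    return x

grad_f = lambda x: [2 * (x[0] - 3), 2 * (x[1] + 1)]  # gradient of the test function
x_star = grad_descent(grad_f, [0.0, 0.0])
print(x_star)  # converges near the minimizer (3, -1)
```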

Recap: Newton’s method

Local optimization algorithm for interior optima

1. Begin with a feasible point 𝐱0
2. Find the gradient and Hessian at that point, 𝛁𝑓(𝐱𝑘) and 𝐇(𝐱𝑘)

3. Move in the following way:

𝐱𝑘+1 = 𝐱𝑘 − 𝐇(𝐱𝑘)⁻¹𝛁𝑓(𝐱𝑘)

Similar to gradient descent, but the gradient is premultiplied by the inverse
Hessian. We can also add a scale factor α.

Note: This is very effective for quadratic objectives.

6
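The note about quadratic objectives can be seen directly: for a quadratic, the local model is exact, so one Newton step reaches the stationary point. A Python sketch with a made-up convex quadratic (the course code is in MATLAB):

```python
# One Newton step, x_{k+1} = x_k - H^{-1} grad f, for a 2-variable problem.
def newton_step(x, grad, hess):
    g = grad(x)
    (a, b), (c, d) = hess(x)
    det = a * d - b * c                  # invert the 2x2 Hessian by hand
    dx1 = ( d * g[0] - b * g[1]) / det
    dx2 = (-c * g[0] + a * g[1]) / det
    return [x[0] - dx1, x[1] - dx2]

# f(x) = 2*x1^2 + x2^2 + x1*x2 - 4*x1  (hypothetical convex quadratic)
grad_f = lambda x: [4 * x[0] + x[1] - 4, 2 * x[1] + x[0]]
hess_f = lambda x: [[4.0, 1.0], [1.0, 2.0]]
x_next = newton_step([0.0, 0.0], grad_f, hess_f)
print(x_next)  # one step lands on the stationary point (8/7, -4/7)
```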

Recap: Convexity

7

If you can prove convexity, then any local optimum is a
global optimum!

[Figure: two example surfaces. Not convex: local minima exist. Convex: unique global minimum.]

If the Hessian of the objective function is positive definite
everywhere, then the problem is convex! This can help

you conclude that you have found a global solution.

HW3 tips
• Some versions of MATLAB do not allow function definitions in the
same file as a script. If my “week6_ex419gd_2018.m”
does not run on your computer, you will need to
separate it into 3 files (main script + 2 functions)!
This is similar to the fmincon beam example. Ask me if
you need help with this.

• Watch out for matrix dimensions. These algorithms are
written with the x’s and gradients as column vectors,
but our code examples store x as a row vector, which
can cause MATLAB errors. You cannot add a row
vector and a column vector, so be consistent or use
transposes.

• Start early, and make an appointment with me if you
have questions. I want everyone to get an A on this HW.

8

Constrained gradient-
based optimization

9

Today’s topics

• Reduced gradient approach
• Extensions to FONC and SOSC

• Generalized Reduced Gradient (GRG) algorithm

• Active set strategies

• Lagrangian approach
• Lagrange multipliers

• Karush-Kuhn-Tucker (KKT) conditions

• Two common algorithms:
• Quasi-Newton methods

• Sequential Quadratic Programming (SQP)

10

Reduced gradient
An approach for using updated FONC and SOSC to
solve problems with equality (or active inequality)

constraints that cannot be substituted out numerically

11

Alone, Reduced Gradient is not an algorithm!
However, many common algorithms are built on

its premise.


Reduced gradient: Rationale

• In the past, with equality constraints or active
inequality constraints, we substituted variables out

min𝑓 = 𝑥1 + 𝑥2
s.t. ℎ = 𝑥1 − 1 = 0

𝑔 = −𝑥2 + 1 ≤ 0

• What if you cannot substitute?

min𝑓 = 𝑥1 + 𝑥2
s.t. ℎ = 𝑥1 − 𝑥2 + cos 𝑥1𝑥2 = 0

12

In the first problem: from ℎ, 𝑥1∗ = 1; from 𝑔 (active, treated as an equality), 𝑥2∗ = 1; so 𝑓∗ = 2.

In the second problem, the cos(𝑥1𝑥2) term means we cannot isolate either variable!

Reduced gradient: Overview

1. Partition variables 𝐱 into sets of 𝐬 “state
variables” and 𝐝 “decision variables”

2. Take all partial derivatives:

𝜕𝑓/𝜕𝐬, 𝜕𝑓/𝜕𝐝, 𝜕𝐡/𝜕𝐬, 𝜕𝐡/𝜕𝐝

3. Calculate reduced gradient 𝜕𝑧/𝜕𝐝 using Step 2,
and solve for 𝐝∗ by setting 𝜕𝑧/𝜕𝐝= 𝟎

4. Calculate 𝐬∗ from optimal 𝐝∗ using 𝐡, then
calculate 𝑓∗

13

Reduced gradient: Partition

• Since we have 𝑛 variables/unknowns and 𝑚
equations, we have 𝑛 −𝑚 degrees of freedom

• We can partition the 𝑛-dimensional vector 𝐱 into 𝑚
“state variables” and 𝑛 −𝑚 “decision variables”

I.e., 𝐱 = [𝑥1, 𝑥2, … 𝑥𝑛]ᵀ with 𝑚 < 𝑛 active constraints can be split into:

𝐬 = [𝑥1, 𝑥2, … 𝑥𝑚]ᵀ (state variables)
𝐝 = [𝑥𝑚+1, 𝑥𝑚+2, … 𝑥𝑛]ᵀ (decision variables)

• The choice of which variables are state and which are decision does not matter – you should choose whichever will be easiest computationally

14

Reduced gradient: Re-frame 𝜕𝐡

• Separate 𝐬 and 𝐝 in the framing of 𝜕𝐡:

𝜕𝐡 = (𝜕𝐡/𝜕𝐬) 𝜕𝐬 + (𝜕𝐡/𝜕𝐝) 𝜕𝐝 = 𝟎

15

Reduced gradient: Solve for 𝜕𝐬

Rearrange the separated equation to get:

𝜕𝐬 = −(𝜕𝐡/𝜕𝐬)⁻¹(𝜕𝐡/𝜕𝐝) 𝜕𝐝

This derivation gives us a very useful expression for 𝜕𝐬/𝜕𝐝

16

Reduced gradient: Re-frame objective

Take the gradient of the “reduced” objective 𝑧(𝐝) = 𝑓(𝐬(𝐝), 𝐝):

𝜕𝑧/𝜕𝐝 = 𝜕𝑓/𝜕𝐝 + (𝜕𝑓/𝜕𝐬)(𝜕𝐬/𝜕𝐝)

Plugging in for our previously-derived 𝜕𝐬/𝜕𝐝:

𝜕𝑧/𝜕𝐝 = 𝜕𝑓/𝜕𝐝 − (𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹(𝜕𝐡/𝜕𝐝)

This is the reduced gradient!

17

Reduced gradient: Solve with FONC

1. From FONC, find stationary points: 𝜕𝑧/𝜕𝐝 = 𝟎
2. Next, plug into the formulation from before
3. Then, from optimal 𝐝∗, find 𝐬∗ with 𝐡(𝐬∗, 𝐝∗) = 𝟎
4. Finally, from optimizers 𝐝∗ and 𝐬∗, calculate 𝑓∗

18

Example 1: Reduced Gradient

min 𝑓 = (𝑥1 − 2)² + (𝑥2 − 2)²
s.t. ℎ = 𝑥1² + 𝑥2² − 4 = 0

Step 1: Partition 𝐱 into 𝑛 − 𝑚 decision variables 𝐝 and 𝑚 state variables 𝐬
Step 2: Formulate 𝜕𝑓/𝜕𝐬, 𝜕𝑓/𝜕𝐝, 𝜕𝐡/𝜕𝐬, 𝜕𝐡/𝜕𝐝
Step 3: Calculate reduced gradient 𝜕𝑧/𝜕𝐝 using Step 2, and solve 𝜕𝑧/𝜕𝐝 = 𝟎
Step 4: Solve for 𝐬∗ from 𝐝∗ using 𝐡, then find 𝑓∗

19

Example 1: Reduced gradient

1. Since we have 1 active constraint and 2 variables, we only have 2 − 1 = 1 degree of freedom. Partition: 𝑠 = 𝑥1, 𝑑 = 𝑥2
2. Formulate: 𝑓 = (𝑠 − 2)² + (𝑑 − 2)², ℎ = 𝑠² + 𝑑² − 4 = 0
3. Calculate:
𝜕𝑧/𝜕𝑑 = 𝜕𝑓/𝜕𝑑 − (𝜕𝑓/𝜕𝑠)(𝜕ℎ/𝜕𝑠)⁻¹(𝜕ℎ/𝜕𝑑)
𝜕𝑧/𝜕𝑑 = (2𝑑 − 4) − (2𝑠 − 4)(2𝑠)⁻¹(2𝑑)

20

Example 1: Reduced gradient

Reduce:
𝜕𝑧/𝜕𝑑 = (2𝑑 − 4) − (2𝑑 − 4𝑑/𝑠)
𝜕𝑧/𝜕𝑑 = 4𝑑/𝑠 − 4

Solve using FONC, 𝜕𝑧/𝜕𝑑 = 0:
4𝑑/𝑠 − 4 = 0 ⟹ 𝑑/𝑠 = 1 ⟹ 𝑑 = 𝑠

21

Example 1: Reduced gradient

4. Now that we know 𝑑 = 𝑠, plug back into ℎ:
ℎ = 𝑠² + 𝑠² − 4 = 0 ⟹ 2𝑠² = 4 ⟹ 𝑠∗ = √2 = 𝑑∗

Plug back into 𝑓:
𝑓∗ = (√2 − 2)² + (√2 − 2)² = 2(2 − 4√2 + 4)
𝑓∗ = 12 − 8√2 ≈ 0.6863

22

Note: This is just an updated version of the FONC and SOSC solution approach!
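The result of Example 1 can be checked numerically: since the only constraint is the circle 𝑥1² + 𝑥2² = 4, we can parameterize it by an angle and scan for the minimizer. A brute-force Python sketch (not part of the course code):

```python
import math

# Check Example 1: minimize f = (x1-2)^2 + (x2-2)^2 on the circle
# x1^2 + x2^2 = 4 by scanning the parameterization x = (2 cos t, 2 sin t),
# which eliminates the equality constraint.
best_t = min((k * 2 * math.pi / 100000 for k in range(100000)),
             key=lambda t: (2*math.cos(t) - 2)**2 + (2*math.sin(t) - 2)**2)
x1, x2 = 2 * math.cos(best_t), 2 * math.sin(best_t)
f_star = (x1 - 2)**2 + (x2 - 2)**2
print(x1, x2, f_star)  # both near sqrt(2) ~ 1.414, f* near 12 - 8*sqrt(2) ~ 0.6863
```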
Generalized Reduced Gradient (GRG)

GRG is an iterative algorithm similar to gradient descent, but using the Reduced Gradient concept.

In each iteration, update 𝐱𝑘 = (𝐝𝑘, 𝐬𝑘):

First, update the decision variables:
𝐝𝑘+1 = 𝐝𝑘 − 𝛼𝑘 (𝜕𝑧/𝜕𝐝)𝑘ᵀ

and then adjust the state variables by starting with
𝐬′𝑘+1 = 𝐬𝑘 + 𝛼𝑘 (𝜕𝐡/𝜕𝐬)𝑘⁻¹ (𝜕𝐡/𝜕𝐝)𝑘 (𝜕𝑧/𝜕𝐝)𝑘ᵀ

and updating it with the following until convergence:
𝐬𝑘+1⁽ʲ⁺¹⁾ = 𝐬𝑘+1⁽ʲ⁾ − (𝜕𝐡/𝜕𝐬)𝑘+1⁻¹ 𝐡(𝐝𝑘+1, 𝐬𝑘+1⁽ʲ⁾)

23

Active set strategy

• Recall that the Reduced Gradient approach only works with active constraints, or when we know which constraints are active
• We can get around this with an active set strategy
• An active set strategy is an algorithmic approach that assumes a set of active constraints and updates it. In each iteration, it will:
• Remove inactive constraints
• Add violated constraints

This is similar to the LP algorithm!

24

Lagrange multipliers

Another approach to dealing with constrained problems, which can be useful in algorithms

25

Lagrange multipliers

Using FONC on the Reduced Gradient formulation
𝜕𝑧/𝜕𝐝 = 𝜕𝑓/𝜕𝐝 − (𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹(𝜕𝐡/𝜕𝐝) = 𝟎

we redefine the term −(𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹ as a new set of “variables”:
𝜕𝑧/𝜕𝐝 = 𝜕𝑓/𝜕𝐝 + 𝝀ᵀ(𝜕𝐡/𝜕𝐝) = 𝟎, where 𝝀ᵀ = −(𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹

The elements of this vector are called Lagrange multipliers.

26

Lagrange multipliers

• Rearrange 𝝀ᵀ = −(𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹ to get:
𝜕𝑓/𝜕𝐬 + 𝝀ᵀ(𝜕𝐡/𝜕𝐬) = 𝟎ᵀ

• Recall that we still have the Reduced Gradient FONC:
𝜕𝑓/𝜕𝐝 + 𝝀ᵀ(𝜕𝐡/𝜕𝐝) = 𝟎ᵀ

• We can combine these, since 𝐱 = (𝐬, 𝐝):
𝜕𝑓/𝜕𝐱 + 𝝀ᵀ(𝜕𝐡/𝜕𝐱) = 𝟎ᵀ (𝑛 equations)

To solve, we need more equations, since 𝝀ᵀ introduces more “variables”:
𝐡(𝐱) = 𝟎 (𝑚 equations)

27

Lagrangian defined

• We define the Lagrangian as: ℒ(𝐱, 𝝀) = 𝑓(𝐱) + 𝝀ᵀ𝐡(𝐱)

• Which yields the equations we just discussed. These are the FONCs for an equality-constrained problem:
𝜕ℒ/𝜕𝐱 = 𝟎: 𝜕𝑓/𝜕𝐱 + 𝝀ᵀ(𝜕𝐡/𝜕𝐱) = 𝟎ᵀ
𝜕ℒ/𝜕𝝀 = 𝟎: 𝐡(𝐱) = 𝟎

• With the Hessian of the Lagrangian, ℒ𝑥𝑥, and a FONC point 𝐱∗, the SOSCs for an equality-constrained problem are:
𝜕𝐱∗ᵀ ℒ𝑥𝑥 𝜕𝐱∗ > 0 for all feasible directions (𝜕𝐡/𝜕𝐱) 𝜕𝐱∗ = 𝟎

28

Solving with Lagrange multipliers

1. Define ℒ 𝐱, 𝝀 = 𝑓 𝐱 + 𝝀𝑇𝐡 𝐱

2. Find 𝐱∗ using FONC: 𝜕ℒ/𝜕𝐱 = 𝟎 and 𝜕ℒ/𝜕𝝀 = 𝟎

3. Check SOSCs
a) With Hessian ℒ𝑥𝑥 at 𝐱∗: 𝜕𝐱∗ᵀ ℒ𝑥𝑥 𝜕𝐱∗ > 0
b) for feasible directions: (𝜕𝐡/𝜕𝐱) 𝜕𝐱∗ = 𝟎
29

Example: Lagrange Multipliers

1. Define ℒ(𝐱, 𝝀) = 𝑓(𝐱) + 𝝀ᵀ𝐡(𝐱). The problem maximizes 𝑥1𝑥2 + 𝑥2𝑥3 + 𝑥1𝑥3 subject to 𝑥1 + 𝑥2 + 𝑥3 = 3; remember to convert to a minimization problem!

ℒ(𝐱, 𝝀) = −𝑥1𝑥2 − 𝑥2𝑥3 − 𝑥1𝑥3 + 𝜆(𝑥1 + 𝑥2 + 𝑥3 − 3)

2. Find 𝐱∗ using FONC: 𝜕ℒ/𝜕𝐱 = 𝟎 and 𝐡(𝐱) = 𝟎

𝜕ℒ/𝜕𝑥1 = −𝑥2 − 𝑥3 + 𝜆 = 0
𝜕ℒ/𝜕𝑥2 = −𝑥1 − 𝑥3 + 𝜆 = 0
𝜕ℒ/𝜕𝑥3 = −𝑥1 − 𝑥2 + 𝜆 = 0
ℎ(𝐱) = 𝑥1 + 𝑥2 + 𝑥3 − 3 = 0

Solution: 𝑥1∗ = 𝑥2∗ = 𝑥3∗ = 1, 𝜆∗ = 2, 𝑓∗ = 3

30
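The FONC solution above is easy to verify by substitution. A quick Python check (the course code is in MATLAB; this is only an illustration):

```python
# Check that x* = (1,1,1), lambda* = 2 satisfies dL/dx = 0 and h(x) = 0
# for L = -x1*x2 - x2*x3 - x1*x3 + lam*(x1 + x2 + x3 - 3).
x1 = x2 = x3 = 1.0
lam = 2.0
residuals = [
    -x2 - x3 + lam,       # dL/dx1
    -x1 - x3 + lam,       # dL/dx2
    -x1 - x2 + lam,       # dL/dx3
    x1 + x2 + x3 - 3,     # h(x)
]
print(residuals)  # all zeros at the stationary point

f_original = x1*x2 + x2*x3 + x1*x3  # value of the original (maximized) objective
print(f_original)  # 3
```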

Example: Lagrange Multipliers

3. With Hessian ℒ𝑥𝑥 at 𝐱∗, check 𝜕𝐱∗ᵀ ℒ𝑥𝑥 𝜕𝐱∗ > 0
4. for feasible directions (𝜕𝐡/𝜕𝐱) 𝜕𝐱∗ = 0

ℒ𝑥𝑥 =
[ 0 −1 −1
 −1  0 −1
 −1 −1  0 ]

𝜕𝐱∗ᵀ ℒ𝑥𝑥 𝜕𝐱∗ = −2(𝜕𝑥1𝜕𝑥2 + 𝜕𝑥2𝜕𝑥3 + 𝜕𝑥1𝜕𝑥3) > 0 ?

Since (𝜕𝐡/𝜕𝐱) 𝜕𝐱∗ = 𝜕𝑥1 + 𝜕𝑥2 + 𝜕𝑥3 = 0, substitute 𝜕𝑥3 = −𝜕𝑥1 − 𝜕𝑥2:

−2(𝜕𝑥1𝜕𝑥2 + 𝜕𝑥2𝜕𝑥3 + 𝜕𝑥1𝜕𝑥3) = 2[(𝜕𝑥1 + 𝜕𝑥2/2)² + (3/4)𝜕𝑥2²] > 0 ✓

Therefore, 𝐱∗ = (1, 1, 1)ᵀ is a global minimum (of the converted problem, i.e., a global maximum of the original). 31

Karush-Kuhn-Tucker
(KKT) conditions

FONC for problems with both equality and
inequality constraints, regardless of activity

32

Lagrangian with inequalities

33

Original Constrained Problem:
min𝐱 𝑓(𝐱)
s.t. 𝐡(𝐱) = 𝟎
  𝐠(𝐱) ≤ 𝟎

New Lagrangian:
min𝐱,𝛌,𝛍 ℒ(𝐱, 𝛌, 𝛍) = 𝑓(𝐱) + 𝛌ᵀ𝐡(𝐱) + 𝛍ᵀ𝐠(𝐱)

Now both 𝛌 and 𝛍 are vectors of Lagrange multipliers!

Karush-Kuhn-Tucker (KKT) conditions

First order necessary conditions for constrained problems

1. 𝛁𝐱ℒ = 𝛁𝑓(𝐱∗) + 𝛌ᵀ𝛁𝐡(𝐱∗) + 𝛍ᵀ𝛁𝐠(𝐱∗) = 𝟎ᵀ

2. 𝐡(𝐱∗) = 𝟎, 𝐠(𝐱∗) ≤ 𝟎

3. 𝛌 unrestricted in sign, 𝛍 ≥ 𝟎

4. 𝛍ᵀ𝐠(𝐱∗) = 𝟎

34

Note: A design point 𝐱∗satisfying these 4 conditions is
called a KKT point, which may or may not be a minimizer
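The four conditions can be checked mechanically at any candidate point. A minimal Python sketch (the helper name and call signature are illustrative, not from the course code):

```python
# Evaluate the four KKT conditions at a candidate (x, lam, mu), given
# callables for grad f, h, g and their gradients (lists of lists).
def is_kkt_point(x, lam, mu, grad_f, h, g, grad_h, grad_g, tol=1e-8):
    # 1. Stationarity: grad f + lam^T grad h + mu^T grad g = 0
    stat = list(grad_f(x))
    for li, gh in zip(lam, grad_h(x)):
        stat = [s + li * d for s, d in zip(stat, gh)]
    for mi, gg in zip(mu, grad_g(x)):
        stat = [s + mi * d for s, d in zip(stat, gg)]
    if any(abs(s) > tol for s in stat):
        return False
    # 2. Feasibility: h(x) = 0 and g(x) <= 0
    if any(abs(hi) > tol for hi in h(x)) or any(gi > tol for gi in g(x)):
        return False
    # 3. Non-negative inequality multipliers: mu >= 0
    if any(mi < -tol for mi in mu):
        return False
    # 4. Complementary slackness: mu_j * g_j(x) = 0
    return all(abs(mi * gi) <= tol for mi, gi in zip(mu, g(x)))

# Toy check: min x^2 s.t. 1 - x <= 0 has x* = 1 with mu* = 2
ok = is_kkt_point([1.0], [], [2.0],
                  grad_f=lambda x: [2 * x[0]],
                  h=lambda x: [], g=lambda x: [1 - x[0]],
                  grad_h=lambda x: [], grad_g=lambda x: [[-1.0]])
print(ok)  # True
```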

Second Order Sufficiency Conditions

Given a KKT point 𝐱∗, if the Hessian of the Lagrangian
in feasible directions is positive definite, then the
point is a local minimizer:

A feasible direction must satisfy:

𝜕𝐡

𝜕𝐱
𝜕𝐱∗ = 0

𝜕𝐠

𝜕𝐱
𝜕𝐱∗ = 0

𝜕𝐱∗
𝑻
ℒ𝒙𝒙𝜕𝐱∗ > 0

35

Example: KKT Conditions

The constraints are 𝑔1 = 𝑥1 − 4𝑥2 + 3 ≤ 0 and 𝑔2 = −𝑥1 + 2𝑥2 ≤ 0, with an objective whose gradient is shown below (consistent with 𝑓 = 8𝑥1² − 8𝑥1𝑥2 + 3𝑥2²).

Step 1: 𝛁𝐱ℒ = 𝛁𝑓(𝐱∗) + 𝛌ᵀ𝛁𝐡(𝐱∗) + 𝛍ᵀ𝛁𝐠(𝐱∗) = 𝟎ᵀ

𝛁𝑓(𝐱) = [16𝑥1 − 8𝑥2, −8𝑥1 + 6𝑥2]ᵀ, 𝛁𝑔1 = [1, −4]ᵀ, 𝛁𝑔2 = [−1, 2]ᵀ, 𝛁𝐡(𝐱) = [ ] (no equality constraints)

𝛁𝐱ℒ = [16𝑥1 − 8𝑥2 + 𝜇1 − 𝜇2, −8𝑥1 + 6𝑥2 − 4𝜇1 + 2𝜇2]ᵀ = 𝟎
36

Example: KKT Conditions

37

Step 2: FONC

16𝑥1 − 8𝑥2 + 𝜇1 − 𝜇2 = 0
−8𝑥1 + 6𝑥2 − 4𝜇1 + 2𝜇2 = 0
𝜇1(𝑥1 − 4𝑥2 + 3) = 0
𝜇2(−𝑥1 + 2𝑥2) = 0

To solve this, we can look at different scenarios for the bottom
two (complementarity) equations, depending on whether each 𝜇 is zero:

Scenario 1: 𝜇1 = 0, 𝜇2 = 0 Scenario 2: 𝜇1 = 0, 𝜇2 ≠ 0
Scenario 3: 𝜇1 ≠ 0, 𝜇2 = 0 Scenario 4: 𝜇1 ≠ 0, 𝜇2 ≠ 0

Example: KKT Conditions

38

Step 2: FONC (continued)

Scenario 1: 𝜇1 = 0, 𝜇2 = 0
16𝑥1 − 8𝑥2 = 0
−8𝑥1 + 6𝑥2 = 0
𝑔1: 𝑥1 − 4𝑥2 + 3 ≤ 0
𝑔2: −𝑥1 + 2𝑥2 ≤ 0
⟹ 𝑥1 = 𝑥2 = 0, which violates constraint 𝑔1. No solution.

Scenario 2: 𝜇1 = 0, 𝜇2 ≠ 0
16𝑥1 − 8𝑥2 − 𝜇2 = 0
−8𝑥1 + 6𝑥2 + 2𝜇2 = 0
−𝑥1 + 2𝑥2 = 0
𝑔1: 𝑥1 − 4𝑥2 + 3 ≤ 0
⟹ 𝑥1 = 𝑥2 = 0, which requires 𝜇2 = 0; not allowed. No solution.

Example: KKT Conditions

39

Step 2: FONC (continued)

Scenario 3: 𝜇1 ≠ 0, 𝜇2 = 0
16𝑥1 − 8𝑥2 + 𝜇1 = 0
−8𝑥1 + 6𝑥2 − 4𝜇1 = 0
𝑥1 − 4𝑥2 + 3 = 0
𝑔2: −𝑥1 + 2𝑥2 ≤ 0
⟹ 𝑥1 = 13/33, 𝑥2 = 28/33, 𝜇1 = 16/33, which violates constraint 𝑔2. No solution.

Scenario 4: 𝜇1 ≠ 0, 𝜇2 ≠ 0
16𝑥1 − 8𝑥2 + 𝜇1 − 𝜇2 = 0
−8𝑥1 + 6𝑥2 − 4𝜇1 + 2𝜇2 = 0
𝑥1 − 4𝑥2 + 3 = 0
−𝑥1 + 2𝑥2 = 0
⟹ 𝑥1 = 3, 𝑥2 = 3/2, 𝜇1 = 57/2, 𝜇2 = 129/2. KKT point!

Example: KKT Conditions

40

Step 3: Check SOSC

KKT point: 𝑥1 = 3, 𝑥2 = 3/2, 𝜇1 = 57/2, 𝜇2 = 129/2

ℒ𝑥𝑥 =
[ 16 −8
  −8  6 ]

Since ℒ𝑥𝑥 is pos. def. everywhere, the unique KKT point

𝐱∗ = (3, 1.5)ᵀ

is a global minimizer at 𝑓∗ = 42.75.
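Substituting the Scenario-4 point back into the stationarity and complementarity expressions from the slides confirms it is a KKT point. A quick Python check (illustration only):

```python
# Verify the KKT point x = (3, 3/2), mu1 = 57/2, mu2 = 129/2
# against the FONC expressions from the slides.
x1, x2 = 3.0, 1.5
mu1, mu2 = 57 / 2, 129 / 2
stationarity = [
    16 * x1 - 8 * x2 + mu1 - mu2,          # dL/dx1 = 0
    -8 * x1 + 6 * x2 - 4 * mu1 + 2 * mu2,  # dL/dx2 = 0
]
g = [x1 - 4 * x2 + 3, -x1 + 2 * x2]        # both constraints active (= 0) here
complementarity = [mu1 * g[0], mu2 * g[1]]
print(stationarity, g, complementarity)    # all exactly zero
```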

Example 2: KKT Conditions

The problem is min 𝑓 = −𝑥1 with 𝑔1 = 𝑥2 − (1 − 𝑥1)³ ≤ 0, 𝑔2 = −𝑥1 ≤ 0, 𝑔3 = −𝑥2 ≤ 0 (consistent with the gradients below).

Step 1: 𝛁𝐱ℒ = 𝛁𝑓(𝐱∗) + 𝛌ᵀ𝛁𝐡(𝐱∗) + 𝛍ᵀ𝛁𝐠(𝐱∗) = 𝟎ᵀ

𝛁𝑓(𝐱) = [−1, 0]ᵀ; 𝛁𝑔1 = [3(1 − 𝑥1)², 1]ᵀ, 𝛁𝑔2 = [−1, 0]ᵀ, 𝛁𝑔3 = [0, −1]ᵀ; 𝛁𝐡(𝐱) = [ ]

𝛁𝐱ℒ = [−1 + 3(1 − 𝑥1)²𝜇1 − 𝜇2, 𝜇1 − 𝜇3]ᵀ = 𝟎

2 equations, 5 unknowns… These give:
𝜇2 = 3(1 − 𝑥1)²𝜇1 − 1
𝜇1 = 𝜇3

41

Example 2: KKT Conditions

Complementary slackness, 𝛍ᵀ𝐠(𝐱∗) = 𝟎, gives 3 more equations:
𝜇1(𝑥2 − (1 − 𝑥1)³) = 0
−𝜇2𝑥1 = 0
−𝜇3𝑥2 = 0

Together with the 2 equations from the previous slide (𝜇2 = 3(1 − 𝑥1)²𝜇1 − 1 and 𝜇1 = 𝜇3), we now have 5 equations, 5 unknowns.

Solution to the 5 equations:
𝑥1∗ = 0, 𝑥2∗ ∈ [0, 1] (anywhere in range), 𝜇1 = 0, 𝜇2 = −1, 𝜇3 = 0

This violates 𝛍 ≥ 𝟎 !!

42

Example 2: What went wrong?

Important caveat: The approaches in Chapter 5 assume
that any KKT point found is a “regular point”, where the
gradients of the active constraints are linearly independent

43

Here, the active constraint gradients are not
linearly independent at the true optimum,
causing an inability to
find a valid KKT point
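The regularity failure can be seen concretely. The true optimum of min −𝑥1 under these constraints is (𝑥1, 𝑥2) = (1, 0) (inferred from the constraints, so treat it as an illustration), where 𝑔1 and 𝑔3 are active:

```python
# At (1, 0), the active constraints are g1 = x2 - (1 - x1)^3 and g3 = -x2.
# Their gradients become linearly dependent, so (1, 0) is not a regular
# point and the KKT system has no valid multipliers there.
x1, x2 = 1.0, 0.0
grad_g1 = [3 * (1 - x1)**2, 1.0]   # gradient of g1, from the slides
grad_g3 = [0.0, -1.0]              # gradient of g3
det = grad_g1[0] * grad_g3[1] - grad_g1[1] * grad_g3[0]  # 2x2 determinant
print(grad_g1, grad_g3, det)  # det = 0 -> gradients are linearly dependent
```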

Quasi-Newton and SQP
algorithms

Two common algorithms that use derivatives

44

Quasi-Newton methods
𝐇 and 𝐇⁻¹ are difficult to compute, so we can avoid
them by approximating 𝐇 or 𝐇⁻¹ using only first
derivatives:

Example update function: Davidon-Fletcher-Powell (DFP)

For the derivation of this, see Ch. 6

45
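One standard form of the DFP update approximates the inverse Hessian 𝐁 ≈ 𝐇⁻¹ as 𝐁₊ = 𝐁 + 𝐬𝐬ᵀ/(𝐬ᵀ𝐲) − (𝐁𝐲)(𝐁𝐲)ᵀ/(𝐲ᵀ𝐁𝐲), with 𝐬 = 𝐱𝑘+1 − 𝐱𝑘 and 𝐲 = 𝛁𝑓𝑘+1 − 𝛁𝑓𝑘. A Python sketch (the update satisfies the secant condition 𝐁₊𝐲 = 𝐬, which we check on a made-up quadratic):

```python
# DFP inverse-Hessian update: B+ = B + s s^T/(s^T y) - (B y)(B y)^T/(y^T B y)
def dfp_update(B, s, y):
    n = len(s)
    By = [sum(B[i][j] * y[j] for j in range(n)) for i in range(n)]
    sy = sum(si * yi for si, yi in zip(s, y))
    yBy = sum(yi * byi for yi, byi in zip(y, By))
    return [[B[i][j] + s[i] * s[j] / sy - By[i] * By[j] / yBy
             for j in range(n)] for i in range(n)]

B0 = [[1.0, 0.0], [0.0, 1.0]]                     # start from the identity
s = [1.0, 0.5]                                    # a sample step
y = [4.0*s[0] + 1.0*s[1], 1.0*s[0] + 2.0*s[1]]    # y = H s for H = [[4,1],[1,2]]
B1 = dfp_update(B0, s, y)
check = [sum(B1[i][j] * y[j] for j in range(2)) for i in range(2)]
print(check)  # equals s: the secant condition B+ y = s holds
```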

Sequential quadratic programming (SQP)

1. Initialize problem

2. Solve the QP sub-problem to determine search
direction 𝐬𝑘

3. Set 𝐱𝑘+1 = 𝐱𝑘 + 𝐬𝑘
4. Continue (2) and (3) until satisfied

46
This is one of the most common, robust algorithms

Constrained problems
We can handle constraints in a number of ways

1. Lagrange multipliers:

min 𝐿 𝐱, 𝛌, 𝛍 = 𝑓 𝐱 + 𝛌𝑇𝐡 𝐱 + 𝛍𝑇𝐠 𝐱

2. Penalty function: add to the objective such that
it is zero on the interior and positive when
violating a constraint

min 𝑇(𝐱) = 𝑓(𝐱) + Σⱼ₌₁ᵐ [max(0, 𝑔ⱼ(𝐱))]²

3. Barrier function: add to the objective such that it
gets really high at the constraint boundary

min 𝑇(𝐱) = 𝑓(𝐱) − Σⱼ₌₁ᵐ ln(−𝑔ⱼ(𝐱))

47
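Approach 2 can be sketched on a hypothetical 1-D problem: min 𝑓 = (𝑥 − 3)² s.t. 𝑔 = 𝑥 − 1 ≤ 0, whose constrained optimum is 𝑥∗ = 1. As the penalty weight grows, the unconstrained minimizer of 𝑇 approaches 𝑥∗ from the infeasible side (a Python illustration; the step size and iteration counts are assumptions):

```python
# Penalty-function sketch: minimize T(x) = (x-3)^2 + r*max(0, x-1)^2
# by plain gradient descent, for increasing penalty weights r.
def minimize_T(r, x=0.0, alpha=0.005, iters=20000):
    for _ in range(iters):
        grad = 2 * (x - 3) + 2 * r * max(0.0, x - 1)  # gradient of T
        x -= alpha * grad
    return x

xs = [minimize_T(r) for r in (1.0, 10.0, 100.0)]
print(xs)  # minimizers approach x* = 1 as r grows: 2, 13/11, 103/101
```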

Gradient-based approaches summary
Using derivatives to find a local minimum of a

differentiable function (or surrogate model)

• Unconstrained
• FONC and SOSC (when math is simple enough to solve)
• Gradient descent (algorithm with linear convergence)
• Newton’s method (algorithm with quadratic convergence)

• Constrained
• Reduced gradient (analytical with known active constraints)
• Generalized Reduced Gradient (algorithm w active constr)
• Active set strategy (algorithm w updating set of active constr)
• Lagrangian (equality or active inequality constr)
• KKT conditions (with any inequality and equality constr)
• Quasi-Newton methods (2nd-derivative-free)
• SQP (efficiently handles constraints)

48

Acknowledgements

• Much of this material came from Chapters 5 & 6 of
the textbook, Principles of Optimal Design

• Some of these slides and examples came from Drs.
Emrah Bayrak and Namwoo Kang at the University
of Michigan

49

Announcements

• HW3 is due on Tuesday at noon

• My office hours are Monday 2-4pm

• I will be available for appointments on Thursday
and Friday of this week, so start early! I am happy
to answer specific questions about your work or
code over email. I probably will not respond over
the weekend.

• Project progress reports are due in 2.5 weeks!
There is a template online that you should use.

50

Exercises

51

Exercise (optional): Lagrange
Multipliers

52

Exercise (optional): Gradient descent

53

Apply gradient descent in MATLAB to the Rosenbrock
function:

𝑓(𝑥, 𝑦) = (𝑥 − 1)² + 100(𝑦 − 𝑥²)²

1. Begin with a feasible point 𝐱0
2. Find the gradient at that point 𝛁𝑓 𝐱0
3. Move in the direction of the negative gradient to

find an improved 𝐱1
𝐱1 = 𝐱0 − 𝛼𝛁𝑓 𝐱0

4. Continue to iterate until you stop improving

𝐱𝑘 = 𝐱𝑘−1 − [𝛁𝑓ᵀ(𝐱𝑘−1)𝛁𝑓(𝐱𝑘−1) / (𝛁𝑓ᵀ(𝐱𝑘−1)𝐇(𝐱𝑘−1)𝛁𝑓(𝐱𝑘−1))] 𝛁𝑓(𝐱𝑘−1)
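A possible solution sketch in Python (the exercise asks for MATLAB; the starting point, iteration cap, and fallback step are assumptions), using the step length α = 𝛁𝑓ᵀ𝛁𝑓 / (𝛁𝑓ᵀ𝐇𝛁𝑓) from step 4:

```python
# Gradient descent on the Rosenbrock function f = (x-1)^2 + 100*(y-x^2)^2
# with the Hessian-based step length (exact line search for the local
# quadratic model). Starting near the valley at (1.1, 1.1) is an assumption.
def grad(x, y):
    return [2*(x - 1) - 400*x*(y - x*x), 200*(y - x*x)]

def hess(x, y):
    return [[2 - 400*y + 1200*x*x, -400*x], [-400*x, 200.0]]

x, y = 1.1, 1.1
for _ in range(20000):
    g = grad(x, y)
    H = hess(x, y)
    gHg = H[0][0]*g[0]*g[0] + 2*H[0][1]*g[0]*g[1] + H[1][1]*g[1]*g[1]
    alpha = (g[0]*g[0] + g[1]*g[1]) / gHg if gHg > 0 else 1e-4  # fallback step
    x, y = x - alpha*g[0], y - alpha*g[1]
    if g[0]*g[0] + g[1]*g[1] < 1e-18:   # stop once the gradient vanishes
        break

f = (x - 1)**2 + 100*(y - x*x)**2
print(x, y, f)  # slowly zigzags toward the minimizer (1, 1)
```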