Design optimization algorithms and tools
Constrained gradient-based optimization
ME 564/SYS 564
Wed Oct 10, 2018
Steven Hoffenson
Goal of Week 7: To learn the optimality conditions for constrained problems, be able
to solve problems with them, and understand how some common algorithms work
Recap: How to optimize
1. Formulate the problem
a) Define system boundaries
b) Develop analytical models
c) Explore/reduce the problem space
d) Formalize optimization problem
2. Solve the problem
a) Choose the right approach/algorithm
b) Solve (by hand, code, or software)
c) Interpret the results
d) Iterate if needed
𝐱𝑘+1 = 𝐱𝑘 − 𝐇(𝐱𝑘)⁻¹𝛁𝑓(𝐱𝑘)
(Formulating the problem: Weeks 1-3, 9-12. Solving the problem: Weeks 4-8, 13, including TODAY.)
Discuss: HW2 P3
[Figure: the Hooke-Jeeves algorithm applied to the Rosenbrock function]
Recap: Weeks 5 & 6 (and HW3)
• The optimality conditions can be used to solve for
or prove an interior optimum
• The First-Order Necessary Condition identifies
stationary points
• The Second-Order Sufficiency Condition identifies the
nature (minima, maxima, saddle) of stationary points
• Taylor series approximation is used to generate
derivative-based local optimization directions
• The gradient descent algorithm uses 1st-order info
• Newton’s method (algorithm) uses 2nd-order info
• Convexity can be used to prove a local optimum is
global
Recap: Gradient descent algorithm
Local optimization algorithm for interior optima
1. Begin with a feasible point 𝐱0
2. Find the gradient at that point 𝛁𝑓 𝐱0
3. Move in the direction of the negative gradient to find an improved 𝐱1:
   𝐱1 = 𝐱0 − 𝛁𝑓(𝐱0)
4. Continue to iterate until you stop improving:
   𝐱𝑘 = 𝐱𝑘−1 − 𝛁𝑓(𝐱𝑘−1)
We can also add a scale factor α.
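The course examples use MATLAB, but the same loop can be sketched in a few lines of Python/NumPy (the quadratic objective below is an illustrative assumption, not one of the course problems):

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, tol=1e-8, max_iter=10000):
    """Steps 1-4: start at x0, repeatedly step along the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)                 # step 2: gradient at the current point
        if np.linalg.norm(g) < tol:   # step 4: stop when no longer improving
            break
        x = x - alpha * g             # step 3: move along -gradient, scaled by alpha
    return x

# Illustrative objective (an assumption): f = (x1 - 1)^2 + (x2 + 2)^2,
# whose minimum is at (1, -2)
grad = lambda x: np.array([2*(x[0] - 1), 2*(x[1] + 2)])
x_star = gradient_descent(grad, [0.0, 0.0])
```

With the fixed α = 0.1 the iterates contract toward (1, −2); too large an α would make the same loop diverge, which is why a scale factor matters.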
Recap: Newton’s method
Local optimization algorithm for interior optima
1. Begin with a feasible point 𝐱0
2. Find the gradient and Hessian at that point
3. Move in the following way:
   𝐱𝑘+1 = 𝐱𝑘 − 𝐇(𝐱𝑘)⁻¹𝛁𝑓(𝐱𝑘)
Similar to gradient descent, but the gradient is premultiplied by the inverse Hessian. We can also add a scale factor α.
Note: This is very effective for quadratic objectives (it lands on the minimizer of a quadratic in a single step).
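A minimal sketch of this update in Python/NumPy (the quadratic test function is an illustrative assumption; in practice one solves 𝐇p = 𝛁𝑓 rather than forming the inverse):

```python
import numpy as np

def newton_method(grad_f, hess_f, x0, tol=1e-10, max_iter=100):
    """x_{k+1} = x_k - H(x_k)^{-1} * grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:            # stationary point: stop
            break
        x = x - np.linalg.solve(hess_f(x), g)  # solve H p = g; avoids inverting H
    return x

# Illustrative quadratic (an assumption): f = 2*x1^2 + x2^2 + x1*x2 - 4*x1
grad = lambda x: np.array([4*x[0] + x[1] - 4, x[0] + 2*x[1]])
hess = lambda x: np.array([[4.0, 1.0], [1.0, 2.0]])
x_star = newton_method(grad, hess, [10.0, -7.0])  # one step suffices for a quadratic
```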
Recap: Convexity
If you can prove convexity, then any local optimum is a
global optimum!
[Figure: a convex function has a unique global minimum; a non-convex function can have multiple local minima]
If the Hessian of the objective function is positive definite
everywhere, then the problem is convex! This can help
you conclude that you have found a global solution.
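One practical way to check positive definiteness of a Hessian is through its eigenvalues. A small sketch in Python/NumPy (the matrices are illustrative assumptions; for a quadratic objective the Hessian is constant, so one check proves convexity):

```python
import numpy as np

def is_positive_definite(H):
    """A symmetric matrix is positive definite iff all its eigenvalues are > 0."""
    return bool(np.all(np.linalg.eigvalsh(H) > 0))

# Illustrative Hessians (assumptions, not from the slides):
H_convex = np.array([[4.0, 1.0], [1.0, 2.0]])   # both eigenvalues positive
H_saddle = np.array([[1.0, 3.0], [3.0, 1.0]])   # eigenvalues 4 and -2: indefinite
```

For non-quadratic objectives the Hessian depends on 𝐱, so this check must hold everywhere (or be argued analytically) before concluding convexity.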
HW3 tips
• Some versions of MATLAB do not allow functions in the
same file as the code. If my “week6_ex419gd_2018.m”
does not run on your computer, you will need to
separate this into 3 files (main script + 2 functions)!
This is similar to the fmincon beam example. Ask me if
you need help with this.
• Watch out for matrix dimensions. These algorithms are written with the x’s and gradients as column vectors, but the code examples store x as a row vector, which can cause MATLAB errors. You cannot add a row vector and a column vector, so be consistent or use transposes.
• Start early, and make an appointment with me if you
have questions. I want everyone to get an A on this HW.
Constrained gradient-based optimization
Today’s topics
• Reduced gradient approach
• Extensions to FONC and SOSC
• Generalized Reduced Gradient (GRG) algorithm
• Active set strategies
• Lagrangian approach
• Lagrange multipliers
• Karush-Kuhn-Tucker (KKT) conditions
• Two common algorithms:
• Quasi-Newton methods
• Sequential Quadratic Programming (SQP)
Reduced gradient
An approach for using updated FONC and SOSC to
solve problems with equality (or active inequality)
constraints that cannot be substituted out numerically
Alone, Reduced Gradient is not an algorithm!
However, many common algorithms are built on
its premise.
Reduced gradient: Rationale
• In the past, with equality constraints or active
inequality constraints, we substituted variables out
   min 𝑓 = 𝑥1 + 𝑥2
   s.t. ℎ = 𝑥1 − 1 = 0
        𝑔 = −𝑥2 + 1 ≤ 0
  Here the equality gives 𝑥1∗ = 1, the active inequality gives 𝑥2∗ = 1, and 𝑓∗ = 2.
• What if you cannot substitute?
   min 𝑓 = 𝑥1 + 𝑥2
   s.t. ℎ = 𝑥1 − 𝑥2 + cos(𝑥1𝑥2) = 0
  Because of the cosine term, we cannot isolate either variable!
Reduced gradient: Overview
1. Partition variables 𝐱 into sets of 𝐬 “state
variables” and 𝐝 “decision variables”
2. Take all partial derivatives:
𝜕𝑓/𝜕𝐬, 𝜕𝑓/𝜕𝐝, 𝜕𝐡/𝜕𝐬, 𝜕𝐡/𝜕d
3. Calculate reduced gradient 𝜕𝑧/𝜕𝐝 using Step 2,
and solve for 𝐝∗ by setting 𝜕𝑧/𝜕𝐝= 𝟎
4. Calculate 𝐬∗ from optimal 𝐝∗ using 𝐡, then
calculate 𝑓∗
Reduced gradient: Partition
• Since we have 𝑛 variables/unknowns and 𝑚 equations, we have 𝑛 − 𝑚 degrees of freedom
• We can partition the 𝑛-dimensional vector 𝐱 into 𝑚 “state variables” and 𝑛 − 𝑚 “decision variables”
  I.e., 𝐱 = [𝑥1, 𝑥2, … , 𝑥𝑛]ᵀ with 𝑚 < 𝑛 active constraints can be split into:
   𝐬 = [𝑥1, 𝑥2, … , 𝑥𝑚]ᵀ (state variables)
   𝐝 = [𝑥𝑚+1, 𝑥𝑚+2, … , 𝑥𝑛]ᵀ (decision variables)
• Mathematically, the choice of which variables are state and which are decision does not matter – choose whichever partition is easiest computationally
Reduced gradient: Re-frame 𝜕𝐡
• Separate 𝐬 and 𝐝 in the framing of 𝜕𝐡 (which must remain 𝟎 to stay feasible):
   𝜕𝐡 = (𝜕𝐡/𝜕𝐬)𝜕𝐬 + (𝜕𝐡/𝜕𝐝)𝜕𝐝 = 𝟎
• In vector form, 𝜕𝐡/𝜕𝐬 is the 𝑚×𝑚 matrix of constraint partials with respect to the state variables, and 𝜕𝐡/𝜕𝐝 is the 𝑚×(𝑛−𝑚) matrix with respect to the decision variables.
Reduced gradient: Solve for 𝜕𝐬
Rearrange the separated equation 𝜕𝐡 = (𝜕𝐡/𝜕𝐬)𝜕𝐬 + (𝜕𝐡/𝜕𝐝)𝜕𝐝 = 𝟎 to get:
   𝜕𝐬 = −(𝜕𝐡/𝜕𝐬)⁻¹(𝜕𝐡/𝜕𝐝)𝜕𝐝, i.e., 𝜕𝐬/𝜕𝐝 = −(𝜕𝐡/𝜕𝐬)⁻¹(𝜕𝐡/𝜕𝐝)
This derivation gives us a very useful expression for 𝜕𝐬/𝜕𝐝.
Reduced gradient: Re-frame objective
With 𝐬 treated as a function of 𝐝, the objective becomes 𝑧(𝐝) = 𝑓(𝐬(𝐝), 𝐝). Take the gradient of this “reduced” objective:
   𝜕𝑧/𝜕𝐝 = 𝜕𝑓/𝜕𝐝 + (𝜕𝑓/𝜕𝐬)(𝜕𝐬/𝜕𝐝)
Plugging in our previously-derived 𝜕𝐬/𝜕𝐝 = −(𝜕𝐡/𝜕𝐬)⁻¹(𝜕𝐡/𝜕𝐝):
   𝜕𝑧/𝜕𝐝 = 𝜕𝑓/𝜕𝐝 − (𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹(𝜕𝐡/𝜕𝐝)
This is the reduced gradient!
Reduced gradient: Solve with FONC
1. From FONC, find stationary points: 𝜕𝑧/𝜕𝐝 = 𝟎
2. Next, plug into the formulation from before, 𝜕𝑓/𝜕𝐝 − (𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹(𝜕𝐡/𝜕𝐝) = 𝟎, and solve for 𝐝∗
3. Then, from optimal 𝐝∗, find 𝐬∗ with 𝐡(𝐬∗, 𝐝∗) = 𝟎
4. Finally, from optimizers 𝐝∗ and 𝐬∗, calculate 𝑓∗
Example 1: Reduced Gradient
Problem: min 𝑓 = (𝑥1 − 2)² + (𝑥2 − 2)² s.t. ℎ = 𝑥1² + 𝑥2² − 4 = 0
Step 1: Partition 𝐱 into 𝑛 − 𝑚 decision variables 𝐝 and 𝑚 state variables 𝐬
Step 2: Formulate 𝜕𝑓/𝜕𝐬, 𝜕𝑓/𝜕𝐝, 𝜕𝐡/𝜕𝐬, 𝜕𝐡/𝜕𝐝
Step 3: Calculate the reduced gradient 𝜕𝑧/𝜕𝐝 using Step 2, and solve 𝜕𝑧/𝜕𝐝 = 𝟎
Step 4: Solve for 𝐬∗ from 𝐝∗ using 𝐡, then find 𝑓∗
Example 1: Reduced gradient
1. Since we have 1 active constraint and 2 variables, we should only have 2 − 1 = 1 degree of freedom. Partition: 𝑠 = 𝑥1, 𝑑 = 𝑥2
2. Formulate: 𝑓 = (𝑠 − 2)² + (𝑑 − 2)²
   ℎ = 𝑠² + 𝑑² − 4 = 0
3. Calculate:
   𝜕𝑧/𝜕𝑑 = 𝜕𝑓/𝜕𝑑 − (𝜕𝑓/𝜕𝑠)(𝜕ℎ/𝜕𝑠)⁻¹(𝜕ℎ/𝜕𝑑)
   𝜕𝑧/𝜕𝑑 = (2𝑑 − 4) − (2𝑠 − 4)(2𝑠)⁻¹(2𝑑)
Example 1: Reduced gradient
   𝜕𝑧/𝜕𝑑 = (2𝑑 − 4) − (2𝑠 − 4)(2𝑠)⁻¹(2𝑑)
Reduce:
   𝜕𝑧/𝜕𝑑 = (2𝑑 − 4) − (2𝑑 − 4𝑑/𝑠) = 4(𝑑/𝑠) − 4
Solve using FONC, 𝜕𝑧/𝜕𝑑 = 0:
   4(𝑑/𝑠) − 4 = 0 → 𝑑/𝑠 = 1 → 𝑑 = 𝑠
Example 1: Reduced gradient
4. Now that we know 𝑑 = 𝑠, plug back into ℎ:
   ℎ = 𝑠² + 𝑠² − 4 = 0
   2𝑠² = 4 → 𝑠∗ = √2 = 𝑑∗
Plug back into 𝑓:
   𝑓∗ = (√2 − 2)² + (√2 − 2)² = 2(2 − 4√2 + 4)
   𝑓∗ = 12 − 8√2 ≈ 0.6863
Note: This is just an updated version of the FONC and SOSC solution approach!
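The arithmetic above can be verified numerically (a quick check in Python/NumPy, not part of the original example):

```python
import numpy as np

# Example 1 quantities: f = (s-2)^2 + (d-2)^2, h = s^2 + d^2 - 4 = 0
f = lambda s, d: (s - 2)**2 + (d - 2)**2
dz_dd = lambda s, d: (2*d - 4) - (2*s - 4) * (1/(2*s)) * (2*d)  # reduced gradient

s_star = d_star = np.sqrt(2.0)   # from dz/dd = 0 together with h = 0
f_star = f(s_star, d_star)       # 12 - 8*sqrt(2), about 0.6863
```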
Generalized Reduced Gradient (GRG)
GRG is an iterative algorithm similar to gradient
descent, but using the Reduced Gradient concept
In each iteration, update 𝐱𝑘 = [𝐝𝑘, 𝐬𝑘]:
First, update the decision variables:
   𝐝𝑘+1 = 𝐝𝑘 − 𝛼𝑘 [(𝜕𝑧/𝜕𝐝)𝑘]ᵀ
and then adjust the state variables by starting with
   𝐬′𝑘+1 = 𝐬𝑘 + 𝛼𝑘 (𝜕𝐡/𝜕𝐬)𝑘⁻¹ (𝜕𝐡/𝜕𝐝)𝑘 [(𝜕𝑧/𝜕𝐝)𝑘]ᵀ
and updating it with the following until convergence on 𝐡 = 𝟎:
   𝐬𝑘+1^(𝑗+1) = 𝐬𝑘+1^(𝑗) − (𝜕𝐡/𝜕𝐬)𝑘+1⁻¹ 𝐡(𝐝𝑘+1, 𝐬𝑘+1^(𝑗))
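A simplified GRG sketch on Example 1 in Python/NumPy (as assumptions: the predictor step for 𝐬 is replaced by pure Newton restoration of ℎ = 0, and a fixed step size α is used):

```python
import numpy as np

# GRG sketch on Example 1: f = (s-2)^2 + (d-2)^2, h = s^2 + d^2 - 4 = 0.
def grg_example(d=1.0, alpha=0.1, iters=200):
    s = np.sqrt(4.0 - d**2)                         # start at a feasible point
    for _ in range(iters):
        dz = (2*d - 4) - (2*s - 4) * (2*d) / (2*s)  # reduced gradient dz/dd
        d = d - alpha * dz                          # decision-variable update
        for _ in range(20):                         # restore feasibility in s
            s = s - (s**2 + d**2 - 4) / (2*s)       # Newton iteration on h
    return s, d

s_star, d_star = grg_example()
```

Both variables approach √2, matching the analytical result of Example 1.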
Active set strategy
• Recall that the Reduced Gradient approach only
works with active constraints, or when we know
which constraints are active
• We can get around this with an active set strategy
• An active set strategy is an algorithmic approach
that assumes a set of active constraints and
updates them. In each iteration, it will:
• Remove inactive constraints
• Add violated constraints
This is similar to the LP algorithm!
Lagrange multipliers
Another approach to dealing with constrained
problems, which can be useful in algorithms
Lagrange multipliers
Using FONC on the Reduced Gradient formulation,
   𝜕𝑧/𝜕𝐝 = 𝜕𝑓/𝜕𝐝 − (𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹(𝜕𝐡/𝜕𝐝) = 𝟎
we redefine the term −(𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹ as a new set of “variables”:
   𝜕𝑧/𝜕𝐝 = 𝜕𝑓/𝜕𝐝 + 𝝀ᵀ(𝜕𝐡/𝜕𝐝) = 𝟎
where
   𝝀ᵀ = −(𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹
The elements of this vector are called Lagrange multipliers.
Lagrange multipliers
• Rearrange 𝝀ᵀ = −(𝜕𝑓/𝜕𝐬)(𝜕𝐡/𝜕𝐬)⁻¹ into:
   𝜕𝑓/𝜕𝐬 + 𝝀ᵀ(𝜕𝐡/𝜕𝐬) = 𝟎ᵀ
• Recall that we still have the Reduced Gradient FONC:
   𝜕𝑓/𝜕𝐝 + 𝝀ᵀ(𝜕𝐡/𝜕𝐝) = 𝟎ᵀ
• We can combine these, since 𝐱 = [𝐬, 𝐝]:
   𝜕𝑓/𝜕𝐱 + 𝝀ᵀ(𝜕𝐡/𝜕𝐱) = 𝟎ᵀ (𝑛 equations)
To solve, we need more equations, since 𝝀ᵀ introduces more “variables”:
   𝐡(𝐱) = 𝟎 (𝑚 equations)
Lagrangian defined
• We define the Lagrangian as:
   ℒ(𝐱, 𝝀) = 𝑓(𝐱) + 𝝀ᵀ𝐡(𝐱)
• This yields the equations we just discussed, which are the FONCs for an equality-constrained problem:
   𝜕ℒ/𝜕𝐱 = 𝟎: 𝜕𝑓/𝜕𝐱 + 𝝀ᵀ(𝜕𝐡/𝜕𝐱) = 𝟎ᵀ
   𝜕ℒ/𝜕𝝀 = 𝟎: 𝐡(𝐱) = 𝟎
• SOSCs: with the Hessian of the Lagrangian, ℒ𝑥𝑥, and a FONC point 𝐱∗, require
   𝜕𝐱∗ᵀℒ𝑥𝑥𝜕𝐱∗ > 0 for all 𝜕𝐱∗ satisfying (𝜕𝐡/𝜕𝐱)𝜕𝐱∗ = 𝟎
Solving with Lagrange multipliers
1. Define ℒ(𝐱, 𝝀) = 𝑓(𝐱) + 𝝀ᵀ𝐡(𝐱)
2. Find 𝐱∗ using FONC: 𝜕ℒ/𝜕𝐱 = 𝟎 and 𝜕ℒ/𝜕𝝀 = 𝟎
3. Check SOSCs:
   a) With the Hessian ℒ𝑥𝑥 at 𝐱∗: 𝜕𝐱∗ᵀℒ𝑥𝑥𝜕𝐱∗ > 0
   b) for all feasible directions satisfying (𝜕𝐡/𝜕𝐱)𝜕𝐱∗ = 𝟎
Example: Lagrange Multipliers
Problem: max 𝑓 = 𝑥1𝑥2 + 𝑥2𝑥3 + 𝑥1𝑥3 s.t. 𝑥1 + 𝑥2 + 𝑥3 = 3 (remember to convert to a minimization problem!)
1. Define ℒ(𝐱, 𝝀) = 𝑓(𝐱) + 𝝀ᵀ𝐡(𝐱):
   ℒ(𝐱, 𝜆) = −𝑥1𝑥2 − 𝑥2𝑥3 − 𝑥1𝑥3 + 𝜆(𝑥1 + 𝑥2 + 𝑥3 − 3)
2. Find 𝐱∗ using FONC, 𝜕ℒ/𝜕𝐱 = 𝟎 and 𝐡(𝐱) = 𝟎:
   𝜕ℒ/𝜕𝑥1 = −𝑥2 − 𝑥3 + 𝜆 = 0
   𝜕ℒ/𝜕𝑥2 = −𝑥1 − 𝑥3 + 𝜆 = 0
   𝜕ℒ/𝜕𝑥3 = −𝑥1 − 𝑥2 + 𝜆 = 0
   ℎ(𝐱) = 𝑥1 + 𝑥2 + 𝑥3 − 3 = 0
Solution: 𝑥1∗ = 𝑥2∗ = 𝑥3∗ = 1, 𝜆∗ = 2, 𝑓∗ = 3
Example: Lagrange Multipliers
3. With the Hessian ℒ𝑥𝑥 at 𝐱∗, check 𝜕𝐱∗ᵀℒ𝑥𝑥𝜕𝐱∗ > 0
4. for feasible directions satisfying (𝜕𝐡/𝜕𝐱)𝜕𝐱∗ = 0

   ℒ𝑥𝑥 = [0 −1 −1; −1 0 −1; −1 −1 0]

   𝜕𝐱∗ᵀℒ𝑥𝑥𝜕𝐱∗ = −2(𝜕𝑥1𝜕𝑥2 + 𝜕𝑥2𝜕𝑥3 + 𝜕𝑥1𝜕𝑥3)

Since (𝜕𝐡/𝜕𝐱)𝜕𝐱∗ = 𝜕𝑥1 + 𝜕𝑥2 + 𝜕𝑥3 = 0, substitute 𝜕𝑥3 = −𝜕𝑥1 − 𝜕𝑥2:
   𝜕𝐱∗ᵀℒ𝑥𝑥𝜕𝐱∗ = 2(𝜕𝑥1 + 𝜕𝑥2/2)² + (3/2)𝜕𝑥2² > 0
Therefore, 𝐱∗ = [1, 1, 1]ᵀ is a global minimum.
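Because the FONC system in this example is linear in (𝑥1, 𝑥2, 𝑥3, 𝜆), it can also be solved directly as a check (a verification sketch in Python/NumPy, not part of the slides):

```python
import numpy as np

# The FONC system from the example, written as A v = b for v = (x1, x2, x3, lam):
A = np.array([[ 0.0, -1.0, -1.0, 1.0],   # -x2 - x3 + lam = 0
              [-1.0,  0.0, -1.0, 1.0],   # -x1 - x3 + lam = 0
              [-1.0, -1.0,  0.0, 1.0],   # -x1 - x2 + lam = 0
              [ 1.0,  1.0,  1.0, 0.0]])  #  x1 + x2 + x3  = 3
b = np.array([0.0, 0.0, 0.0, 3.0])
x1, x2, x3, lam = np.linalg.solve(A, b)
```

This confirms 𝐱∗ = (1, 1, 1) and 𝜆∗ = 2.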
Karush-Kuhn-Tucker
(KKT) conditions
FONC for problems with both equality and
inequality constraints, regardless of activity
Lagrangian with inequalities
Original constrained problem:
   min𝐱 𝑓(𝐱)
   s.t. 𝐡(𝐱) = 𝟎
        𝐠(𝐱) ≤ 𝟎
New Lagrangian problem:
   min𝐱,𝛌,𝛍 ℒ(𝐱, 𝛌, 𝛍) = 𝑓(𝐱) + 𝛌ᵀ𝐡(𝐱) + 𝛍ᵀ𝐠(𝐱)
Now both 𝛌 and 𝛍 are vectors of Lagrange multipliers!
Karush-Kuhn-Tucker (KKT) conditions
First-order necessary conditions for constrained problems:
1. Stationarity: 𝛁𝐱ℒ = 𝛁𝑓(𝐱∗) + 𝛌ᵀ𝛁𝐡(𝐱∗) + 𝛍ᵀ𝛁𝐠(𝐱∗) = 𝟎ᵀ
2. Feasibility: 𝐡(𝐱∗) = 𝟎, 𝐠(𝐱∗) ≤ 𝟎
3. Multiplier signs: 𝛌 unrestricted in sign, 𝛍 ≥ 𝟎
4. Complementary slackness: 𝝁ᵀ𝐠(𝐱∗) = 𝟎
Note: A design point 𝐱∗ satisfying these 4 conditions is called a KKT point, which may or may not be a minimizer.
Second Order Sufficiency Conditions
Given a KKT point 𝐱∗, if the Hessian of the Lagrangian is positive definite in all feasible directions, then the point is a local minimizer:
   𝜕𝐱∗ᵀℒ𝑥𝑥𝜕𝐱∗ > 0
A feasible direction must satisfy (for the equality and active inequality constraints):
   (𝜕𝐡/𝜕𝐱)𝜕𝐱∗ = 𝟎 and (𝜕𝐠/𝜕𝐱)𝜕𝐱∗ = 𝟎
Example: KKT Conditions
Problem: min 𝑓 = 8𝑥1² − 8𝑥1𝑥2 + 3𝑥2² s.t. 𝑔1 = 𝑥1 − 4𝑥2 + 3 ≤ 0, 𝑔2 = −𝑥1 + 2𝑥2 ≤ 0
Step 1: 𝛁𝐱ℒ = 𝛁𝑓(𝐱∗) + 𝛌ᵀ𝛁𝐡(𝐱∗) + 𝛍ᵀ𝛁𝐠(𝐱∗) = 𝟎ᵀ
   𝛁𝑓(𝐱) = [16𝑥1 − 8𝑥2, −8𝑥1 + 6𝑥2]ᵀ
   𝛁𝐠(𝐱) = [1 −4; −1 2]
   𝛁𝐡(𝐱) = [ ] (no equality constraints)
   𝛁𝐱ℒ = [16𝑥1 − 8𝑥2 + 𝜇1 − 𝜇2, −8𝑥1 + 6𝑥2 − 4𝜇1 + 2𝜇2] = 𝟎
Example: KKT Conditions
Step 2: FONC
   16𝑥1 − 8𝑥2 + 𝜇1 − 𝜇2 = 0
   −8𝑥1 + 6𝑥2 − 4𝜇1 + 2𝜇2 = 0
   𝜇1(𝑥1 − 4𝑥2 + 3) = 0
   𝜇2(−𝑥1 + 2𝑥2) = 0
To solve this, we can look at different scenarios for the bottom two equations, depending on whether each 𝜇 is zero:
   Scenario 1: 𝜇1 = 0, 𝜇2 = 0   Scenario 2: 𝜇1 = 0, 𝜇2 ≠ 0
   Scenario 3: 𝜇1 ≠ 0, 𝜇2 = 0   Scenario 4: 𝜇1 ≠ 0, 𝜇2 ≠ 0
Example: KKT Conditions
Scenario 1 (𝜇1 = 0, 𝜇2 = 0):
   16𝑥1 − 8𝑥2 = 0
   −8𝑥1 + 6𝑥2 = 0
   𝑔1: 𝑥1 − 4𝑥2 + 3 ≤ 0
   𝑔2: −𝑥1 + 2𝑥2 ≤ 0
   → 𝑥1 = 𝑥2 = 0, which violates constraint 𝑔1. No solution.
Scenario 2 (𝜇1 = 0, 𝜇2 ≠ 0):
   16𝑥1 − 8𝑥2 − 𝜇2 = 0
   −8𝑥1 + 6𝑥2 + 2𝜇2 = 0
   −𝑥1 + 2𝑥2 = 0
   𝑔1: 𝑥1 − 4𝑥2 + 3 ≤ 0
   → 𝑥1 = 𝑥2 = 0, which requires 𝜇2 = 0; not allowed. No solution.
Example: KKT Conditions
Scenario 3 (𝜇1 ≠ 0, 𝜇2 = 0):
   16𝑥1 − 8𝑥2 + 𝜇1 = 0
   −8𝑥1 + 6𝑥2 − 4𝜇1 = 0
   𝑥1 − 4𝑥2 + 3 = 0
   𝑔2: −𝑥1 + 2𝑥2 ≤ 0
   → 𝑥1 = 13/33, 𝑥2 = 28/33, 𝜇1 = 16/33, which violates constraint 𝑔2. No solution.
Scenario 4 (𝜇1 ≠ 0, 𝜇2 ≠ 0):
   16𝑥1 − 8𝑥2 + 𝜇1 − 𝜇2 = 0
   −8𝑥1 + 6𝑥2 − 4𝜇1 + 2𝜇2 = 0
   𝑥1 − 4𝑥2 + 3 = 0
   −𝑥1 + 2𝑥2 = 0
   → 𝑥1 = 3, 𝑥2 = 3/2, 𝜇1 = 57/2, 𝜇2 = 129/2. KKT point!
Example: KKT Conditions
Step 3: Check SOSC
KKT point: 𝑥1 = 3, 𝑥2 = 3/2, 𝜇1 = 57/2, 𝜇2 = 129/2
   ℒ𝑥𝑥 = [16 −8; −8 6]
Since ℒ𝑥𝑥 is positive definite everywhere, the unique KKT point 𝐱∗ = [3, 1.5]ᵀ is a global minimizer at 𝑓∗ = 42.75.
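Scenario 4 is linear once both constraints are taken as active, so the KKT point can be verified numerically (a check sketch in Python/NumPy; the objective 𝑓 = 8𝑥1² − 8𝑥1𝑥2 + 3𝑥2² is reconstructed from its gradient on the earlier slide):

```python
import numpy as np

# Scenario 4: both constraints active -> 4 linear equations in (x1, x2, mu1, mu2)
A = np.array([[16.0, -8.0,  1.0, -1.0],   # 16x1 - 8x2 + mu1 - mu2 = 0
              [-8.0,  6.0, -4.0,  2.0],   # -8x1 + 6x2 - 4mu1 + 2mu2 = 0
              [ 1.0, -4.0,  0.0,  0.0],   # x1 - 4x2 = -3   (g1 active)
              [-1.0,  2.0,  0.0,  0.0]])  # -x1 + 2x2 = 0   (g2 active)
b = np.array([0.0, 0.0, -3.0, 0.0])
x1, x2, mu1, mu2 = np.linalg.solve(A, b)

f_star = 8*x1**2 - 8*x1*x2 + 3*x2**2        # objective value at the KKT point
Lxx = np.array([[16.0, -8.0], [-8.0, 6.0]])
pos_def = bool(np.all(np.linalg.eigvalsh(Lxx) > 0))  # SOSC via eigenvalues
```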
Example 2: KKT Conditions
Problem: min 𝑓 = −𝑥1 s.t. 𝑔1 = 𝑥2 − (1 − 𝑥1)³ ≤ 0, 𝑔2 = −𝑥1 ≤ 0, 𝑔3 = −𝑥2 ≤ 0
Step 1: 𝛁𝐱ℒ = 𝛁𝑓(𝐱∗) + 𝛌ᵀ𝛁𝐡(𝐱∗) + 𝛍ᵀ𝛁𝐠(𝐱∗) = 𝟎ᵀ
   𝛁𝑓(𝐱) = [−1, 0]ᵀ
   𝛁𝐠(𝐱) = [3(1 − 𝑥1)² 1; −1 0; 0 −1]
   𝛁𝐡(𝐱) = [ ] (no equality constraints)
   𝛁𝐱ℒ = [−1 + 3(1 − 𝑥1)²𝜇1 − 𝜇2, 𝜇1 − 𝜇3] = 𝟎
2 equations, 5 unknowns…
   𝜇2 = 3(1 − 𝑥1)²𝜇1 − 1
   𝜇1 = 𝜇3
Example: KKT Conditions
𝛍𝑇𝐠 𝐱∗ = 0
𝜇1 𝑥2 − 1 − 𝑥1
3 = 0
−𝜇2𝑥1 = 0
−𝜇3𝑥2 = 0
42
𝜇2 = 3 1 − 𝑥1
2𝜇1 − 1
𝜇1 = 𝜇3
Now we have 5 equations, 5 unknowns
Solution to 5 eqns:
𝑥1
∗ = 0
𝑥2
∗ = [0,1]
(anywhere in range)
𝜇1 = 0
𝜇2 = −1
𝜇3 = 0
This
violates
𝛍 ≥ 𝟎 !!
2 eqns from previous slide:
3 more
equations
Example 2: What went wrong?
Important caveat: The approaches in Chapter 5 assume
that any KKT point found is a “regular point”, where the
active constraints are all linearly independent
The gradients of the active constraints at the true optimum (𝑥1 = 1, 𝑥2 = 0) are not linearly independent, so that point is not regular and no valid KKT point can be found.
Quasi-Newton and SQP
algorithms
Two common algorithms that use derivatives
Quasi-Newton methods
𝐇 and 𝐇⁻¹ are difficult to compute, so we can avoid them by approximating 𝐇 or 𝐇⁻¹ using only first derivatives, updated from the observed changes in 𝐱 and 𝛁𝑓 at each iteration:
Example update function: Davidon-Fletcher-Powell (DFP). For the derivation of this, see Ch. 6.
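The DFP formula itself does not survive on the slide, so here is a sketch of a quasi-Newton loop using the standard DFP inverse-Hessian update (Python/NumPy; the fixed step size and the quadratic test function are assumptions):

```python
import numpy as np

def dfp(grad_f, x0, alpha=0.1, iters=500):
    """Quasi-Newton with the DFP inverse-Hessian update (first derivatives only)."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)               # running approximation of H^{-1}
    g = grad_f(x)
    for _ in range(iters):
        x_new = x - alpha * B @ g    # quasi-Newton step (fixed alpha assumed)
        g_new = grad_f(x_new)
        delta, gamma = x_new - x, g_new - g
        if abs(delta @ gamma) > 1e-12:
            # DFP: B+ = B + (d d^T)/(d^T y) - (B y y^T B)/(y^T B y)
            B = (B + np.outer(delta, delta) / (delta @ gamma)
                   - (B @ np.outer(gamma, gamma) @ B) / (gamma @ B @ gamma))
        x, g = x_new, g_new
    return x

# Illustrative quadratic (an assumption): f = 2*x1^2 + x2^2 + x1*x2 - 4*x1
grad = lambda x: np.array([4*x[0] + x[1] - 4, x[0] + 2*x[1]])
x_star = dfp(grad, [0.0, 0.0])
```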
Sequential quadratic programming (SQP)
1. Initialize problem
2. Solve the QP sub-problem to determine search
direction 𝐬𝑘
3. Set 𝐱𝑘+1 = 𝐱𝑘 + 𝐬𝑘
4. Continue (2) and (3) until satisfied
This is one of the most common, robust algorithms
Constrained problems
We can handle constraints in a number of ways
1. Lagrange multipliers:
min 𝐿 𝐱, 𝛌, 𝛍 = 𝑓 𝐱 + 𝛌𝑇𝐡 𝐱 + 𝛍𝑇𝐠 𝐱
2. Penalty function: add to the objective such that
it is zero on the interior and positive when
violating a constraint
   min 𝑇(𝐱) = 𝑓(𝐱) + ∑ⱼ₌₁ᵐ [max(0, 𝑔ⱼ(𝐱))]²
3. Barrier function: add to the objective such that it
gets really high at the constraint boundary
   min 𝑇(𝐱) = 𝑓(𝐱) − ∑ⱼ₌₁ᵐ ln(−𝑔ⱼ(𝐱))
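A quadratic-penalty sketch on Example 1 in Python/NumPy (the penalty weight ρ and step size α are assumptions, and the penalty minimizer is only approximately feasible):

```python
import numpy as np

# Penalty sketch on Example 1: f = (x1-2)^2 + (x2-2)^2 with the equality
# constraint h = x1^2 + x2^2 - 4 = 0 penalized as rho*h^2.
def penalty_solve(rho=100.0, alpha=2e-4, iters=5000):
    x = np.array([1.0, 1.0])
    for _ in range(iters):
        h = x[0]**2 + x[1]**2 - 4
        grad_T = 2*(x - 2) + rho * 2*h * 2*x   # grad f + rho * d(h^2)/dx
        x = x - alpha * grad_T                 # gradient descent on T(x)
    return x

x_star = penalty_solve()
```

As ρ grows, the penalty minimizer approaches the true constrained optimum (√2, √2) from Example 1, at the cost of a stiffer (harder to minimize) T(𝐱).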
Gradient-based approaches summary
Using derivatives to find a local minimum of a
differentiable function (or surrogate model)
• Unconstrained
• FONC and SOSC (when math is simple enough to solve)
• Gradient descent (algorithm with linear convergence)
• Newton’s method (algorithm with quadratic convergence near the solution)
• Constrained
• Reduced gradient (analytical with known active constraints)
• Generalized Reduced Gradient (algorithm w active constr)
• Active set strategy (algorithm w updating set of active constr)
• Lagrangian (equality or active inequality constr)
• KKT conditions (with any inequality and equality constr)
• Quasi-Newton methods (2nd-derivative-free)
• SQP (efficiently handles constraints)
Acknowledgements
• Much of this material came from Chapters 5 & 6 of
the textbook, Principles of Optimal Design
• Some of these slides and examples came from Drs.
Emrah Bayrak and Namwoo Kang at the University
of Michigan
Announcements
• HW3 is due on Tuesday at noon
• My office hours are Monday 2-4pm
• I will be available for appointments on Thursday
and Friday of this week, so start early! I am happy
to answer specific questions about your work or
code over email. I probably will not respond over
the weekend.
• Project progress reports are due in 2.5 weeks!
There is a template online that you should use.
Exercises
Exercise (optional): Lagrange Multipliers
Exercise (optional): Gradient descent
Apply gradient descent in MATLAB to the Rosenbrock function:
   𝑓(𝑥, 𝑦) = (𝑥 − 1)² + 100(𝑦 − 𝑥²)²
1. Begin with a feasible point 𝐱0
2. Find the gradient at that point, 𝛁𝑓(𝐱0)
3. Move in the direction of the negative gradient to find an improved 𝐱1:
   𝐱1 = 𝐱0 − 𝛼𝛁𝑓(𝐱0)
4. Continue to iterate until you stop improving, using the step size that is optimal for the local quadratic model:
   𝐱𝑘 = 𝐱𝑘−1 − [𝛁𝑓ᵀ(𝐱𝑘−1)𝛁𝑓(𝐱𝑘−1) / (𝛁𝑓ᵀ(𝐱𝑘−1)𝐇(𝐱𝑘−1)𝛁𝑓(𝐱𝑘−1))] 𝛁𝑓(𝐱𝑘−1)
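One possible implementation of this exercise (shown in Python/NumPy rather than MATLAB; the starting point and the fallback step size for non-positive model curvature are assumptions):

```python
import numpy as np

# Rosenbrock function and its derivatives
f = lambda x: (x[0] - 1)**2 + 100*(x[1] - x[0]**2)**2

def grad(x):
    return np.array([2*(x[0] - 1) - 400*x[0]*(x[1] - x[0]**2),
                     200*(x[1] - x[0]**2)])

def hess(x):
    return np.array([[2 - 400*(x[1] - 3*x[0]**2), -400*x[0]],
                     [-400*x[0], 200.0]])

x = np.array([0.0, 0.0])            # starting point (an assumption)
for _ in range(50000):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:    # step 4: stop when no longer improving
        break
    curv = g @ hess(x) @ g
    # step size from step 4 (optimal for the local quadratic model);
    # fall back to a small fixed step if the model curvature is not positive
    alpha = (g @ g) / curv if curv > 0 else 1e-3
    x = x - alpha * g

x_star = x
```

Even with this adaptive step size, steepest descent zigzags slowly along the Rosenbrock valley toward (1, 1), which is exactly the behavior the exercise is meant to expose.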