Design optimization algorithms and tools
Gradient-based methods,
continued
ME 564/SYS 564
Wed Oct 3, 2018
Steven Hoffenson
Goal of Week 6: To learn Newton’s method, practice MATLAB coding, and begin
learning some constrained approaches.
Recap: Week 5
• The optimality conditions can be used to prove that a point is an interior optimum
• The First-Order Necessary Condition identifies
stationary points
• The Second-Order Sufficiency Condition identifies the
nature (minima, maxima, saddle) of stationary points
• Taylor series approximation is used to generate
derivative-based local optimization directions
• The gradient descent algorithm uses 1st-order info
• Newton’s method (algorithm) uses 2nd-order info, which
we didn’t get to last week…
Recap: How to optimize
1. Formulate the problem (Weeks 1-3, 9-12)
a) Define system boundaries
b) Develop analytical models
c) Explore/reduce the problem space
d) Formalize optimization problem
2. Solve the problem (Weeks 4-8, 13)
a) Choose the right approach/algorithm
b) Solve (by hand, code, or software)
c) Interpret the results
d) Iterate if needed
Newton's method update (this week's focus):
$$\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}(\mathbf{x}_k)^{-1}\,\nabla f(\mathbf{x}_k)$$
Recap: 1st-order algorithm: Gradient descent
Local optimization algorithm for interior optima
1. Begin with a feasible point 𝐱0
2. Find the gradient at that point, $\nabla f(\mathbf{x}_0)$
3. Move in the direction of the negative gradient to
find an improved 𝐱1
$$\mathbf{x}_1 = \mathbf{x}_0 - \nabla f(\mathbf{x}_0)$$
4. Continue to iterate until you stop improving
$$\mathbf{x}_k = \mathbf{x}_{k-1} - \nabla f(\mathbf{x}_{k-1})$$
For greater efficiency, add a step size
$$\alpha = \frac{\nabla f^{T}(\mathbf{x}_{k-1})\,\nabla f(\mathbf{x}_{k-1})}{\nabla f^{T}(\mathbf{x}_{k-1})\,\mathbf{H}(\mathbf{x}_{k-1})\,\nabla f(\mathbf{x}_{k-1})}$$
and update with $\mathbf{x}_k = \mathbf{x}_{k-1} - \alpha\,\nabla f(\mathbf{x}_{k-1})$.
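Here is a minimal MATLAB sketch of gradient descent using this step size. The quadratic test function $f(\mathbf{x}) = x_1^2 + 5x_2^2$, the starting point, and the iteration cap are illustrative choices of mine, not from the slides; for a quadratic objective this $\alpha$ is the exact minimizing step along the steepest-descent direction.

```matlab
% Sketch: gradient descent with the step size alpha defined above,
% applied to an assumed quadratic test function f(x) = x1^2 + 5*x2^2.
gradf = @(x) [2*x(1); 10*x(2)];    % gradient of the test function
H     = @(x) [2 0; 0 10];          % Hessian (constant for this quadratic)

x = [4; 1];                        % illustrative starting point
for k = 1:50
    g = gradf(x);
    if norm(g) <= 1e-8, break; end        % stop when the gradient vanishes
    alpha = (g'*g) / (g'*H(x)*g);         % step size from the formula above
    x = x - alpha*g;                      % steepest-descent update
end
disp(x)    % should end up near the minimizer [0; 0]
```

For this quadratic the iterates zigzag toward the origin; on non-quadratic functions the same $\alpha$ is only an approximation based on the local Hessian.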
2nd-order algorithm: Newton’s method
Starting at a point 𝐱0, we want to find a direction
that will lower the objective value
Using 1st- and 2nd-order terms of the Taylor series,
$$\partial f \approx \nabla f(\mathbf{x}_0)\,\partial\mathbf{x} + \tfrac{1}{2}\,\partial\mathbf{x}^{T}\mathbf{H}(\mathbf{x}_0)\,\partial\mathbf{x}$$
We again want $\partial f < 0$. We can use the FONC to minimize $\partial f$ with respect to $\partial\mathbf{x}$:
$$\frac{\partial f}{\partial(\partial\mathbf{x})} = \nabla f(\mathbf{x}_0) + \partial\mathbf{x}^{T}\mathbf{H}(\mathbf{x}_0) = \mathbf{0}^{T}
\quad\Rightarrow\quad
\partial\mathbf{x} = -\mathbf{H}^{-1}(\mathbf{x}_0)\,\nabla f(\mathbf{x}_0)^{T}$$
This gives Newton's method:
$$\partial\mathbf{x}_{k+1} = -\mathbf{H}^{-1}(\mathbf{x}_k)\,\nabla f(\mathbf{x}_k)^{T} = -\mathbf{H}^{-1}(\mathbf{x}_k)\,\mathbf{g}_k,
\qquad
\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}^{-1}(\mathbf{x}_k)\,\mathbf{g}_k$$

Newton's method
Local optimization algorithm for convex functions
1. Begin with a feasible point $\mathbf{x}_0$
2. Find the gradient and Hessian at that point
3. Move in the following way:
$$\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}(\mathbf{x}_k)^{-1}\,\nabla f(\mathbf{x}_k)$$
Similar to gradient descent, but the negative gradient is premultiplied by the inverse of the Hessian. We can also add a scale factor $\alpha$ if we want to, but we usually don't need it.
Note: This is very effective for quadratic objectives.

Newton's method example
(Worked on the slide: a problem statement, its gradient and Hessian, and the resulting Newton iterations.)

Stopping criterion
The iterations continue until $\nabla f(\mathbf{x}_k) = \mathbf{0}^{T}$. Instead of checking all elements of $\nabla f(\mathbf{x}_k)$, check $\|\nabla f(\mathbf{x}_k)\| = 0$. Is it numerically possible to obtain exactly 0? Probably not, so use $\|\nabla f(\mathbf{x}_k)\| \le \varepsilon$.

Generic algorithm
1) Start with $\mathbf{x}_0$
2) Calculate $\mathbf{s}_k = -\mathbf{g}_k$ (gradient descent) or $\mathbf{s}_k = -\mathbf{H}_k^{-1}\mathbf{g}_k$ (Newton's method)
3) Update $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{s}_k$
4) Check whether $\|\mathbf{g}_k\| < \varepsilon$
5) If yes, stop. If not, go to (2).

Another example
Consider the following function:
$$f(\mathbf{x}) = x_1^4 - 2x_1^2 x_2 + x_2^2$$
Apply the gradient descent method:
$$\mathbf{g}_k = 2(x_1^2 - x_2)\begin{bmatrix} 2x_1 \\ -1 \end{bmatrix},
\qquad
\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{g}_k$$
Start from $\mathbf{x}_0 = (1.1, 1)^{T}$, where $f_0 = 44.1\times10^{-3}$:
$$\mathbf{x}_1 = \begin{bmatrix} 1.1 \\ 1 \end{bmatrix} - 2(1.21 - 1)\begin{bmatrix} 2.2 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.176 \\ 1.42 \end{bmatrix}, \qquad f_1 = 1.929$$
$$\mathbf{x}_2 = \begin{bmatrix} 0.176 \\ 1.42 \end{bmatrix} - 2(0.031 - 1.42)\begin{bmatrix} 0.352 \\ -1 \end{bmatrix} = \begin{bmatrix} 1.154 \\ -1.358 \end{bmatrix}, \qquad f_2 = 7.235$$
Uh oh… it's not supposed to increase! Sometimes gradient descent overshoots the target!

Convexity
A set is convex if a line segment connecting any two points in the set contains only points within the set. More rigorously, a set $S$ is convex if, for every pair of points $\mathbf{x}_1, \mathbf{x}_2 \in S$, the point
$$\mathbf{x}(\lambda) = \lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1, \qquad 0 \le \lambda \le 1$$
also belongs to $S$.
(Figure: a convex set vs. a non-convex set.)

Why care about convexity?
If the Hessian of the objective function is positive definite everywhere, then the problem is convex! This can help you conclude that you have found a global solution. If you can prove convexity, then any local optimum is a global optimum!
(Figure: a non-convex function with multiple local minima vs. a convex function with a unique global minimum.)

Exercise
Solve using FONC, SOSC, gradient descent, and Newton's method.

Recall: FONC and SOSC
1. First-order necessary condition: If $f(\mathbf{x})$ is differentiable and $\mathbf{x}^*$ is a local minimum, then $\nabla f(\mathbf{x}^*) = \mathbf{0}$.
2. Second-order sufficient condition: If $\nabla f(\mathbf{x}^*) = \mathbf{0}$ and $\mathbf{H}(\mathbf{x}^*)$ is positive definite, then $\mathbf{x}^*$ is a local minimum.

Example 4.19: FONC and SOSC
$$\min f(\mathbf{x}) = \tfrac{1}{3}x_1^3 + x_1 x_2 + \tfrac{1}{2}x_2^2 + 2x_2 - \tfrac{2}{3}$$
$$\nabla f(\mathbf{x}) = \begin{bmatrix} x_1^2 + x_2 \\ x_1 + x_2 + 2 \end{bmatrix},
\qquad
\mathbf{H}(\mathbf{x}) = \begin{bmatrix} 2x_1 & 1 \\ 1 & 1 \end{bmatrix}$$
Setting $\nabla f(\mathbf{x}) = \mathbf{0}$: from the 2nd equation, $x_2 = -x_1 - 2$. Substituting into the 1st gives $x_1^2 - x_1 - 2 = 0$, so $x_1 = 2, -1$. Plugging in for $x_2$ in both cases gives the stationary points $\mathbf{x}^* = (2, -4)$ and $(-1, -1)$.
Now test these points in the Hessian:
$$\mathbf{H}(2, -4) = \begin{bmatrix} 4 & 1 \\ 1 & 1 \end{bmatrix} \ \text{(positive definite)},
\qquad
\mathbf{H}(-1, -1) = \begin{bmatrix} -2 & 1 \\ 1 & 1 \end{bmatrix} \ \text{(not positive definite!)}$$
Therefore, (2, -4) is the only local minimum.

Example 4.19: gradient descent
Same objective, gradient, and Hessian as above. Starting point: $\mathbf{x} = (1, 1)^{T}$.
Recall the gradient descent update: $\mathbf{x}_k = \mathbf{x}_{k-1} - \nabla f(\mathbf{x}_{k-1})$.
Write a code that does iterations of this.
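Here is a minimal MATLAB sketch of that exercise. The objective, gradient, and starting point come from the slide; the fixed step size of 0.1, the tolerance, and the iteration cap are my own additions so the loop terminates (with a full step of 1, the iterates can overshoot, as in the earlier $x_1^4 - 2x_1^2 x_2 + x_2^2$ example).

```matlab
% Sketch: gradient descent for Example 4.19
% (step size, tolerance, and iteration cap are assumed, not from the slide)
f     = @(x) (1/3)*x(1)^3 + x(1)*x(2) + (1/2)*x(2)^2 + 2*x(2) - 2/3;
gradf = @(x) [x(1)^2 + x(2); x(1) + x(2) + 2];

x     = [1; 1];    % starting point given on the slide
alpha = 0.1;       % assumed fixed step size (the slide's bare update uses a full step)
tol   = 1e-6;      % stop when ||grad f|| <= tol, per the stopping criterion above
maxit = 1000;      % iteration cap

for k = 1:maxit
    g = gradf(x);
    if norm(g) <= tol, break; end
    x = x - alpha*g;               % gradient-descent update
end
fprintf('k = %d, x = (%.4f, %.4f), f = %.4f\n', k, x(1), x(2), f(x));
```

From this start the iterates should settle near the local minimum (2, -4) found with the FONC and SOSC above, though plain gradient descent can be slow; the next slide suggests replacing the fixed step with the $\alpha$ defined earlier.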
Example 4.19: gradient descent with a step size
If you have time, add the step size to your code:
$$\alpha = \frac{\nabla f^{T}(\mathbf{x}_{k-1})\,\nabla f(\mathbf{x}_{k-1})}{\nabla f^{T}(\mathbf{x}_{k-1})\,\mathbf{H}(\mathbf{x}_{k-1})\,\nabla f(\mathbf{x}_{k-1})}$$

Example 4.19: Newton's method
Same objective, gradient, and Hessian as above. Starting point: $\mathbf{x} = (1, 1)^{T}$.
Recall Newton's method: $\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}(\mathbf{x}_k)^{-1}\,\nabla f(\mathbf{x}_k)$.
Write a code that does iterations of this (a code sketch appears at the end of these notes).

Using MATLAB's built-in optimizer
• The main derivative-based optimization function in MATLAB is fmincon
• Think "function" "minimize" "constrained"
• It takes in the objective function "fun", a starting point "x0", linear constraint matrices "A", "B", "Aeq", "Beq", lower and upper bounds on the variables "LB" and "UB", and nonlinear constraints "NONLCON", and it produces x* and f*:
[X,FVAL]=fmincon(FUN,X0,A,B,Aeq,Beq,LB,UB,NONLCON)
This is a good video tutorial on fmincon: https://www.youtube.com/watch?v=DOIawp-q3mQ

Using fmincon
[X,FVAL]=fmincon(FUN,X0,A,B,Aeq,Beq,LB,UB,NONLCON)
Cantilever beam example:
min Volume
w.r.t. length, width, height
s.t. stress, deflection, geometry constraints
Sample code will be posted online.

An unusual example of FONC and SOSC
Advanced FONC and SOSC example with a stationary point:
• Find the stationary point of the following problem.
• Show that it is a saddle.
• Show the directions to decrease the function value.
$$\min f(\mathbf{x}) = 2x_1^2 - 4x_1 x_2 + 1.5x_2^2 + x_2$$
$$\nabla f(\mathbf{x}) = \begin{bmatrix} 4x_1 - 4x_2 \\ -4x_1 + 3x_2 + 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\quad\Rightarrow\quad
\mathbf{x}^* = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
$$\mathbf{H}(\mathbf{x}^*) = \begin{bmatrix} 4 & -4 \\ -4 & 3 \end{bmatrix},
\qquad
\det\mathbf{H}(\mathbf{x}^*) = -4 < 0$$
so $\mathbf{x}^*$ is a saddle point.

Example with a stationary point (continued)
To show the directions that decrease the function value, check $\partial\mathbf{x}^{T}\mathbf{H}(\mathbf{x}^*)\,\partial\mathbf{x}$, where $\partial\mathbf{x} = [\partial x_1, \partial x_2]^{T}$. Recall
$$\partial f \approx \nabla f(\mathbf{x}^*)\,\partial\mathbf{x} + \tfrac{1}{2}\,\partial\mathbf{x}^{T}\mathbf{H}(\mathbf{x}^*)\,\partial\mathbf{x}$$
and $\nabla f(\mathbf{x}^*) = \mathbf{0}$, so the sign of $\partial f$ is the sign of the quadratic form:
$$\partial\mathbf{x}^{T}\mathbf{H}(\mathbf{x}^*)\,\partial\mathbf{x} = 4\,\partial x_1^2 - 8\,\partial x_1\,\partial x_2 + 3\,\partial x_2^2 = (2\,\partial x_1 - 3\,\partial x_2)(2\,\partial x_1 - \partial x_2)$$
Since $\partial x_1 = x_1 - 1$ and $\partial x_2 = x_2 - 1$,
$$\partial\mathbf{x}^{T}\mathbf{H}(\mathbf{x}^*)\,\partial\mathbf{x} = (2x_1 - 3x_2 + 1)(2x_1 - x_2 - 1)$$
For $\partial f < 0$, one of these factors has to be positive while the other is negative.
(Figure: in the $x_1$-$x_2$ plane, the lines $2x_1 - x_2 - 1 = 0$ and $2x_1 - 3x_2 + 1 = 0$ intersect at (1, 1); the regions between them, where the two factors have opposite signs, are the regions with $\partial f < 0$.)
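As referenced above, here is a minimal MATLAB sketch of the Newton's-method exercise for Example 4.19. The gradient, Hessian, and starting point are from the slides; the tolerance and iteration cap are my own additions.

```matlab
% Sketch: Newton's method for Example 4.19 (tolerance and iteration cap assumed)
gradf = @(x) [x(1)^2 + x(2); x(1) + x(2) + 2];
Hess  = @(x) [2*x(1) 1; 1 1];

x     = [1; 1];    % starting point given on the slide
tol   = 1e-8;      % stop when ||grad f|| <= tol
maxit = 50;        % iteration cap

for k = 1:maxit
    g = gradf(x);
    if norm(g) <= tol, break; end
    x = x - (Hess(x) \ g);     % Newton update: x - H^{-1} * grad f
end
fprintf('k = %d, x = (%.4f, %.4f)\n', k, x(1), x(2));
```

From $\mathbf{x}_0 = (1, 1)^{T}$ the first few Newton iterates are roughly (3, -5), (2.2, -4.2), and (2.01, -4.01), closing in on the local minimum (2, -4) found with the FONC and SOSC, and much faster than plain gradient descent.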