
Lagrange Multipliers
May 16, 2020
Abstract
We consider a special case of Lagrange multipliers for constrained optimization. The class quickly sketched the "geometric" intuition for Lagrange multipliers, and this note considers a short algebraic derivation.
In order to minimize or maximize a function with linear constraints, we consider finding the critical points (which may be local maxima, local minima, or saddle points) of
f(x) subject to Ax = b
Here f : R^d → R is a convex (or concave) function, x ∈ R^d, A ∈ R^{n×d}, and b ∈ R^n. To find the critical points, we cannot just set the derivative of the objective equal to 0.[1] The technique we consider is to turn the problem from a constrained problem into an unconstrained problem using the Lagrangian,
L(x, μ) = f(x) + μ^T (Ax − b), in which μ ∈ R^n.
We’ll show that the critical points of the constrained function f are critical
points of L(x, μ).
Finding the Space of Solutions Assume the constraints are satisfiable, and let x0 be such that Ax0 = b. Let rank(A) = r, and let {u1, …, uk} be an orthonormal basis for the null space of A, in which k = d − r. Note that if k = 0, then x0 is uniquely defined, so we consider k > 0. We write this basis as a matrix:
U = [u1, …, uk] ∈ R^{d×k}
Since the columns of U span the null space of A, any solution of Ax = b can be written as x = x0 + Uy. This captures all the free parameters of the solution. Thus, we consider the function:
g(y) = f(x0 + Uy), in which g : R^k → R
The critical points of g are critical points of f. Notice that g is unconstrained, so we can use standard calculus to find its critical points.
∇y g(y) = 0, or equivalently U^T ∇f(x0 + Uy) = 0.
[1] See the example at the end of this document.
To make sure the types are clear: ∇y g(y) ∈ R^k, ∇f(z) ∈ R^d, and U ∈ R^{d×k}. In both cases, 0 is the zero vector in R^k.
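The null-space reparametrization can be sketched numerically. This is a minimal NumPy sanity check, not from the note: the matrix A, vector b, and quadratic objective f(x) = (1/2)||x − t||^2 below are illustrative choices, picked so the critical point of g has a closed form.

```python
import numpy as np

# Illustrative problem: minimize f(x) = 0.5*||x - t||^2 subject to Ax = b.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])   # A in R^{2x3}, rank 2, so k = d - r = 1
b = np.array([1.0, 2.0])
t = np.array([3.0, -1.0, 2.0])    # target point defining f (arbitrary)

# A particular solution x0 with A x0 = b.
x0, *_ = np.linalg.lstsq(A, b, rcond=None)

# Orthonormal basis U for null(A): right singular vectors beyond rank(A).
_, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
U = Vt[r:].T                      # U in R^{d x k} with orthonormal columns

# Here grad f(x) = x - t, so the condition U^T grad f(x0 + U y) = 0
# reads U^T (x0 + U y - t) = 0, giving y = U^T (t - x0) since U^T U = I.
y = U.T @ (t - x0)
x_star = x0 + U @ y

print(np.allclose(A @ x_star, b))            # constraint Ax = b holds
print(np.allclose(U.T @ (x_star - t), 0.0))  # U^T grad f(x_star) = 0
```

Both checks print True: the recovered point satisfies the constraint and is a critical point of g.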
The above condition says that if y is a critical point of g, then ∇f(x0 + Uy) must be orthogonal to the columns of U. However, U forms a basis for the null space of A, and the row space of A is orthogonal to the null space. In particular, any element of the row space can be written as z = A^T μ ∈ R^d for some μ ∈ R^n. We verify that z and u = Uy are orthogonal:
z^T u = μ^T A U y = μ^T 0 = 0
Since we can decompose R^d as a direct sum of null(A) and the row space of A, we know that any vector orthogonal to the columns of U must be in the row space. We can rewrite this orthogonality condition as follows: there is some μ ∈ R^n (depending on x) such that
∇f(x) + A^T μ = 0 for a certain x such that Ax = A(x0 + Uy) = Ax0 = b.
The Clever Lagrangian We now observe that the critical points of the Lagrangian are (by differentiating and setting to 0)
∇x L(x, μ) = ∇f(x) + A^T μ = 0 and ∇μ L(x, μ) = Ax − b = 0
The first condition is exactly the condition for x to be a critical point, in the way we derived it above, and the second condition says that the constraint is satisfied. Thus, if x is a critical point, there exists some μ as above, and (x, μ) is a critical point of L.
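For a quadratic objective, the two stationarity conditions above are jointly linear in (x, μ), so they can be solved as one linear system. This sketch assumes an illustrative f(x) = (1/2) x^T Q x + c^T x (so ∇f(x) = Qx + c); none of the specific numbers come from the note.

```python
import numpy as np

# Illustrative quadratic objective f(x) = 0.5 x^T Q x + c^T x.
Q = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 1.0]])
c = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 1.0, 1.0]])   # one linear constraint
b = np.array([1.0])

n, d = A.shape
# Stationarity of L: grad_x L = Qx + c + A^T mu = 0 and grad_mu L = Ax - b = 0.
# Stack both conditions into one (KKT) linear system and solve.
K = np.block([[Q, A.T],
              [A, np.zeros((n, n))]])
rhs = np.concatenate([-c, b])
sol = np.linalg.solve(K, rhs)
x, mu = sol[:d], sol[d:]

print(np.allclose(Q @ x + c + A.T @ mu, 0.0))  # grad_x L = 0
print(np.allclose(A @ x, b))                   # grad_mu L = 0
```

Both conditions hold at the solution, so (x, μ) is a critical point of L and x satisfies the constraint.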
Generalizing to Nonlinear Equality Constraints Lagrange multipliers are a much more general technique. If you want to handle non-linear equality constraints, then you will need a little extra machinery: the implicit function theorem. However, the key idea is that you find the space of solutions and you optimize. In that case, finding the critical points of
f(x) s.t. g(x) = c leads to L(x, μ) = f(x) + μ^T (g(x) − c).
The gradient condition here is ∇f(x) + J^T μ = 0, where J is the Jacobian matrix of g. For the case where we have a single constraint, the gradient condition reduces to ∇f(x) = −μ1 ∇g1(x), which we can view as saying, "at a critical point, the gradient of the objective must be parallel to the gradient of the constraint function." This connects us back to the picture that we drew during lecture.
Example: Need for constrained optimization We give a simple example to show that you cannot just set the derivatives to 0. Consider f(x1, x2) = x1 and g(x1, x2) = x1^2 + x2^2, and so:
max_x f(x) subject to g(x) = 1.
This is just a linear functional over the circle, which is compact, so the function must achieve a maximum value. Intuitively, we can see that (1, 0) is the maximizer (and hence a critical point). Here, we have:
∇f(x) = [1, 0]^T and ∇g(x) = [2x1, 2x2]^T
Notice that ∇f(x) is not zero anywhere on the circle; it is constant! For x ∈ {(1, 0), (−1, 0)}, we have ∇f(x) = λ∇g(x) (take λ = 1/2 and λ = −1/2, respectively). On the other hand, at any other point on the circle x2 ≠ 0, and so the gradients of f and g are not parallel. Thus, such points are not critical points.
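The example can be verified numerically. This short NumPy check (not part of the original note) samples the circle, confirms that f is maximized near (1, 0), that the gradients are parallel there with λ = 1/2, and that they are not parallel at a point with x2 ≠ 0.

```python
import numpy as np

# f(x1, x2) = x1 on the unit circle g(x1, x2) = x1^2 + x2^2 = 1.
grad_f = lambda x: np.array([1.0, 0.0])  # constant, never zero
grad_g = lambda x: 2.0 * x

# Sample the circle; theta = 0 gives the point (1, 0) exactly.
theta = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
points = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# f attains its maximum over the samples at (1, 0).
best = points[np.argmax(points[:, 0])]
print(np.allclose(best, [1.0, 0.0]))

# At (1, 0): grad f = lambda * grad g with lambda = 1/2.
x = np.array([1.0, 0.0])
print(np.allclose(grad_f(x), 0.5 * grad_g(x)))

# At a point with x2 != 0, the 2D cross product of the gradients is nonzero,
# so they are not parallel and the point is not a critical point.
x = np.array([np.cos(1.0), np.sin(1.0)])
cross = grad_f(x)[0] * grad_g(x)[1] - grad_f(x)[1] * grad_g(x)[0]
print(abs(cross) > 1e-6)
```

All three checks print True, matching the discussion above.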
Extra Resources If you find resources you like, post them on Piazza!