Numerical Optimisation:
Solution with equality constraints
Marta M. Betcke
m.betcke@ucl.ac.uk,
Kiko Rullan
f.rullan@cs.ucl.ac.uk
Department of Computer Science,
Centre for Medical Image Computing,
Centre for Inverse Problems
University College London
Lecture 13
Equality constrained optimisation

\[
\min_x f(x) \quad \text{subject to } Ax = b,
\]
where $f : D \to \mathbb{R}$ is convex and twice continuously differentiable, and $A \in \mathbb{R}^{p \times n}$ with $\operatorname{rank} A = p < n$.

$x^\star \in D$ is optimal iff there exists $\nu^\star$ such that
\[
Ax^\star = b, \qquad \nabla f(x^\star) + A^T \nu^\star = 0.
\]

Solving the equality constrained optimisation problem is therefore equivalent to solving the KKT equations:
\[
Ax^\star = b \quad \text{(primal feasibility equations, linear)},
\]
\[
\nabla f(x^\star) + A^T \nu^\star = 0 \quad \text{(dual feasibility equations, in general nonlinear)}.
\]
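As a quick sanity check of these conditions, consider an illustrative example (not from the slides): minimise $f(x) = \tfrac{1}{2}\|x\|^2$ subject to $x_1 + x_2 = 1$, i.e. $A = [1 \;\; 1]$, $b = 1$. Then
\[
\nabla f(x^\star) + A^T \nu^\star = x^\star + \nu^\star \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 0
\;\Rightarrow\; x^\star = -\nu^\star (1, 1)^T,
\]
and primal feasibility $Ax^\star = -2\nu^\star = 1$ gives $\nu^\star = -\tfrac{1}{2}$ and $x^\star = (\tfrac{1}{2}, \tfrac{1}{2})^T$.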
Quadratic problem with equality constraints

\[
\min_x \; \tfrac{1}{2} x^T P x + q^T x + r \quad \text{subject to } Ax = b,
\]
where $P$ is positive semidefinite and $A \in \mathbb{R}^{p \times n}$.

$x^\star \in \mathbb{R}^n$ is optimal iff there exists $\nu^\star$ such that
\[
Ax^\star = b, \qquad Px^\star + q + A^T \nu^\star = 0.
\]

KKT system:
\[
\begin{bmatrix} P & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} x^\star \\ \nu^\star \end{bmatrix}
=
\begin{bmatrix} -q \\ b \end{bmatrix}.
\]

If the KKT matrix is non-singular, there is a unique solution.
If the KKT matrix is singular, there are either infinitely many solutions (each yielding an optimal pair) or none (the problem is unbounded below or infeasible).

Conditions for non-singularity of the KKT matrix: given $\operatorname{rank} A = p < n$, the KKT matrix is non-singular iff
\[
\operatorname{Null}(P) \cap \operatorname{Null}(A) = \{0\},
\]
equivalently (since $P$ is positive semidefinite)
\[
Ax = 0, \; x \neq 0 \;\Rightarrow\; x^T P x > 0,
\]
i.e. $P$ is positive definite on the null space of $A$.
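To make the KKT system concrete, here is a minimal NumPy sketch (the function name `solve_eq_qp` and the dense `np.linalg.solve` call are illustrative choices, not part of the lecture material):

```python
import numpy as np

def solve_eq_qp(P, q, A, b):
    """Solve min 1/2 x^T P x + q^T x + r  s.t.  Ax = b via the KKT system."""
    n, p = P.shape[0], A.shape[0]
    # Assemble the (n+p) x (n+p) KKT matrix [[P, A^T], [A, 0]].
    K = np.block([[P, A.T], [A, np.zeros((p, p))]])
    rhs = np.concatenate([-q, b])
    # np.linalg.solve raises LinAlgError if the KKT matrix is singular.
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]   # primal x*, dual nu*

# Toy problem from the worked example: min 1/2 ||x||^2  s.t.  x1 + x2 = 1
P = np.eye(2); q = np.zeros(2)
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x, nu = solve_eq_qp(P, q, A, b)
print(x, nu)   # [0.5 0.5] [-0.5]
```

On the toy problem this reproduces $x^\star = (\tfrac{1}{2}, \tfrac{1}{2})^T$, $\nu^\star = -\tfrac{1}{2}$, matching the worked example above.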
Eliminating equality constraints

Since $A \in \mathbb{R}^{p \times n}$ has rank $p$, its null space has dimension $n - p$. Find a matrix $N$ whose columns form a basis of this null space (e.g. by elimination with column swapping) and rewrite
\[
x = Nz + \hat{x}, \qquad z \in \mathbb{R}^{n-p},
\]
where $\hat{x}$ is any particular solution of $A\hat{x} = b$.

Solve the resulting unconstrained problem $\min_{z \in \mathbb{R}^{n-p}} f(Nz + \hat{x})$ and from its solution $z^\star$ recover $x^\star = Nz^\star + \hat{x}$, as sketched in the code below.
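A sketch of the elimination approach, assuming SciPy is available (`scipy.linalg.null_space` returns an orthonormal null-space basis; the helper name `solve_by_elimination` and the toy objective are chosen here for illustration):

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import minimize

def solve_by_elimination(f, A, b, z0=None):
    """Minimise f over {x : Ax = b} via the parametrisation x = N z + x_hat."""
    N = null_space(A)                                # columns span Null(A)
    x_hat = np.linalg.lstsq(A, b, rcond=None)[0]     # particular solution of Ax = b
    z0 = np.zeros(N.shape[1]) if z0 is None else z0
    res = minimize(lambda z: f(N @ z + x_hat), z0)   # unconstrained problem in z
    return N @ res.x + x_hat                         # recover x* = N z* + x_hat

# Example: min ||x - (2, 0)||^2  s.t.  x1 + x2 = 1  ->  x* = (1.5, -0.5)
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x_star = solve_by_elimination(lambda x: np.sum((x - np.array([2.0, 0.0]))**2), A, b)
print(x_star)   # approximately [ 1.5 -0.5]
```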
Construct the optimal dual variable ($\nu \in \mathbb{R}^p$, $p < n$) via
\[
\nu^\star = -(AA^T)^{-1} A \nabla f(x^\star).
\]

Alternatively, solve via the dual ($\nu \in \mathbb{R}^p$, $p < n$). Strong duality implies that there exists $\nu^\star$ with $g(\nu^\star) = \max_\nu g(\nu) = p^\star$, where
\[
g(\nu) = -b^T \nu + \inf_x \left( f(x) + \nu^T A x \right)
       = -b^T \nu - \sup_x \left( -f(x) - \nu^T A x \right)
       = -b^T \nu - f^*(-A^T \nu),
\]
with $f^*$ the convex conjugate of $f$.

Feasible Newton method

A Newton method which starts at a feasible point and subsequently enforces the equality constraints on the step, thus maintaining feasibility.

Interpretation: replace $f$ with its second order Taylor expansion
\[
f(x + v) \approx f(x) + \nabla f(x)^T v + \tfrac{1}{2} v^T \nabla^2 f(x) v.
\]
Linearise the optimality conditions
\[
A(x + \Delta x_n) = b, \qquad
\nabla f(x + \Delta x_n) + A^T w \approx \nabla f(x) + \nabla^2 f(x)\Delta x_n + A^T w = 0.
\]
Using $Ax = b$ these become
\[
A \Delta x_n = 0, \qquad \nabla^2 f(x)\Delta x_n + A^T w = -\nabla f(x),
\]
i.e. an equality constrained quadratic problem (the step is well defined if the KKT matrix is non-singular):
\[
\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} \Delta x_n \\ w \end{bmatrix}
=
\begin{bmatrix} -\nabla f(x) \\ 0 \end{bmatrix}.
\]

Newton decrement

The Newton decrement is
\[
\lambda(x) = \left( \Delta x_n^T \nabla^2 f(x) \Delta x_n \right)^{1/2}.
\]
The difference between $f(x)$ and the minimum of the second order model at $x$ satisfies
\[
f(x) - \inf_{A(x+v) = b} \left\{ f(x) + \nabla f(x)^T v + \tfrac{1}{2} v^T \nabla^2 f(x) v \right\} = \lambda(x)^2 / 2,
\]
i.e. $\lambda(x)^2/2$ is an estimate of $f(x) - p^\star$ (based on the quadratic model) and hence a good stopping criterion.

Furthermore it holds that
\[
\frac{d}{dt} f(x + t\Delta x_n) \Big|_{t=0} = \nabla f(x)^T \Delta x_n
= -\Delta x_n^T \nabla^2 f(x) \Delta x_n = -\lambda(x)^2.
\]
One consequence is that $\Delta x_n$ is a descent direction.

Convergence

It can be shown that Newton's method with equality constraints is equivalent to applying Newton's method to the reduced problem obtained by eliminating the equality constraints. Hence the convergence theory for unconstrained problems applies. The assumption that the eigenvalues of the Hessian are bounded away from 0 needs to be replaced by the requirement that the absolute values of the eigenvalues of the (indefinite) KKT matrix are bounded away from 0.

Infeasible Newton

Starts at any point $x \in D$ (not necessarily feasible) and computes a step approximately satisfying the optimality conditions, $x + \Delta x \approx x^\star$.

This is of interest if $D \neq \mathbb{R}^n$. If $D = \mathbb{R}^n$, a feasible starting point can simply be computed by solving $Ax = b$; otherwise it may be easier to start with the infeasible method.

For inequality constraints (after reformulation into equality constrained problems through e.g. implicit constraints) it is an alternative to phase I methods, but in contrast to phase I methods it will not detect that no strictly feasible point exists.

Substituting into the optimality conditions we obtain
\[
\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} \Delta x_n \\ w \end{bmatrix}
=
\begin{bmatrix} -\nabla f(x) \\ b - Ax \end{bmatrix},
\]
where $Ax - b$ is the residual, which reduces to 0 when $x$ is feasible.

Interpretation as a primal-dual method

Define the residual, with $y = (x, \nu)$,
\[
r(y) = r(x, \nu) = (r_d(x, \nu),\, r_p(x, \nu))
= (\underbrace{\nabla f(x) + A^T \nu}_{= r_d},\; \underbrace{Ax - b}_{= r_p}).
\]
The first order Taylor approximation of $r$ is
\[
r(y + z) \approx r(y) + Dr(y)\, z,
\]
where $Dr(y) \in \mathbb{R}^{(n+p) \times (n+p)}$ is the derivative of $r$. Let the primal-dual Newton step $\Delta y_{pd}$ be the step $z$ for which the Taylor approximation vanishes (i.e. which is exact for the linear model):
\[
Dr(y)\, \Delta y_{pd} = -r(y).
\]
Written out this reads
\[
\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} \Delta x_{pd} \\ \Delta \nu_{pd} \end{bmatrix}
= -\begin{bmatrix} r_d \\ r_p \end{bmatrix}
= -\begin{bmatrix} \nabla f(x) + A^T \nu \\ Ax - b \end{bmatrix},
\]
and substituting $\nu^+ = \nu + \Delta \nu_{pd}$ we obtain
\[
\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} \Delta x_{pd} \\ \nu^+ \end{bmatrix}
= -\begin{bmatrix} \nabla f(x) \\ Ax - b \end{bmatrix},
\]
which is the "infeasible Newton system" with $\Delta x_n = \Delta x_{pd}$ and $w = \nu^+ = \nu + \Delta \nu_{pd}$.
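Rendered as code, this step is a single linear solve with the KKT matrix; a minimal NumPy sketch (the name `newton_step_pd` is chosen here for illustration):

```python
import numpy as np

def newton_step_pd(grad_f, hess_f, A, b, x, nu):
    """Solve Dr(y) * dy = -r(y) for the primal-dual Newton step dy = (dx, dnu)."""
    n, p = x.size, nu.size
    # Dr(y) = [[hess f(x), A^T], [A, 0]]: the same matrix as in the
    # infeasible Newton system above.
    K = np.block([[hess_f(x), A.T], [A, np.zeros((p, p))]])
    r = np.concatenate([grad_f(x) + A.T @ nu,   # dual residual r_d
                        A @ x - b])             # primal residual r_p
    dy = np.linalg.solve(K, -r)
    return dy[:n], dy[n:]                       # dx_pd, dnu_pd; nu+ = nu + dnu_pd
```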
The Newton direction at an infeasible point is not necessarily a descent direction for $f$:
\[
\frac{d}{dt} f(x + t\Delta x) \Big|_{t=0} = \nabla f(x)^T \Delta x
= -\Delta x^T \left( \nabla^2 f(x)\Delta x + A^T w \right)
= -\Delta x^T \nabla^2 f(x) \Delta x + (Ax - b)^T w.
\]
The right-hand side is not necessarily negative (unless $x$ is feasible, i.e. $Ax = b$).

From the primal-dual interpretation, however, the residual norm does decrease in the Newton direction:
\[
\frac{d}{dt} \|r(y + t\Delta y_{pd})\|^2 \Big|_{t=0}
= 2\, r(y)^T Dr(y)\, \Delta y_{pd} = -2\, r(y)^T r(y) = -2\|r(y)\|^2.
\]
Since by the chain rule $\frac{d}{dt}\|r\|^2 = 2\|r\| \frac{d}{dt}\|r\|$, this is equivalent to
\[
\frac{d}{dt} \|r(y + t\Delta y_{pd})\| \Big|_{t=0} = -\|r(y)\|.
\]
Hence $\|r\|$ can be used to measure the progress of the infeasible Newton method, e.g. in a line search (instead of $f$ as in standard Newton).

By construction the Newton step has the property $A(x + \Delta x_n) = b$. Thus once a step of length 1 has been taken in the Newton direction, $x + \Delta x_n$ and all following iterates are feasible.

Effect of a damped step on the residual $r_p$: for the next iterate $x^+ = x + t\Delta x_n$, $t \in [0, 1]$, the primal residual
\[
r_p^+ = A(x + t\Delta x_n) - b = (1 - t)(Ax - b) = (1 - t)\, r_p
\]
is reduced by a factor $(1 - t)$. After $k$ iterations we have
\[
r^{(k)} = \prod_{i=0}^{k-1} \left( 1 - t^{(i)} \right) r^{(0)}, \qquad t^{(i)} \in [0, 1], \quad i = 0, \dots, k - 1.
\]
Thus the primal residual stays in the direction of $r^{(0)}$ and is scaled down at each step. After a full step $t = 1$ has been taken, all future iterates are primal feasible.

Convergence is very similar to that of feasible Newton: in a finite number of steps the residual is reduced sufficiently and feasibility is achieved; from then on full steps are taken and convergence becomes quadratic.
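Putting the pieces together, here is a sketch of the full infeasible-start Newton iteration with backtracking on $\|r\|$ as described above (the parameters `alpha`, `beta`, `tol` and the function name are illustrative choices, not from the lecture):

```python
import numpy as np

def infeasible_newton(grad_f, hess_f, A, b, x0, nu0,
                      alpha=0.1, beta=0.5, tol=1e-10, max_iter=50):
    """Infeasible-start Newton: backtracking line search on ||r(y)||, not on f."""
    x, nu = np.asarray(x0, float).copy(), np.asarray(nu0, float).copy()
    n, p = x.size, nu.size

    def residual(x, nu):
        return np.concatenate([grad_f(x) + A.T @ nu, A @ x - b])

    for _ in range(max_iter):
        # Primal-dual Newton step: Dr(y) dy = -r(y) (cf. newton_step_pd above).
        K = np.block([[hess_f(x), A.T], [A, np.zeros((p, p))]])
        dy = np.linalg.solve(K, -residual(x, nu))
        dx, dnu = dy[:n], dy[n:]
        # Backtrack on the residual norm until sufficient decrease.
        t, r_norm = 1.0, np.linalg.norm(residual(x, nu))
        while np.linalg.norm(residual(x + t * dx, nu + t * dnu)) > (1 - alpha * t) * r_norm:
            t *= beta
        x, nu = x + t * dx, nu + t * dnu
        # After a full step (t = 1), Ax = b and all later iterates stay feasible.
        if np.linalg.norm(residual(x, nu)) <= tol:
            break
    return x, nu
```

For instance, on $f(x) = \sum_i e^{x_i}$ one can pass `grad_f=np.exp` and `hess_f=lambda x: np.diag(np.exp(x))`; the iterates should become primal feasible after the first full step and then converge quadratically, as argued above.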