Numerical Optimisation
Constrained optimisation:
Penalty and augmented Lagrangian methods
Marta M. Betcke
m.betcke@ucl.ac.uk,
Kiko Rullan
f.rullan@cs.ucl.ac.uk
Department of Computer Science,
Centre for Medical Image Computing,
Centre for Inverse Problems
University College London
Lecture 14
M.M. Betcke Numerical Optimisation
Lagrangian: primal problem
Constrained optimisation problem

    min_{x ∈ D ⊂ R^n} f(x)    (COP)
    subject to f_i(x) ≤ 0,  i = 1, ..., m,
               h_i(x) = 0,  i = 1, ..., p.
Conflicting goals: minimise the function and satisfy the constraints.

Idea: minimise a merit function Q(x;µ), where µ are parameters, such that some minimisers of Q(x;µ) approach those of f subject to the constraints as µ approaches some set M.

Benefit: reformulation as an unconstrained problem.
Quadratic penalty
Consider a problem with equality constraints

    min_{x ∈ D ⊂ R^n} f(x)    (COP:E)
    subject to h_i(x) = 0,  i = 1, ..., p.

The merit function (quadratic penalty function)

    Q(x;µ) := f(x) + (µ/2) Σ_{i=1}^p h_i²(x),

where µ > 0 is the penalty parameter.
Idea: choose a sequence {µ_k} with µ_k → ∞ as k → ∞, i.e. penalise constraint violation increasingly strongly, and compute the sequence {x_k} of (approximate) minimisers of Q(x;µ_k).
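The scheme above can be sketched in a few lines. A minimal illustration in Python; the toy problem min (x_1−2)² + (x_2−2)² subject to x_1 + x_2 = 2 (solution (1,1)) and the plain gradient-descent inner solver are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Toy problem (assumption): minimise (x1-2)^2 + (x2-2)^2
# subject to h(x) = x1 + x2 - 2 = 0; the constrained minimiser is (1, 1).
grad_f = lambda x: 2.0 * (x - 2.0)       # gradient of the objective
h      = lambda x: x[0] + x[1] - 2.0     # equality constraint value
grad_h = lambda x: np.array([1.0, 1.0])  # its (constant) gradient

def min_Q(x, mu, iters=1000):
    """Approximately minimise Q(x;mu) = f(x) + (mu/2) h(x)^2 by gradient descent."""
    alpha = 0.9 / (1.0 + mu)             # step below 2/L; here L = 2 + 2*mu
    for _ in range(iters):
        g = grad_f(x) + mu * h(x) * grad_h(x)   # gradient of Q
        x = x - alpha * g
    return x

x = np.zeros(2)
for mu in [1.0, 10.0, 100.0, 1000.0]:
    x = min_Q(x, mu)                     # warm start from the previous minimiser
    print(mu, x, h(x))                   # the infeasibility h(x) decays roughly like 1/mu
```

The printed minimisers approach (1,1) while the constraint violation shrinks as µ grows, as the convergence result on the next slide makes precise.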
Convergence for the quadratic penalty
Let {x_k} be a sequence of approximate minimisers of Q(x;µ_k) satisfying ‖∇_x Q(x_k;µ_k)‖ ≤ τ_k, with penalty parameters µ_k → ∞ and tolerances τ_k → 0, and let x* be a limit point of {x_k}.

If a limit point x* is infeasible, it is a stationary point of ‖h(x)‖².

If a limit point x* is feasible and the constraint gradients ∇h_i(x*) are linearly independent, then x* is a KKT point for (COP:E), and we have that

    lim_{k→∞} µ_k h_i(x_k) = ν*_i,  i = 1, ..., p,

where ν* is the multiplier vector that satisfies the KKT conditions for (COP:E).
Proof:

    ∇_x Q(x_k;µ_k) = ∇f(x_k) + Σ_{i=1}^p µ_k h_i(x_k) ∇h_i(x_k).    (1)

From the convergence criterion ‖∇_x Q(x_k;µ_k)‖ ≤ τ_k (using the inequality ‖a‖ − ‖b‖ ≤ ‖a + b‖) we obtain

    ‖ Σ_{i=1}^p h_i(x_k) ∇h_i(x_k) ‖ ≤ (1/µ_k) (τ_k + ‖∇f(x_k)‖).

As k → ∞: τ_k → 0, ‖∇f(x_k)‖ → ‖∇f(x*)‖ and µ_k → ∞, thus

    Σ_{i=1}^p h_i(x*) ∇h_i(x*) = 0.
i) If h_i(x*) ≠ 0 for some i, then the above relation shows that the gradients ∇h_i(x*) are linearly dependent, and since ∇‖h(x*)‖² = 2 Σ_{i=1}^p h_i(x*) ∇h_i(x*) = 0, x* is a stationary point of ‖h(x)‖².

ii) If ∇h_i(x*), i = 1, ..., p, are linearly independent, then h_i(x*) = 0 and x* is primal feasible, i.e. it satisfies the second KKT condition. It remains to show that "dual feasibility" (the first KKT condition) is satisfied.
Case ii):

Intuition: as k → ∞, Q(x_k;µ_k) should approach the Lagrangian

    L(x*;ν*) = f(x*) + Σ_{i=1}^p ν*_i h_i(x*),    (2)

and ∇_x Q(x_k;µ_k) its gradient, i.e. the "dual feasibility" condition

    ∇_x L(x*;ν*) = ∇f(x*) + Σ_{i=1}^p ν*_i ∇h_i(x*) = 0.    (3)
Rearranging (1), with A(x_k) the Jacobian of h at x_k (rows ∇h_i(x_k)^T, i = 1, ..., p) and ν^k := µ_k h(x_k), we obtain

    A(x_k)^T ν^k = −∇f(x_k) + ∇_x Q(x_k;µ_k),   ‖∇_x Q(x_k;µ_k)‖ ≤ τ_k.

For large enough k the matrix A(x_k) has full row rank and hence the above overdetermined system has the unique solution

    ν^k = (A(x_k) A(x_k)^T)^{-1} A(x_k) [−∇f(x_k) + ∇_x Q(x_k;µ_k)].

Taking the limit as k → ∞,

    lim_{k→∞} ν^k = ν* = −(A(x*) A(x*)^T)^{-1} A(x*) ∇f(x*),

and the same limit in (1) yields the "dual feasibility" condition

    ∇f(x*) + A(x*)^T ν* = 0.

Hence x* is a KKT point with unique Lagrange multiplier ν*.
Example
    min x_1 + x_2
    subject to x_1² + x_2² − 2 = 0.

Solution: (−1,−1)^T.

Quadratic penalty function: Q(x;µ) = x_1 + x_2 + (µ/2)(x_1² + x_2² − 2)².
[Figure: surface/contour plots of Q(x;µ) over [−1.5, 1.5]², for µ = 1 (left) and µ = 10 (right).]
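The behaviour in the plots can be reproduced numerically. A minimal sketch; the choices of Newton's method as inner solver, the fixed iteration budget, and the starting point (−1.2,−1.2) are illustrative assumptions. The estimate µ h(x_µ) should approach the Lagrange multiplier ν* = 1/2:

```python
import numpy as np

def min_Q(x, mu, iters=50):
    # Newton's method on Q(x;mu) = x1 + x2 + (mu/2)*(x1^2 + x2^2 - 2)^2.
    for _ in range(iters):
        r = x @ x - 2.0                                 # constraint value h(x)
        g = np.array([1.0, 1.0]) + 2.0 * mu * r * x     # gradient of Q
        H = 2.0 * mu * (r * np.eye(2) + 2.0 * np.outer(x, x))  # Hessian of Q
        x = x + np.linalg.solve(H, -g)
    return x

x = np.array([-1.2, -1.2])
for mu in [1.0, 10.0, 100.0, 1000.0]:
    x = min_Q(x, mu)               # warm start from the previous minimiser
    nu_est = mu * (x @ x - 2.0)    # mu_k * h(x_k), the multiplier estimate
    print(mu, x, nu_est)
```

The minimisers drift towards (−1,−1) while µ h(x) settles near 1/2, matching the convergence statement above.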
Problem
    min −5x_1² + x_2²
    subject to x_1 = 1.

Solution: (1, 0)^T.

Quadratic penalty function: Q(x;µ) = −5x_1² + x_2² + (µ/2)(x_1 − 1)².

Q(x;µ) is unbounded below for µ < 10, so the iterates would diverge. Unfortunately, this is a common problem.
Ill-conditioning of Hessian
Newton step: ∇²_{xx} Q(x;µ_k) p_N = −∇_x Q(x;µ_k), where

    ∇²_{xx} Q(x;µ_k) = ∇²f(x) + Σ_{i=1}^p µ_k h_i(x) ∇²h_i(x) + µ_k A(x)^T A(x),

with µ_k h_i(x) ≈ ν*_i and A(x)^T := ∇h(x) (columns ∇h_i(x)).

If x is sufficiently close to the minimiser of Q(·;µ_k),

    ∇²_{xx} Q(x;µ_k) ≈ ∇²_{xx} L(x;ν*) + µ_k A(x)^T A(x).

As µ_k → ∞ the Hessian is dominated by the second term and hence becomes increasingly ill-conditioned.
An alternative formulation avoids the ill-conditioning: introducing ζ = µ_k A(x) p_N,

    [ ∇²f(x) + Σ_{i=1}^p µ_k h_i(x) ∇²h_i(x)   A(x)^T      ] [ p_N ]   [ −∇_x Q(x;µ_k) ]
    [ A(x)                                     −µ_k^{-1} I ] [ ζ   ] = [ 0             ]

(the second block row reads A(x) p_N − µ_k^{-1} ζ = 0, i.e. ζ = µ_k A(x) p_N).

Still, if µ_k h_i(x) is not a good enough approximation to ν*_i, the inadequate quadratic model yields an inadequate search direction p_N.
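The growth of the condition number can be observed directly. A small sketch, reusing as an illustrative assumption the earlier example min x_1 + x_2 s.t. x_1² + x_2² − 2 = 0 with a Newton inner solver:

```python
import numpy as np

def penalty_hessian_cond(mu, iters=50):
    # Minimise Q(x;mu) for min x1+x2 s.t. x1^2+x2^2-2=0 by Newton's method,
    # then return the condition number of the Hessian of Q at the minimiser.
    x = np.array([-1.2, -1.2])
    for _ in range(iters):
        r = x @ x - 2.0
        g = np.array([1.0, 1.0]) + 2.0 * mu * r * x
        H = 2.0 * mu * (r * np.eye(2) + 2.0 * np.outer(x, x))
        x = x + np.linalg.solve(H, -g)
    H = 2.0 * mu * ((x @ x - 2.0) * np.eye(2) + 2.0 * np.outer(x, x))
    return np.linalg.cond(H)

for mu in [1.0, 10.0, 100.0, 1000.0]:
    print(mu, penalty_hessian_cond(mu))   # grows roughly linearly in mu here
```

For this example the two eigenvalues behave like 2µ h(x_µ) ≈ 1 and ≈ 8µ, so the condition number grows linearly with µ, as the dominance argument above predicts.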
General constraint problem
For general constrained problems including equality and inequality constraints, the quadratic penalty function can be defined as

    Q(x;µ) := f(x) + (µ/2) Σ_{i=1}^p h_i²(x) + (µ/2) Σ_{i=1}^m ([f_i(x)]^−)²,

where [y]^− := max{y, 0}.

Note: Q may be less smooth than the objective and constraint functions, e.g. for the constraint f_1(x) = x_1 ≤ 0 the term max{x_1, 0}² has a discontinuous second derivative, and so does Q.
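The general penalty can be written down directly. A minimal sketch; the helper signature and the toy instance are illustrative assumptions, not from the lecture:

```python
import numpy as np

def Q(x, mu, f, hs, fs):
    # General quadratic penalty: equality constraints h(x) = 0 penalised by h^2,
    # inequality constraints f_i(x) <= 0 penalised by max{f_i(x), 0}^2.
    viol_eq = sum(hi(x)**2 for hi in hs)
    viol_in = sum(max(fi(x), 0.0)**2 for fi in fs)
    return f(x) + 0.5 * mu * (viol_eq + viol_in)

# Toy instance (assumption): min x^2 s.t. x >= 1, written as f_1(x) = 1 - x <= 0.
f  = lambda x: x[0]**2
fs = [lambda x: 1.0 - x[0]]
print(Q(np.array([0.0]), 10.0, f, [], fs))   # infeasible point: 0 + 5*(1)^2 = 5.0
```

Only violated inequalities contribute, so Q coincides with f on the feasible set.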
Practical penalty methods
µ_k can be chosen adaptively based on the difficulty of minimising the penalty function in each iteration: when minimising Q(x;µ_k) is expensive, choose µ_{k+1} only moderately larger than µ_k, e.g. µ_{k+1} = 1.5 µ_k; when minimising Q(x;µ_k) is cheap, choose µ_{k+1} larger, e.g. µ_{k+1} = 10 µ_k.

There is no guarantee that ‖∇_x Q(x;µ_k)‖ ≤ τ_k will be satisfied. Practical implementations need safeguards to increase µ (and possibly restore the initial point) when the constraint violation is not decreasing fast enough or when the iterates appear to diverge.
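The adaptive rule can be sketched as a one-line heuristic; using the inner-iteration count as the proxy for difficulty, and the particular threshold, are illustrative assumptions:

```python
def next_mu(mu, inner_iters, easy_threshold=20):
    # Cheap inner minimisation (few iterations): grow mu aggressively (10x);
    # expensive inner minimisation: grow mu only moderately (1.5x).
    return 10.0 * mu if inner_iters <= easy_threshold else 1.5 * mu

print(next_mu(1.0, 5), next_mu(1.0, 200))
```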
When only equality constraints are present, Q(x;µ_k) is smooth and algorithms for unconstrained optimisation can be used. However, Q(x;µ_k) becomes more difficult to minimise as µ_k becomes large unless special techniques are used. In particular, methods like conjugate gradients and quasi-Newton will perform poorly; the Newton method can be adapted (see the reformulation of the Newton step) but can still yield an inadequate direction due to inadequacy of the quadratic approximation. The choice of the initial point, e.g. the warm start x^s_{k+1} = x_k, can improve the performance of Newton's method.
Nonsmooth penalty functions
Some penalty functions are exact, i.e. for certain choices of the penalty parameter, minimisation w.r.t. x yields the exact minimiser of the constrained problem. To be exact, the penalty function has to be nonsmooth.

    Q_1(x;µ) := f(x) + µ Σ_{i=1}^p |h_i(x)| + µ Σ_{i=1}^m [f_i(x)]^−,

where [y]^− := max{y, 0}.

Let x* be a strict local minimiser of (COP) which satisfies the 1st order necessary conditions with Lagrange multipliers ν*, λ*. Then x* is a local minimiser of Q_1(x;µ) for all µ > µ* = ‖(ν*, λ*)^T‖_∞. If moreover the 2nd order sufficient conditions hold at x* and µ > µ*, then x* is a strict local minimiser of Q_1(x;µ).

Let x̂ be a stationary point of the penalty function Q_1(x;µ) for all µ > µ̂ > 0. Then, if x̂ is feasible for (COP), it satisfies the KKT conditions. If x̂ is not feasible for (COP), it is an infeasible stationary point.
Example revisited
    min x_1 + x_2
    subject to x_1² + x_2² − 2 = 0.

Solution: (−1,−1)^T.

ℓ_1 penalty function: Q_1(x;µ) = x_1 + x_2 + µ|x_1² + x_2² − 2|.
[Figure: surface/contour plots of Q_1(x;µ) over [−1.5, 1.5]², for µ = 2 (left) and µ = 5 (right).]
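Exactness can be checked numerically: here ν* = 1/2, so for µ = 2 > µ* the minimiser of Q_1 near (−1,−1) should be exactly (−1,−1). A grid-sampling sketch (the grid and the box around the solution are illustrative choices):

```python
import numpy as np

# l1 penalty for min x1+x2 s.t. x1^2+x2^2-2=0; exact for mu > nu* = 1/2.
Q1 = lambda x, mu: x[0] + x[1] + mu * abs(x[0]**2 + x[1]**2 - 2.0)

mu = 2.0
xs = np.linspace(-1.3, -0.7, 121)       # grid around the solution (-1,-1)
vals = np.array([[Q1((a, b), mu) for b in xs] for a in xs])
i, j = np.unravel_index(vals.argmin(), vals.shape)
print(xs[i], xs[j])                     # grid minimiser at (-1,-1): no systematic bias
```

In contrast, the quadratic penalty minimiser for any finite µ is perturbed away from (−1,−1), as seen earlier.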
Augmented Lagrangian
Reduces ill-conditioning by introducing explicit Lagrange multiplier estimates into the function to be minimised. It can preserve smoothness, and can be implemented using standard unconstrained (or bound-constrained) optimisation.

Motivation: the minimisers x_k of Q(x;µ_k) do not quite satisfy the feasibility condition h_i(x) = 0:

    h_i(x_k) ≈ ν*_i/µ_k,  i = 1, ..., p.

Obviously, in the limit µ_k → ∞, h_i(x_k) → 0, but can we avoid this systematic perturbation for moderate values of µ_k?

Augmented Lagrangian:

    L_A(x,ν;µ) := f(x) + Σ_{i=1}^p ν_i h_i(x) + (µ/2) Σ_{i=1}^p h_i²(x).
Update of Lagrange multiplier estimate
Optimality condition for the unconstrained minimiser of L_A(x,ν^k;µ_k):

    0 ≈ ∇_x L_A(x_k,ν^k;µ_k) = ∇f(x_k) + Σ_{i=1}^p [ν_i^k + µ_k h_i(x_k)] ∇h_i(x_k).

Optimality condition for the Lagrangian of (COP:E):

    0 ≈ ∇_x L(x_k,ν*) = ∇f(x_k) + Σ_{i=1}^p ν*_i ∇h_i(x_k).

Comparison yields (an update scheme for ν):

    ν*_i ≈ ν_i^k + µ_k h_i(x_k),  i = 1, ..., p,

as from h_i(x_k) = (1/µ_k)(ν*_i − ν_i^k), i = 1, ..., p, we see that if ν^k is close to ν*, the infeasibility goes to 0 faster than 1/µ_k.
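The resulting method alternates inner minimisation with the multiplier update. A minimal sketch on the running example; the Newton inner solver, its fixed iteration budget, holding µ fixed at 1, and the initialisation ν = 0 are illustrative assumptions:

```python
import numpy as np

# Augmented Lagrangian for min x1+x2 s.t. h(x) = x1^2+x2^2-2 = 0.
# mu stays fixed; only the multiplier estimate nu is updated.
def min_LA(x, nu, mu, iters=50):
    # Newton's method on L_A(x,nu;mu) = x1+x2 + nu*h(x) + (mu/2)*h(x)^2.
    for _ in range(iters):
        r = x @ x - 2.0
        c = nu + mu * r                             # nu + mu*h(x)
        g = np.array([1.0, 1.0]) + 2.0 * c * x      # gradient of L_A
        H = 2.0 * c * np.eye(2) + 4.0 * mu * np.outer(x, x)  # Hessian of L_A
        x = x + np.linalg.solve(H, -g)
    return x

x, nu, mu = np.array([-1.2, -1.2]), 0.0, 1.0
for k in range(10):
    x = min_LA(x, nu, mu)
    nu = nu + mu * (x @ x - 2.0)                    # multiplier update
print(x, nu)
```

With µ fixed at a moderate value, x converges to (−1,−1) and ν to ν* = 1/2: the systematic perturbation of the pure penalty method disappears without driving µ → ∞.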
Example revisited
    min x_1 + x_2
    subject to x_1² + x_2² − 2 = 0.

Solution: (−1,−1)^T.

Augmented Lagrangian:

    L_A(x,ν;µ) = x_1 + x_2 + ν(x_1² + x_2² − 2) + (µ/2)(x_1² + x_2² − 2)².
[Figure: surface/contour plots of L_A(x,ν;µ) over [−1.5, 1.5]² with ν = 0.4, for µ = 1 (left) and µ = 10 (right).]
Convergence
Let x* be a local minimiser of (COP:E) at which the constraint gradients are linearly independent and which satisfies the 2nd order sufficient conditions with Lagrange multipliers ν*. Then there is µ̄ > 0 such that for all µ ≥ µ̄, x* is a strict local minimiser of L_A(x,ν*;µ).

Furthermore, there exist δ, ε, M > 0 such that for all ν^k, µ_k satisfying

    ‖ν^k − ν*‖ ≤ µ_k δ,   µ_k ≥ µ̄,

the problem min_x L_A(x,ν^k;µ_k), subject to ‖x − x*‖ ≤ ε, has a unique solution x_k, and it holds that

    ‖x_k − x*‖ ≤ M‖ν^k − ν*‖/µ_k,
    ‖ν^{k+1} − ν*‖ ≤ M‖ν^k − ν*‖/µ_k,   where ν^{k+1} = ν^k + µ_k h(x_k),

the matrix ∇²_{xx} L_A(x_k,ν^k;µ_k) is positive definite, and the constraint gradients ∇h_i(x_k), i = 1, ..., p, are linearly independent.
Practical Augmented Lagrangian methods
Bound-constrained formulation: convert the inequality constraints f_i(x) ≤ 0 into equality constraints using slack variables,

    f_i(x) + s_i = 0,   s_i ≥ 0,   i = 1, ..., m.

Bound constraints are not transformed. Solve by a projected gradient algorithm,

    x_{k+1} = P(x_k − ∇_x L_A(x,ν;µ)|_{x_k}; l, u),

where P(·; l, u) projects onto the box [l, u].
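The projection onto a box is a componentwise clip. A minimal sketch of one projected gradient step; the quadratic example, the step size α, and the helper names are illustrative assumptions:

```python
import numpy as np

def project(x, l, u):
    # Componentwise projection P(x; l, u) onto the box [l, u].
    return np.minimum(np.maximum(x, l), u)

def projected_gradient_step(x, grad, alpha, l, u):
    # One step x <- P(x - alpha * grad(x); l, u).
    return project(x - alpha * grad(x), l, u)

# Example (assumption): minimise (x1-2)^2 + (x2+3)^2 over the box [0,1] x [0,1];
# the solution is the projected unconstrained minimiser, (1, 0).
grad = lambda x: 2.0 * (x - np.array([2.0, -3.0]))
x = np.array([0.5, 0.5])
for _ in range(200):
    x = projected_gradient_step(x, grad, 0.25, np.zeros(2), np.ones(2))
print(x)
```

The iterates hit the active bounds after a few steps and stay there, which is exactly the behaviour the bound-constrained inner solver exploits.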
Linearly constrained formulation: transform into an equality constrained problem and linearise the constraints:

    min F_k(x),   subject to f_i(x_k) + ∇f_i(x_k)^T(x − x_k) = 0,   l ≤ x ≤ u.

Choose F_k as

    F_k(x) = f(x) + Σ_{i=1}^m ν_i^k f̄_i^k(x),

where

    f̄_i^k(x) = f_i(x) − f_i(x_k) − ∇f_i(x_k)^T(x − x_k).

The preferred choice (larger convergence radius in practice) is

    F_k(x) = f(x) + Σ_{i=1}^m ν_i^k f̄_i^k(x) + (µ/2) Σ_{i=1}^m (f̄_i^k(x))².

Unconstrained formulation: obtain an unconstrained formulation using a smooth approximation to the indicator function of the feasible set.