Numerical Optimisation
Constrained optimisation:
Penalty and augmented Lagrangian methods
Marta M. Betcke
m.betcke@ucl.ac.uk,
Kiko Rullan
f.rullan@cs.ucl.ac.uk
Department of Computer Science,
Centre for Medical Image Computing,
Centre for Inverse Problems
University College London
Lecture 14
M.M. Betcke Numerical Optimisation
Lagrangian: primal problem
Constrained optimisation problem

    min_{x ∈ D ⊂ R^n} f(x)    (COP)
    subject to f_i(x) ≤ 0,  i = 1, ..., m,
               h_i(x) = 0,  i = 1, ..., p.
Conflicting goals: minimise the function and satisfy the constraints.

Idea: minimise a merit function Q(x;µ), where µ are parameters, such that some minimisers of Q(x;µ) approach those of f subject to the constraints as µ approaches some set M.

Benefit: reformulation as an unconstrained problem.
Quadratic penalty
Consider a problem with equality constraints

    min_{x ∈ D ⊂ R^n} f(x)    (COP:E)
    subject to h_i(x) = 0,  i = 1, ..., p.

The merit function (quadratic penalty function)

    Q(x;µ) := f(x) + (µ/2) Σ_{i=1}^p h_i²(x),

where µ > 0 is the penalty parameter.
Idea: choose a sequence {µ_k} with µ_k → ∞ as k → ∞, i.e. penalise constraint violation increasingly strongly, and compute the sequence {x_k} of (approximate) minimisers of Q(x;µ_k).
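The scheme above can be sketched in a few lines. A minimal illustration in Python; the toy problem min (x_1−2)² + (x_2−2)² subject to x_1 + x_2 = 2 (solution (1,1)) and the plain gradient-descent inner solver are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Toy problem (assumption): minimise (x1-2)^2 + (x2-2)^2
# subject to h(x) = x1 + x2 - 2 = 0; the constrained minimiser is (1, 1).
grad_f = lambda x: 2.0 * (x - 2.0)       # gradient of the objective
h      = lambda x: x[0] + x[1] - 2.0     # equality constraint value
grad_h = lambda x: np.array([1.0, 1.0])  # its (constant) gradient

def min_Q(x, mu, iters=1000):
    """Approximately minimise Q(x;mu) = f(x) + (mu/2) h(x)^2 by gradient descent."""
    alpha = 0.9 / (1.0 + mu)             # step below 2/L; here L = 2 + 2*mu
    for _ in range(iters):
        g = grad_f(x) + mu * h(x) * grad_h(x)   # gradient of Q
        x = x - alpha * g
    return x

x = np.zeros(2)
for mu in [1.0, 10.0, 100.0, 1000.0]:
    x = min_Q(x, mu)                     # warm start from the previous minimiser
    print(mu, x, h(x))                   # the infeasibility h(x) decays roughly like 1/mu
```

The printed minimisers approach (1,1) while the constraint violation shrinks as µ grows, as the convergence result on the next slide makes precise.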
Convergence for the quadratic penalty
Let {x_k} be a sequence of approximate minimisers of Q(x;µ_k) satisfying ‖∇_x Q(x_k;µ_k)‖ ≤ τ_k, with penalty parameters µ_k → ∞ and tolerances τ_k → 0, and let x* be a limit point of {x_k}.

If a limit point x* is infeasible, it is a stationary point of ‖h(x)‖².

If a limit point x* is feasible and the constraint gradients ∇h_i(x*) are linearly independent, then x* is a KKT point for (COP:E), and we have that

    lim_{k→∞} µ_k h_i(x_k) = ν*_i,  i = 1, ..., p,

where ν* is the multiplier vector that satisfies the KKT conditions for (COP:E).
Proof:

    ∇_x Q(x_k;µ_k) = ∇f(x_k) + Σ_{i=1}^p µ_k h_i(x_k) ∇h_i(x_k).    (1)

From the convergence criterion ‖∇_x Q(x_k;µ_k)‖ ≤ τ_k (using the inequality ‖a‖ − ‖b‖ ≤ ‖a + b‖) we obtain

    ‖ Σ_{i=1}^p h_i(x_k) ∇h_i(x_k) ‖ ≤ (1/µ_k) (τ_k + ‖∇f(x_k)‖).

As k → ∞: τ_k → 0, ‖∇f(x_k)‖ → ‖∇f(x*)‖ and µ_k → ∞, thus

    Σ_{i=1}^p h_i(x*) ∇h_i(x*) = 0.
i) If h_i(x*) ≠ 0 for some i, then the above relation shows that the gradients ∇h_i(x*) are linearly dependent, and since ∇‖h(x*)‖² = 2 Σ_{i=1}^p h_i(x*) ∇h_i(x*) = 0, x* is a stationary point of ‖h(x)‖².

ii) If ∇h_i(x*), i = 1, ..., p, are linearly independent, then h_i(x*) = 0 and x* is primal feasible, i.e. it satisfies the second KKT condition. It remains to show that "dual feasibility" (the first KKT condition) is satisfied.
Case ii):

Intuition: as k → ∞, Q(x_k;µ_k) should approach the Lagrangian

    L(x*;ν*) = f(x*) + Σ_{i=1}^p ν*_i h_i(x*),    (2)

and ∇_x Q(x_k;µ_k) its gradient, i.e. the "dual feasibility" condition

    ∇_x L(x*;ν*) = ∇f(x*) + Σ_{i=1}^p ν*_i ∇h_i(x*) = 0.    (3)
Rearranging (1), with A(x_k) the Jacobian of h at x_k (rows ∇h_i(x_k)^T, i = 1, ..., p) and ν^k := µ_k h(x_k), we obtain

    A(x_k)^T ν^k = −∇f(x_k) + ∇_x Q(x_k;µ_k),   ‖∇_x Q(x_k;µ_k)‖ ≤ τ_k.

For large enough k the matrix A(x_k) has full row rank and hence the above overdetermined system has the unique solution

    ν^k = (A(x_k) A(x_k)^T)^{-1} A(x_k) [−∇f(x_k) + ∇_x Q(x_k;µ_k)].

Taking the limit as k → ∞,

    lim_{k→∞} ν^k = ν* = −(A(x*) A(x*)^T)^{-1} A(x*) ∇f(x*),

and the same limit in (1) yields the "dual feasibility" condition

    ∇f(x*) + A(x*)^T ν* = 0.

Hence x* is a KKT point with unique Lagrange multiplier ν*.
Example
    min x_1 + x_2
    subject to x_1² + x_2² − 2 = 0.

Solution: (−1,−1)^T.

Quadratic penalty function: Q(x;µ) = x_1 + x_2 + (µ/2)(x_1² + x_2² − 2)².
[Figure: surface/contour plots of Q(x;µ) over [−1.5, 1.5]², for µ = 1 (left) and µ = 10 (right).]
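The behaviour in the plots can be reproduced numerically. A minimal sketch; the choices of Newton's method as inner solver, the fixed iteration budget, and the starting point (−1.2,−1.2) are illustrative assumptions. The estimate µ h(x_µ) should approach the Lagrange multiplier ν* = 1/2:

```python
import numpy as np

def min_Q(x, mu, iters=50):
    # Newton's method on Q(x;mu) = x1 + x2 + (mu/2)*(x1^2 + x2^2 - 2)^2.
    for _ in range(iters):
        r = x @ x - 2.0                                 # constraint value h(x)
        g = np.array([1.0, 1.0]) + 2.0 * mu * r * x     # gradient of Q
        H = 2.0 * mu * (r * np.eye(2) + 2.0 * np.outer(x, x))  # Hessian of Q
        x = x + np.linalg.solve(H, -g)
    return x

x = np.array([-1.2, -1.2])
for mu in [1.0, 10.0, 100.0, 1000.0]:
    x = min_Q(x, mu)               # warm start from the previous minimiser
    nu_est = mu * (x @ x - 2.0)    # mu_k * h(x_k), the multiplier estimate
    print(mu, x, nu_est)
```

The minimisers drift towards (−1,−1) while µ h(x) settles near 1/2, matching the convergence statement above.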
Problem
    min −5x_1² + x_2²
    subject to x_1 = 1.

Solution: (1, 0)^T.

Quadratic penalty function: Q(x;µ) = −5x_1² + x_2² + (µ/2)(x_1 − 1)².

Q(x;µ) is unbounded below for µ < 10, so the iterates would diverge. Unfortunately, this is a common problem.
Ill-conditioning of Hessian
Newton step: ∇²_{xx} Q(x;µ_k) p_N = −∇_x Q(x;µ_k), where

    ∇²_{xx} Q(x;µ_k) = ∇²f(x) + Σ_{i=1}^p µ_k h_i(x) ∇²h_i(x) + µ_k A(x)^T A(x),

with µ_k h_i(x) ≈ ν*_i and A(x)^T := ∇h(x) (columns ∇h_i(x)).

If x is sufficiently close to the minimiser of Q(·;µ_k),

    ∇²_{xx} Q(x;µ_k) ≈ ∇²_{xx} L(x;ν*) + µ_k A(x)^T A(x).

As µ_k → ∞ the Hessian is dominated by the second term and hence becomes increasingly ill-conditioned.
An alternative formulation avoids the ill-conditioning: introducing ζ = µ_k A(x) p_N,

    [ ∇²f(x) + Σ_{i=1}^p µ_k h_i(x) ∇²h_i(x)   A(x)^T      ] [ p_N ]   [ −∇_x Q(x;µ_k) ]
    [ A(x)                                     −µ_k^{-1} I ] [ ζ   ] = [ 0             ]

(the second block row reads A(x) p_N − µ_k^{-1} ζ = 0, i.e. ζ = µ_k A(x) p_N).

Still, if µ_k h_i(x) is not a good enough approximation to ν*_i, the inadequate quadratic model yields an inadequate search direction p_N.
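The growth of the condition number can be observed directly. A small sketch, reusing as an illustrative assumption the earlier example min x_1 + x_2 s.t. x_1² + x_2² − 2 = 0 with a Newton inner solver:

```python
import numpy as np

def penalty_hessian_cond(mu, iters=50):
    # Minimise Q(x;mu) for min x1+x2 s.t. x1^2+x2^2-2=0 by Newton's method,
    # then return the condition number of the Hessian of Q at the minimiser.
    x = np.array([-1.2, -1.2])
    for _ in range(iters):
        r = x @ x - 2.0
        g = np.array([1.0, 1.0]) + 2.0 * mu * r * x
        H = 2.0 * mu * (r * np.eye(2) + 2.0 * np.outer(x, x))
        x = x + np.linalg.solve(H, -g)
    H = 2.0 * mu * ((x @ x - 2.0) * np.eye(2) + 2.0 * np.outer(x, x))
    return np.linalg.cond(H)

for mu in [1.0, 10.0, 100.0, 1000.0]:
    print(mu, penalty_hessian_cond(mu))   # grows roughly linearly in mu here
```

For this example the two eigenvalues behave like 2µ h(x_µ) ≈ 1 and ≈ 8µ, so the condition number grows linearly with µ, as the dominance argument above predicts.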
General constraint problem
For general constrained problems including equality and inequality constraints, the quadratic penalty function can be defined as

    Q(x;µ) := f(x) + (µ/2) Σ_{i=1}^p h_i²(x) + (µ/2) Σ_{i=1}^m ([f_i(x)]^−)²,

where [y]^− := max{y, 0}.

Note: Q may be less smooth than the objective and constraint functions, e.g. for the constraint f_1(x) = x_1 ≤ 0 the term max{x_1, 0}² has a discontinuous second derivative, and so does Q.
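The general penalty can be written down directly. A minimal sketch; the helper signature and the toy instance are illustrative assumptions, not from the lecture:

```python
import numpy as np

def Q(x, mu, f, hs, fs):
    # General quadratic penalty: equality constraints h(x) = 0 penalised by h^2,
    # inequality constraints f_i(x) <= 0 penalised by max{f_i(x), 0}^2.
    viol_eq = sum(hi(x)**2 for hi in hs)
    viol_in = sum(max(fi(x), 0.0)**2 for fi in fs)
    return f(x) + 0.5 * mu * (viol_eq + viol_in)

# Toy instance (assumption): min x^2 s.t. x >= 1, written as f_1(x) = 1 - x <= 0.
f  = lambda x: x[0]**2
fs = [lambda x: 1.0 - x[0]]
print(Q(np.array([0.0]), 10.0, f, [], fs))   # infeasible point: 0 + 5*(1)^2 = 5.0
```

Only violated inequalities contribute, so Q coincides with f on the feasible set.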
Practical penalty methods
µ_k can be chosen adaptively based on the difficulty of minimising the penalty function in each iteration: when minimising Q(x;µ_k) is expensive, choose µ_{k+1} only moderately larger than µ_k, e.g. µ_{k+1} = 1.5 µ_k; when minimising Q(x;µ_k) is cheap, choose µ_{k+1} larger, e.g. µ_{k+1} = 10 µ_k.

There is no guarantee that ‖∇_x Q(x;µ_k)‖ ≤ τ_k will be satisfied. Practical implementations need safeguards to increase µ (and possibly restore the initial point) when the constraint violation is not decreasing fast enough or when the iterates appear to diverge.
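The adaptive rule can be sketched as a one-line heuristic; using the inner-iteration count as the proxy for difficulty, and the particular threshold, are illustrative assumptions:

```python
def next_mu(mu, inner_iters, easy_threshold=20):
    # Cheap inner minimisation (few iterations): grow mu aggressively (10x);
    # expensive inner minimisation: grow mu only moderately (1.5x).
    return 10.0 * mu if inner_iters <= easy_threshold else 1.5 * mu

print(next_mu(1.0, 5), next_mu(1.0, 200))
```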
When only equality constraints are present, Q(x;µ_k) is smooth and algorithms for unconstrained optimisation can be used. However, Q(x;µ_k) becomes more difficult to minimise as µ_k becomes large unless special techniques are used. In particular, methods like conjugate gradients and quasi-Newton will perform poorly; the Newton method can be adapted (see the reformulation of the Newton step) but can still yield an inadequate direction due to inadequacy of the quadratic approximation. The choice of the initial point, e.g. the warm start x^s_{k+1} = x_k, can improve the performance of Newton's method.
Nonsmooth penalty functions
Some penalty functions are exact, i.e. for certain choices of the penalty parameter, minimisation w.r.t. x yields the exact minimiser of the constrained problem. To be exact, the penalty function has to be nonsmooth.

    Q_1(x;µ) := f(x) + µ Σ_{i=1}^p |h_i(x)| + µ Σ_{i=1}^m [f_i(x)]^−,

where [y]^− := max{y, 0}.

Let x* be a strict local minimiser of (COP) which satisfies the 1st order necessary conditions with Lagrange multipliers ν*, λ*. Then x* is a local minimiser of Q_1(x;µ) for all µ > µ* = ‖(ν*, λ*)^T‖_∞. If moreover the 2nd order sufficient conditions hold at x* and µ > µ*, then x* is a strict local minimiser of Q_1(x;µ).

Let x̂ be a stationary point of the penalty function Q_1(x;µ) for all µ > µ̂ > 0. Then, if x̂ is feasible for (COP), it satisfies the KKT conditions. If x̂ is not feasible for (COP), it is an infeasible stationary point.
Example revisited
    min x_1 + x_2
    subject to x_1² + x_2² − 2 = 0.

Solution: (−1,−1)^T.

ℓ_1 penalty function: Q_1(x;µ) = x_1 + x_2 + µ|x_1² + x_2² − 2|.
[Figure: surface/contour plots of Q_1(x;µ) over [−1.5, 1.5]², for µ = 2 (left) and µ = 5 (right).]
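Exactness can be checked numerically: here ν* = 1/2, so for µ = 2 > µ* the minimiser of Q_1 near (−1,−1) should be exactly (−1,−1). A grid-sampling sketch (the grid and the box around the solution are illustrative choices):

```python
import numpy as np

# l1 penalty for min x1+x2 s.t. x1^2+x2^2-2=0; exact for mu > nu* = 1/2.
Q1 = lambda x, mu: x[0] + x[1] + mu * abs(x[0]**2 + x[1]**2 - 2.0)

mu = 2.0
xs = np.linspace(-1.3, -0.7, 121)       # grid around the solution (-1,-1)
vals = np.array([[Q1((a, b), mu) for b in xs] for a in xs])
i, j = np.unravel_index(vals.argmin(), vals.shape)
print(xs[i], xs[j])                     # grid minimiser at (-1,-1): no systematic bias
```

In contrast, the quadratic penalty minimiser for any finite µ is perturbed away from (−1,−1), as seen earlier.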
Augmented Lagrangian
Reduces ill-conditioning by introducing explicit Lagrange multiplier estimates into the function to be minimised. It can preserve smoothness, and can be implemented using standard unconstrained (or bound-constrained) optimisation.

Motivation: the minimisers x_k of Q(x;µ_k) do not quite satisfy the feasibility condition h_i(x) = 0:

    h_i(x_k) ≈ ν*_i/µ_k,  i = 1, ..., p.

Obviously, in the limit µ_k → ∞, h_i(x_k) → 0, but can we avoid this systematic perturbation for moderate values of µ_k?

Augmented Lagrangian:

    L_A(x,ν;µ) := f(x) + Σ_{i=1}^p ν_i h_i(x) + (µ/2) Σ_{i=1}^p h_i²(x).
Update of Lagrange multiplier estimate
Optimality condition for the unconstrained minimiser of L_A(x,ν^k;µ_k):

    0 ≈ ∇_x L_A(x_k,ν^k;µ_k) = ∇f(x_k) + Σ_{i=1}^p [ν_i^k + µ_k h_i(x_k)] ∇h_i(x_k).

Optimality condition for the Lagrangian of (COP:E):

    0 ≈ ∇_x L(x_k,ν*) = ∇f(x_k) + Σ_{i=1}^p ν*_i ∇h_i(x_k).

Comparison yields (an update scheme for ν):

    ν*_i ≈ ν_i^k + µ_k h_i(x_k),  i = 1, ..., p,

as from h_i(x_k) = (1/µ_k)(ν*_i − ν_i^k), i = 1, ..., p, we see that if ν^k is close to ν*, the infeasibility goes to 0 faster than 1/µ_k.
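The resulting method alternates inner minimisation with the multiplier update. A minimal sketch on the running example; the Newton inner solver, its fixed iteration budget, holding µ fixed at 1, and the initialisation ν = 0 are illustrative assumptions:

```python
import numpy as np

# Augmented Lagrangian for min x1+x2 s.t. h(x) = x1^2+x2^2-2 = 0.
# mu stays fixed; only the multiplier estimate nu is updated.
def min_LA(x, nu, mu, iters=50):
    # Newton's method on L_A(x,nu;mu) = x1+x2 + nu*h(x) + (mu/2)*h(x)^2.
    for _ in range(iters):
        r = x @ x - 2.0
        c = nu + mu * r                             # nu + mu*h(x)
        g = np.array([1.0, 1.0]) + 2.0 * c * x      # gradient of L_A
        H = 2.0 * c * np.eye(2) + 4.0 * mu * np.outer(x, x)  # Hessian of L_A
        x = x + np.linalg.solve(H, -g)
    return x

x, nu, mu = np.array([-1.2, -1.2]), 0.0, 1.0
for k in range(10):
    x = min_LA(x, nu, mu)
    nu = nu + mu * (x @ x - 2.0)                    # multiplier update
print(x, nu)
```

With µ fixed at a moderate value, x converges to (−1,−1) and ν to ν* = 1/2: the systematic perturbation of the pure penalty method disappears without driving µ → ∞.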
Example revisited
    min x_1 + x_2
    subject to x_1² + x_2² − 2 = 0.

Solution: (−1,−1)^T.

Augmented Lagrangian:

    L_A(x,ν;µ) = x_1 + x_2 + ν(x_1² + x_2² − 2) + (µ/2)(x_1² + x_2² − 2)².
[Figure: surface/contour plots of L_A(x,ν;µ) over [−1.5, 1.5]² with ν = 0.4, for µ = 1 (left) and µ = 10 (right).]
Convergence
Let x* be a local minimiser of (COP:E) at which the constraint gradients are linearly independent and which satisfies the 2nd order sufficient conditions with Lagrange multipliers ν*. Then there is µ̄ > 0 such that for all µ ≥ µ̄, x* is a strict local minimiser of L_A(x,ν*;µ).

Furthermore, there exist δ, ε, M > 0 such that for all ν^k, µ_k satisfying

    ‖ν^k − ν*‖ ≤ µ_k δ,   µ_k ≥ µ̄,

the problem min_x L_A(x,ν^k;µ_k), subject to ‖x − x*‖ ≤ ε, has a unique solution x_k, and it holds that

    ‖x_k − x*‖ ≤ M‖ν^k − ν*‖/µ_k,
    ‖ν^{k+1} − ν*‖ ≤ M‖ν^k − ν*‖/µ_k,   where ν^{k+1} = ν^k + µ_k h(x_k),

the matrix ∇²_{xx} L_A(x_k,ν^k;µ_k) is positive definite, and the constraint gradients ∇h_i(x_k), i = 1, ..., p, are linearly independent.
Practical Augmented Lagrangian methods
Bound-constrained formulation: convert the inequality constraints f_i(x) ≤ 0 into equality constraints using slack variables,

    f_i(x) + s_i = 0,   s_i ≥ 0,   i = 1, ..., m.

Bound constraints are not transformed. Solve by a projected gradient algorithm,

    x_{k+1} = P(x_k − ∇_x L_A(x,ν;µ)|_{x_k}; l, u),

where P(·; l, u) projects onto the box [l, u].
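The projection onto a box is a componentwise clip. A minimal sketch of one projected gradient step; the quadratic example, the step size α, and the helper names are illustrative assumptions:

```python
import numpy as np

def project(x, l, u):
    # Componentwise projection P(x; l, u) onto the box [l, u].
    return np.minimum(np.maximum(x, l), u)

def projected_gradient_step(x, grad, alpha, l, u):
    # One step x <- P(x - alpha * grad(x); l, u).
    return project(x - alpha * grad(x), l, u)

# Example (assumption): minimise (x1-2)^2 + (x2+3)^2 over the box [0,1] x [0,1];
# the solution is the projected unconstrained minimiser, (1, 0).
grad = lambda x: 2.0 * (x - np.array([2.0, -3.0]))
x = np.array([0.5, 0.5])
for _ in range(200):
    x = projected_gradient_step(x, grad, 0.25, np.zeros(2), np.ones(2))
print(x)
```

The iterates hit the active bounds after a few steps and stay there, which is exactly the behaviour the bound-constrained inner solver exploits.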
Linearly constrained formulation: transform into an equality constrained problem and linearise the constraints:

    min F_k(x),   subject to f_i(x_k) + ∇f_i(x_k)^T(x − x_k) = 0,   l ≤ x ≤ u.

Choose F_k as

    F_k(x) = f(x) + Σ_{i=1}^m ν_i^k f̄_i^k(x),

where

    f̄_i^k(x) = f_i(x) − f_i(x_k) − ∇f_i(x_k)^T(x − x_k).

The preferred choice (larger convergence radius in practice) is

    F_k(x) = f(x) + Σ_{i=1}^m ν_i^k f̄_i^k(x) + (µ/2) Σ_{i=1}^m (f̄_i^k(x))².

Unconstrained formulation: obtain an unconstrained formulation using a smooth approximation to the indicator function of the feasible set.