Numerical Optimisation: Constraint Optimisation
Marta M. Betcke
m.betcke@ucl.ac.uk,
Kiko Rullan
f.rullan@cs.ucl.ac.uk
Department of Computer Science,
Centre for Medical Image Computing,
Centre for Inverse Problems
University College London
Lecture 12
M.M. Betcke Numerical Optimisation
Constraint optimisation problem
min_{x∈Rⁿ} f(x)  subject to  cᵢ(x) = 0, i ∈ E;  cᵢ(x) ≥ 0, i ∈ I     (COP)

f : D ⊂ Rⁿ → R: objective function, assumed smooth
cᵢ : Rⁿ → R: constraint functions, assumed smooth
i ∈ E: equality constraints, i ∈ I: inequality constraints
x ∈ Rⁿ: optimisation variable

The feasible set Ω is the set of all points satisfying the constraints:
Ω = {x ∈ D : cᵢ(x) = 0, i ∈ E; cᵢ(x) ≥ 0, i ∈ I}.

Optimal value: f⋆ = inf_{x∈Ω} f(x)
f⋆ = +∞ if (COP) is infeasible, i.e. Ω = ∅
f⋆ = −∞ if (COP) is unbounded below
Examples: smooth constraints
Smooth constraints can describe regions with kinks.
Example: the 1-norm constraint in R²,
‖x‖₁ = |x₁| + |x₂| ≤ 1,
can be described as
x1 + x2 ≤ 1, x1 − x2 ≤ 1, −x1 + x2 ≤ 1, −x1 − x2 ≤ 1.
Example: pointwise maximum. The nonsmooth problem
min_x f(x) = max(x², x)
can be reformulated with smooth constraints as
min_{x,t} t  s.t.  t ≥ x, t ≥ x².
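As a quick sanity check (not part of the slides; scipy and the solver choices are assumptions), both formulations can be minimised numerically and give the same optimum:

```python
from scipy.optimize import minimize

# Direct minimisation of the nonsmooth objective f(x) = max(x^2, x);
# Nelder-Mead copes with the kinks at x = 0 and x = 1.
direct = minimize(lambda z: max(z[0]**2, z[0]), x0=[1.0], method="Nelder-Mead")

# Smooth epigraph reformulation: min t  s.t.  t >= x  and  t >= x^2,
# solved over the combined variable z = (x, t) with SLSQP.
epi = minimize(lambda z: z[1], x0=[1.0, 2.0], method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda z: z[1] - z[0]},
                            {"type": "ineq", "fun": lambda z: z[1] - z[0]**2}])

# Both approaches locate the minimum value 0 at x = 0.
print(direct.fun, epi.fun)
```

The price of smoothness is one extra variable t and two constraints; the unique KKT point of the reformulated problem is (x, t) = (0, 0).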
Types of minimisers of constrained problems
A point x⋆ ∈ Ω is a global minimiser if
f(x⋆) ≤ f(x), ∀x ∈ Ω.

A point x⋆ ∈ Ω is a local minimiser if
∃N(x⋆) : f(x⋆) ≤ f(x), ∀x ∈ N(x⋆) ∩ Ω.

A point x⋆ ∈ Ω is a strict (or strong) local minimiser if
∃N(x⋆) : f(x⋆) < f(x), ∀x ∈ N(x⋆) ∩ Ω, x ≠ x⋆.

A point x⋆ ∈ Ω is an isolated local minimiser if
∃N(x⋆) : x⋆ is the only local minimiser in N(x⋆) ∩ Ω.

Active set

Feasibility problem: Find x such that all constraints are satisfied at x.

The active set A(x) at a feasible point x ∈ Ω consists of the equality constraint indices E together with the indices of the inequality constraints i ∈ I for which cᵢ(x) = 0:
A(x) = E ∪ {i ∈ I : cᵢ(x) = 0}.

At a feasible point x ∈ Ω, the inequality constraint i ∈ I is said to be active if cᵢ(x) = 0 and inactive if the strict inequality cᵢ(x) > 0 holds.
Single equality constraint
min x₁ + x₂  s.t.  x₁² + x₂² − 2 = 0.

Feasibility (Taylor expansion of c₁, with c₁(x) = 0):
0 = c₁(x + s) ≈ c₁(x) + ∇c₁(x)ᵀs

Decrease direction (Taylor expansion of f):
0 > f(x + s) − f(x) ≈ ∇f(x)ᵀs

The only situation in which no such step s exists is when, for some scalar λ₁,
∇f(x) = λ₁∇c₁(x).
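A minimal numerical check of this condition (numpy assumed; the minimiser x⋆ = (−1, −1) follows from the symmetry of the problem):

```python
import numpy as np

# min x1 + x2  s.t.  c1(x) = x1^2 + x2^2 - 2 = 0 has its minimiser at (-1, -1).
x = np.array([-1.0, -1.0])
assert np.isclose(x @ x - 2.0, 0.0)     # feasibility: c1(x) = 0

grad_f = np.array([1.0, 1.0])           # gradient of f(x) = x1 + x2
grad_c1 = 2.0 * x                       # gradient of c1, here (-2, -2)

# grad f is parallel to grad c1: grad_f = lam1 * grad_c1 with lam1 = -1/2.
lam1 = grad_f[0] / grad_c1[0]
assert np.allclose(grad_f, lam1 * grad_c1)
```

For an equality constraint the sign of λ₁ is unrestricted; here λ₁ = −1/2.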
Single inequality constraint
min x₁ + x₂  s.t.  2 − x₁² − x₂² ≥ 0.

Feasibility (Taylor expansion of c₁):
0 ≤ c₁(x + s) ≈ c₁(x) + ∇c₁(x)ᵀs

Decrease direction (Taylor expansion of f):
0 > f(x + s) − f(x) ≈ ∇f(x)ᵀs

Case: x inside the circle, i.e. c₁(x) > 0. Any sufficiently small step
s = −α∇f(x), α > 0,
remains feasible and decreases f, so a descent step exists whenever ∇f(x) ≠ 0.
Single inequality constraint
min x₁ + x₂  s.t.  2 − x₁² − x₂² ≥ 0.

Feasibility (Taylor expansion of c₁, with c₁(x) = 0):
0 ≤ c₁(x + s) ≈ c₁(x) + ∇c₁(x)ᵀs

Decrease direction (Taylor expansion of f):
0 > f(x + s) − f(x) ≈ ∇f(x)ᵀs

Case: x on the boundary of the circle, i.e. c₁(x) = 0. We seek s with
∇f(x)ᵀs < 0,  ∇c₁(x)ᵀs ≥ 0.
The set of such directions s is empty only if ∇f(x) = λ₁∇c₁(x) for some λ₁ ≥ 0.

Linear independence constraint qualification (LICQ)

Given a point x and its active set A(x), the linear independence constraint qualification (LICQ) holds if the set of active constraint gradients {∇cᵢ(x), i ∈ A(x)} is linearly independent.

Note that for LICQ to be satisfied, none of the active constraint gradients can be 0.

Example: LICQ is not satisfied if we define the equality constraint as c₁(x) = (x₁² + x₂² − 2)² = 0 (same feasible region, different constraint), since ∇c₁ vanishes on the feasible set.

There are other constraint qualifications, e.g. Slater's condition for convex problems.

Theorem: 1st order necessary conditions

Lagrangian function:
L(x, λ) = f(x) − ∑_{i∈E∪I} λᵢcᵢ(x).

Let x⋆ be a local solution of (COP), let f and the cᵢ be continuously differentiable, and let LICQ hold at x⋆. Then there exists a Lagrange multiplier vector λ⋆ with components λᵢ⋆, i ∈ E ∪ I, such that the following Karush-Kuhn-Tucker (KKT) conditions are satisfied at (x⋆, λ⋆):

∇ₓL(x⋆, λ⋆) = 0,                 (1a)
cᵢ(x⋆) = 0,   ∀i ∈ E,            (1b)
cᵢ(x⋆) ≥ 0,   ∀i ∈ I,            (1c)
λᵢ⋆ ≥ 0,      ∀i ∈ I,            (1d)
λᵢ⋆cᵢ(x⋆) = 0, ∀i ∈ E ∪ I.       (1e)

Strict complementarity condition

The complementarity condition (1e) can be made stronger. Given a local solution x⋆ of (COP) and a vector λ⋆ satisfying the KKT conditions (1), the strict complementarity condition holds if exactly one of λᵢ⋆ and cᵢ(x⋆) is zero for each i ∈ I. In other words, λᵢ⋆ > 0 for each i ∈ I ∩ A(x⋆).
Strict complementarity makes it easier for algorithms to identify the active set and to converge quickly to the solution.

For a given solution x⋆ of (COP), there may be many vectors λ⋆ satisfying the KKT conditions (1). However, if LICQ holds, the optimal λ⋆ is unique.
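The KKT conditions (1) can be checked numerically for the circle example above (a sketch assuming numpy; x⋆ = (−1, −1) and λ₁⋆ = 1/2 are read off from the geometry of the problem):

```python
import numpy as np

# min x1 + x2  s.t.  c1(x) = 2 - x1^2 - x2^2 >= 0, minimiser x* = (-1, -1).
x = np.array([-1.0, -1.0])
c1 = 2.0 - x @ x                          # = 0: the constraint is active
grad_f = np.array([1.0, 1.0])
grad_c1 = -2.0 * x                        # = (2, 2)
lam1 = 0.5                                # Lagrange multiplier solving (1a)

assert np.isclose(c1, 0.0)                # (1c) feasibility, constraint active
assert lam1 >= 0.0                        # (1d) dual feasibility
assert np.allclose(grad_f - lam1 * grad_c1, 0.0)  # (1a) stationarity of L
assert np.isclose(lam1 * c1, 0.0)         # (1e) complementary slackness
```

Strict complementarity also holds here: λ₁⋆ = 1/2 > 0 while c₁(x⋆) = 0.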
Lagrangian: primal problem
For convenience we change (and refine) our notation for the
constraint optimisation problem. The following slides are based on
Boyd (Convex Optimization I).
Let p⋆ be the optimal value of the primal problem

min_{x∈D⊂Rⁿ} f(x)                                  (COP:P)
subject to fᵢ(x) ≤ 0, i = 1,…,m,
hᵢ(x) = 0, i = 1,…,p.

The Lagrangian L : D × Rᵐ × Rᵖ → R is
L(x, λ, ν) = f(x) + ∑ᵢ₌₁ᵐ λᵢfᵢ(x) + ∑ᵢ₌₁ᵖ νᵢhᵢ(x)

λᵢ are the Lagrange multipliers associated with fᵢ(x) ≤ 0
νᵢ are the Lagrange multipliers associated with hᵢ(x) = 0
Lagrange dual function
Lagrange dual function g : Rᵐ × Rᵖ → R:

g(λ, ν) = inf_{x∈D} L(x, λ, ν)                                  (LD)
        = inf_{x∈D} ( f(x) + ∑ᵢ₌₁ᵐ λᵢfᵢ(x) + ∑ᵢ₌₁ᵖ νᵢhᵢ(x) ).

g is concave; it can be −∞ for some λ, ν.

Lower bound property: If λ ≥ 0, then g(λ, ν) ≤ p⋆.

Proof: For any feasible x̃ and λ ≥ 0 we have λᵢfᵢ(x̃) ≤ 0 and hᵢ(x̃) = 0, hence
f(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x∈D} L(x, λ, ν) = g(λ, ν).
Minimising over all feasible x̃ gives p⋆ ≥ g(λ, ν).
Convex problem
Convex problem in standard form:

min_{x∈D⊂Rⁿ} f(x)
subject to fᵢ(x) ≤ 0, i = 1,…,m,
hᵢ(x) = 0, i = 1,…,p

f is convex and D is convex
fᵢ are convex
hᵢ are affine, i.e. hᵢ(x) = aᵢᵀx − bᵢ

The feasible set Ω of a convex problem is a convex set.
Example: least norm solution of linear equations
min_{x∈Rⁿ} xᵀx  subject to  Ax = b

Lagrangian: L(x, ν) = xᵀx + νᵀ(Ax − b)

Dual function:
g(ν) = inf_{x∈Rⁿ} L(x, ν) = inf_{x∈Rⁿ} ( xᵀx + νᵀ(Ax − b) )

L(x, ν) is strictly convex in x; to minimise over x, set the gradient to zero:
∇ₓL(x, ν) = 2x + Aᵀν = 0  ⇒  x_min = −(1/2)Aᵀν.

Plugging x_min into g:
g(ν) = L(x_min, ν) = −(1/4)νᵀAAᵀν − bᵀν.

g is a concave function of ν.

Lower bound property: p⋆ ≥ −(1/4)νᵀAAᵀν − bᵀν for all ν.
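The lower bound property can be illustrated numerically (a sketch assuming numpy and randomly generated A, b; the least-norm solution x⋆ = Aᵀ(AAᵀ)⁻¹b is standard linear algebra, not from the slides):

```python
import numpy as np

# Least-norm example: min x^T x  s.t.  Ax = b.
# g(nu) = -1/4 nu^T A A^T nu - b^T nu lower-bounds p* for every nu.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

x_star = A.T @ np.linalg.solve(A @ A.T, b)   # least-norm solution
p_star = x_star @ x_star                     # primal optimal value

def g(nu):
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

for _ in range(100):                         # weak duality at random nu
    nu = rng.standard_normal(3)
    assert g(nu) <= p_star + 1e-9

nu_star = -2.0 * np.linalg.solve(A @ A.T, b) # maximiser of the concave g
assert np.isclose(g(nu_star), p_star)        # zero duality gap
```

The maximiser ν⋆ = −2(AAᵀ)⁻¹b comes from ∇g(ν) = −(1/2)AAᵀν − b = 0; at ν⋆ the bound is tight, consistent with strong duality for this convex problem.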
Example: standard form LP
min_{x∈Rⁿ} cᵀx  subject to  Ax = b, x ≥ 0

Lagrangian:
L(x, λ, ν) = cᵀx + νᵀ(Ax − b) − λᵀx = −bᵀν + (c + Aᵀν − λ)ᵀx

Dual function:
g(λ, ν) = inf_{x∈Rⁿ} L(x, λ, ν) = inf_{x∈Rⁿ} ( −bᵀν + (c + Aᵀν − λ)ᵀx )

L(x, λ, ν) is affine in x, hence
g(λ, ν) = −bᵀν if Aᵀν − λ + c = 0, and −∞ otherwise.

g is linear on the affine domain {(λ, ν) : Aᵀν − λ + c = 0}, hence concave.

Lower bound property: p⋆ ≥ −bᵀν if Aᵀν + c ≥ 0 (take λ = Aᵀν + c ≥ 0).
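A numerical illustration of the lower bound (assuming scipy's linprog and randomly generated data; the construction of A, b, c below is purely illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Standard-form LP: min c^T x  s.t.  Ax = b, x >= 0.
# Any nu with A^T nu + c >= 0 certifies the lower bound -b^T nu <= p*.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 4))
b = A @ rng.random(4)                 # feasible by construction (x >= 0)
c = 1.0 + rng.random(4)               # c > 0: objective bounded below on x >= 0

res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
p_star = res.fun

checked = 0
for _ in range(200):
    nu = 0.1 * rng.standard_normal(2)
    if np.all(A.T @ nu + c >= 0):     # dual feasibility: lambda = A^T nu + c >= 0
        assert -b @ nu <= p_star + 1e-6
        checked += 1
```

Every dual-feasible ν yields a certified lower bound on p⋆ without solving the primal; the LP solver is only used here to produce p⋆ for comparison.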
Example: equality constraint norm minimisation
min_{x∈Rⁿ} ‖x‖  subject to  Ax = b

Lagrangian:
L(x, ν) = ‖x‖ + νᵀ(b − Ax) = ‖x‖ + bᵀν − νᵀAx

inf_{x∈Rⁿ} (‖x‖ − yᵀx) = 0 if ‖y‖∗ ≤ 1, and −∞ otherwise, where ‖v‖∗ = sup_{‖u‖≤1} uᵀv is the dual norm of ‖·‖:
If ‖y‖∗ ≤ 1, then ‖x‖ − yᵀx ≥ 0 for all x, with equality for x = 0.
If ‖y‖∗ > 1, choose x = tu with ‖u‖ ≤ 1 and uᵀy = ‖y‖∗ > 1; then
‖x‖ − yᵀx = t(‖u‖ − ‖y‖∗) → −∞ as t → ∞.

Dual function:
g(ν) = inf_{x∈Rⁿ} L(x, ν) = bᵀν if ‖Aᵀν‖∗ ≤ 1, and −∞ otherwise.

Lower bound property: p⋆ ≥ bᵀν if ‖Aᵀν‖∗ ≤ 1.
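For the 1-norm the infimum above separates across coordinates (the dual norm of ‖·‖₁ is ‖·‖∞), which makes the 0 / −∞ dichotomy easy to see on a grid (numpy assumed; a rough numerical sketch, not from the slides):

```python
import numpy as np

# inf_x (||x||_1 - y^T x) = sum_i inf_t (|t| - y_i t): the problem separates
# per coordinate, and each 1-d infimum is 0 iff |y_i| <= 1 (dual-norm test).
t = np.linspace(-100.0, 100.0, 20001)

def inf_1d(yi):
    # grid approximation of inf_t (|t| - yi * t)
    return np.min(np.abs(t) - yi * t)

print(inf_1d(0.7))   # 0: |0.7| <= 1, the infimum is attained at t = 0
print(inf_1d(1.5))   # strongly negative: |1.5| > 1, unbounded below in |t|
```

On an unbounded grid the second value would tend to −∞; here it is only as negative as the grid allows, which already shows the dichotomy.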
Conjugate function
The conjugate of a function f is
f∗(y) = sup_{x∈D} (yᵀx − f(x)).

The conjugate f∗ is convex (even if f is not), as a pointwise supremum of affine functions of y.
Figure: Boyd, Convex Optimization I
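A grid-based sketch of the definition (numpy assumed; the example f(x) = x² with conjugate f∗(y) = y²/4 is illustrative, not from the slides):

```python
import numpy as np

# Conjugate of f(x) = x^2: f*(y) = sup_x (y*x - x^2) = y^2/4, attained at x = y/2.
x = np.linspace(-50.0, 50.0, 100001)

def conj(y):
    # grid approximation of sup_x (y*x - f(x)) for f(x) = x^2
    return np.max(y * x - x**2)

for y in (-3.0, 0.0, 1.5, 4.0):
    assert np.isclose(conj(y), y**2 / 4, atol=1e-4)

# midpoint-convexity spot check of f* (f* is always convex)
a, b = -3.0, 4.0
assert conj((a + b) / 2) <= (conj(a) + conj(b)) / 2 + 1e-9
```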
Lagrange dual and conjugate function
min_{x∈Rⁿ} f(x)  subject to  Ax ≤ b, Cx = d

Lagrangian:
L(x, λ, ν) = f(x) + λᵀ(Ax − b) + νᵀ(Cx − d)
= f(x) + (Aᵀλ + Cᵀν)ᵀx − bᵀλ − dᵀν.

Dual function:
g(λ, ν) = inf_{x∈D} ( f(x) + (Aᵀλ + Cᵀν)ᵀx ) − bᵀλ − dᵀν
= −sup_{x∈D} ( (−Aᵀλ − Cᵀν)ᵀx − f(x) ) − bᵀλ − dᵀν
= −f∗(−Aᵀλ − Cᵀν) − bᵀλ − dᵀν.
Lagrangian: dual problem
Lagrange dual problem
max g(λ, ν)  subject to  λ ≥ 0

finds the best lower bound on p⋆ obtainable from the Lagrange dual function
is a convex optimisation problem (regardless of whether the primal is convex); we denote its optimal value by d⋆
λ, ν are dual feasible if λ ≥ 0 and (λ, ν) ∈ dom g
often simplified by making the implicit constraint (λ, ν) ∈ dom g explicit
Weak and strong duality
Weak duality: d⋆ ≤ p⋆
always holds (for convex and nonconvex problems)
can be used to find nontrivial lower bounds for difficult problems

Strong duality: d⋆ = p⋆
does not hold in general
holds for convex problems under constraint qualifications
Slater’s constraint qualification
Strong duality holds for a convex problem
min
x∈D
f (x)
subject to fi (x) ≤ 0, i = 1, . . . ,m
Ax = b,
if it is strictly feasible i.e.
∃x ∈ intD : fi (x) < 0, i = 1, . . . ,m, Ax = b
also guarantees that the dual optimum is attained (if
p? > −∞)
can be sharpened: e.g. can replace intD with relintD (interior
of the affine hull); linear inequalities do not need to hold with
strict inequality, …
other constraint qualifications exist e.g. LICQ
Example: inequality form LP
Primal problem
Primal problem:
min_{x∈Rⁿ} cᵀx  subject to  Ax ≤ b

Dual function:
g(λ) = inf_{x∈Rⁿ} ( (c + Aᵀλ)ᵀx − bᵀλ ) = −bᵀλ if Aᵀλ + c = 0, and −∞ otherwise.

Dual problem:
max −bᵀλ  subject to  Aᵀλ + c = 0, λ ≥ 0
From Slater's condition: p⋆ = d⋆ if ∃x̃ : Ax̃ < b.
In fact, p⋆ = d⋆ except when both the primal and the dual are infeasible.

Example: quadratic program

Primal problem (assume P symmetric positive definite):
min_{x∈Rⁿ} xᵀPx  subject to  Ax ≤ b

Dual function:
g(λ) = inf_{x∈Rⁿ} ( xᵀPx + λᵀ(Ax − b) ) = −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ

Dual problem:
max −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ  subject to  λ ≥ 0

From Slater's condition: p⋆ = d⋆ if ∃x̃ : Ax̃ < b.
In fact, p⋆ = d⋆ always.

Example: nonconvex problem with strong duality

Primal problem:
min_{x∈Rⁿ} xᵀAx + 2bᵀx  subject to  xᵀx ≤ 1,
where A ⋡ 0, i.e. A is not positive semidefinite.

Dual function:
g(λ) = inf_{x∈Rⁿ} ( xᵀ(A + λI)x + 2bᵀx − λ )

unbounded below if A + λI ⋡ 0, or if A + λI ⪰ 0 and b ∉ R(A + λI)
otherwise minimised by x = −(A + λI)†b, giving
g(λ) = −bᵀ(A + λI)†b − λ

Dual problem:
max −bᵀ(A + λI)†b − λ
subject to A + λI ⪰ 0, b ∈ R(A + λI),

and the equivalent semidefinite program:
max −t − λ
subject to
[ A + λI   b ]
[ bᵀ       t ] ⪰ 0.

Strong duality holds although the primal problem is not convex (not easy to show).

KKT conditions revisited

The Karush-Kuhn-Tucker conditions are satisfied at x⋆, λ⋆, ν⋆, i.e.

∇f(x⋆) + ∑ᵢ₌₁ᵐ λᵢ⋆∇fᵢ(x⋆) + ∑ᵢ₌₁ᵖ νᵢ⋆∇hᵢ(x⋆) = 0,   (2a)
hᵢ(x⋆) = 0, i = 1,…,p      [primal constraints]       (2b)
fᵢ(x⋆) ≤ 0, i = 1,…,m      [primal constraints]       (2c)
λᵢ⋆ ≥ 0, i = 1,…,m         [dual constraints]         (2d)
λᵢ⋆fᵢ(x⋆) = 0, i = 1,…,m   [complementary slackness]  (2e)

Necessary condition: If strong duality holds and x⋆, λ⋆, ν⋆ are optimal, then they must satisfy the KKT conditions (2). Thus, for any problem for which strong duality holds, the KKT conditions are necessary conditions of optimality.

KKT conditions for convex problems

Sufficient condition: If x⋆, λ⋆, ν⋆ satisfy the KKT conditions (2) and the problem is convex, then x⋆ is primal optimal and (λ⋆, ν⋆) is dual optimal:

From complementary slackness (each λᵢ⋆fᵢ(x⋆) = 0) and primal feasibility (each hᵢ(x⋆) = 0):
f(x⋆) = f(x⋆) + ∑ᵢ₌₁ᵐ λᵢ⋆fᵢ(x⋆) + ∑ᵢ₌₁ᵖ νᵢ⋆hᵢ(x⋆) = L(x⋆, λ⋆, ν⋆).

Since g(λ⋆, ν⋆) = inf_x L(x, λ⋆, ν⋆), and by the 1st order condition (2a) together with convexity of L(·, λ⋆, ν⋆) this infimum is attained at x⋆, we have
g(λ⋆, ν⋆) = L(x⋆, λ⋆, ν⋆).
Thus it follows that f(x⋆) = g(λ⋆, ν⋆), i.e. the duality gap is zero.

If Slater's condition is satisfied: x⋆ is optimal if and only if there exist λ⋆, ν⋆ that satisfy the KKT conditions
recall that Slater's condition implies strong duality and that the dual optimum is attained
this generalises the optimality condition ∇f(x) = 0 for unconstrained problems
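The quadratic program above admits a direct numerical check of strong duality (a sketch using numpy/scipy with randomly generated P, A, b; the solver choices are illustrative, not prescribed by the slides):

```python
import numpy as np
from scipy.optimize import minimize

# min x^T P x  s.t.  Ax <= b  with P symmetric positive definite.
rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
P = M @ M.T + 3.0 * np.eye(3)            # symmetric positive definite
A = rng.standard_normal((2, 3))
b = -np.ones(2)                          # forces the constraints to be active

# Strictly feasible start (Slater point): a solution of Ax = b - 1 < b.
x0 = A.T @ np.linalg.solve(A @ A.T, b - 1.0)
primal = minimize(lambda x: x @ P @ x, x0, jac=lambda x: 2.0 * P @ x,
                  constraints=[{"type": "ineq", "fun": lambda x: b - A @ x}])

# Dual: max over lam >= 0 of g(lam) = -1/4 lam^T A P^-1 A^T lam - b^T lam,
# i.e. minimise the negated dual objective subject to lam >= 0.
Q = A @ np.linalg.solve(P, A.T)
dual = minimize(lambda l: 0.25 * l @ Q @ l + b @ l, np.ones(2),
                bounds=[(0.0, None), (0.0, None)])

print(primal.fun, -dual.fun)             # p* and d* coincide (strong duality)
```

With b < 0 the unconstrained minimiser x = 0 is infeasible, so the inequality constraints bind at the optimum and the agreement p⋆ = d⋆ is a nontrivial check.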