Numerical Optimisation Constraint optimisation: Interior point methods
Numerical Optimisation
Constraint optimisation:
Interior point methods
Marta M. Betcke
m.betcke@ucl.ac.uk,
Kiko Rullan
f.rullan@cs.ucl.ac.uk
Department of Computer Science,
Centre for Medical Image Computing,
Centre for Inverse Problems
University College London
Lecture 15
M.M. Betcke Numerical Optimisation
Convex constraint optimisation problem
Convex constraint optimization problem
min
x∈D⊂Rn
f (x) (COP)
subject to fi (x) ≤ 0, i = 1, . . . ,m,
Ax = b,
where
f : D → R is convex, twice continuously differentiable
function, D ⊂ Rn is convex
fi : Rn → R, i = 1, . . .m are convex, twice continuously
differentiable functions
A ∈ Rp×n with rankA = p < n. M.M. Betcke Numerical Optimisation We assume that (COP) is solvable i.e. an optimal x? exists, and we denote the optimal value as p? = f (x?). (COP) is strictly feasible i.e. there exists x ∈ D that satisfies Ax = b and fi (x) < 0 for i = 1, . . . ,m. This means that Slater’s constraint qualification holds, thus there exists dual optimal λ? ∈ Rm, ν? ∈ Rp, which together with x? satisfy the KKT conditions ∇f (x?) + m∑ i=1 λ?i∇fi (x ?) + ATν? = 0, (KKT) Ax? = b, fi (x ?) ≤ 0, i = 1, . . . ,m, λ? ≥ 0, λ?i fi (x ?) = 0, i = 1, . . . .m. M.M. Betcke Numerical Optimisation Interior point methods Interior point methods solve either the problem (COP) by applying Newton’s method to a sequence of equality constraint problems. the conditions (KKT) by applying Newton’s method to a sequence of modified versions of the KKT conditions. We will consider the barrier method and the primal-dual interior-point method. M.M. Betcke Numerical Optimisation Logarithmic barrier function Rewrite (COP) making the inequality constraints implicit min x∈D⊂Rn f (x) + m∑ i=1 I−(fi (x)) subject to Ax = b, where I− : R→ R is the indication function for the nonpositive reals I−(u) = { 0 u ≤ 0, ∞ u > 0.
I− is non-differentiable thus we need a smooth approximation
before Newton method can be applied.
M.M. Betcke Numerical Optimisation
Approximate I− with a smooth logarithmic barrier
Î−(u) = −1/t log(−u), dom Î− = [−∞, 0),
where t > 0 is a parameter that sets the accuracy of the
approximation.
Like I−, Î− is convex, nondecreasing and by convention ∞ for
u > 0.
Unlike I−, Î− is differentiable and closed i.e. it increases to ∞
as u increases to 0.
-3 -2 -1 0 1
-2
0
2
4
6
8
10
Î! = !1=t log(!u)
I!
Î!; t = 1=2
Î!; t = 1
Î!; t = 2
M.M. Betcke Numerical Optimisation
Substituting Î− for I− yields an approximation
min
x∈D⊂Rn
f (x) +
m∑
i=1
−1/t log(−fi (x))
subject to Ax = b.
The objective function is convex since −1/t log(−u) is increasing
in u, and differentiable, thus Newton’s method can be applied.
Logarithmic barrier
φ(x) = −
m∑
i=1
log(−fi (x)), domφ = {x ∈ Rn : fi (x) < 0, i = 1, . . . ,m}
Gradient and Hessian of φ
∇φ(x) =
m∑
i=1
1
−fi (x)
∇fi (x),
∇2φ(x) =
m∑
i=1
1
fi (x)2
∇fi (x)∇fi (x)T +
m∑
i=1
1
−fi (x)
∇2fi (x)
M.M. Betcke Numerical Optimisation
Central path
Consider the equivalent problem
min
x∈D⊂Rn
tf (x) + φ(x) (CENT)
subject to Ax = b.
We assume that (CENT) has a unique solution for each t > 0, and
denote this solution with x?(t).
The set of points x?(t), t > 0 is called the central path. The
points on central path are characterised by the following necessary
and sufficient centrality conditions:
x?(t) is strictly feasible i.e. satisfies
Ax?(t) = b, fi (x
?(t)) < 0, i = 1, . . . ,m,
and there exists a ν̂ ∈ Rp such that
0 = t∇f (x?(t)) +∇φ(x?(t)) + ATν̂ (CENT:COND)
= t∇f (x?(t)) +
m∑
i=1
1
−fi (x?(t))
∇f (x?(t)) + ATν̂
M.M. Betcke Numerical Optimisation
Example: LP with inequality constraints
min
x∈Rn
cTx
subject to Ax ≤ b.
The logarithmic barrier:
φ(x) = −
m∑
i=1
log(bi − aTi︸︷︷︸
=:Ai,:
x), domφ = {x : Ax < b}.
The gradient and Hessian:
∇φ(x) =
m∑
i=1
1
bi − aTi x
ai , ∇2φ(x) =
m∑
i=1
1
(bi − aTi x)2
aia
T
i .
Since x is strictly feasible, we have bi − aTi x > 0 and the Hessian is
nonsingular iff A has rank n.
The centrality condition (CENT:COND): (∇φ(x?(t)) ‖ −c)
tc +
m∑
i=1
1
bi − aTi x
ai = 0.
M.M. Betcke Numerical Optimisation
Example: central path
Figure: Boyd Vandenberghe Fig. 11.2
M.M. Betcke Numerical Optimisation
Dual points from central path
Claim: Every point on central path yields a dual feasible point and
hence a lower bound on p?. More precisely, the pair
λ?(t) =
1
−tfi (x?(t))
, i = 1, . . . ,m, ν?(t) = ν̂/t
is dual feasible.
Proof: λ?(t) > 0 because x?(t) is strictly feasible fi (x
?(t)) < 0
and from optimality conditions (CENT:COND) we read out that
x?(t) minimises the Lagrangian
L(x , λ, ν) = f (x) +
m∑
i=1
λi fi (x) + ν
T(Ax − b)
for λ = λ?(t), ν = ν?(t).
M.M. Betcke Numerical Optimisation
This means that λ?(t), ν?(t) are dual feasible, the dual function is
finite and
g(λ?(t), ν?(t)) =f (x?(t)) +
m∑
i=1
λ?i (t)︸ ︷︷ ︸
=− 1
tfi (x
?(t))
fi (x
?(t))
+ ν?(t)T(Ax?(t)− b︸ ︷︷ ︸
=0
)
=f (x?(t))−m/t → duality gap
thus x?(t) is not more than m/t suboptimal
f (x?(t))− p? ≤ m/t
and x?(t) converges to an optimal point as t →∞.
M.M. Betcke Numerical Optimisation
Interpretation via KKT conditions
We can interpret the central path conditions as a continuous
deformation of (KKT). A point x is equal to x?(t) iff there exists
λ, ν such that
∇f (x) +
m∑
i=1
λi∇fi (x) + ATν = 0, (KKT:CENT)
Ax = b,
fi (x) ≤ 0, i = 1, . . . ,m,
λ ≥ 0,
−λi fi (x) = 1/t, i = 1, . . . .m.
The only difference to (KKT) is the complementarity condition
−λi fi (x) = 1/t. For large t, x?(t), λ?(t), ν?(t) almost satisfy the
KKT conditions.
M.M. Betcke Numerical Optimisation
Newton for centering problem (CENT)
The Newton step for the centering problem (CENT) (linear
equality constraint problem) reads[
t∇2f (x) +∇2φ(x) AT
A 0
] [
∆xn
νn
]
= −
[
t∇f (x) +∇φ(x)
0
]
.
Here we assumed feasibility.
We can interpret these Newton step for (CENT) as Newton for
directly solving the modified (KKT:CENT) in a particular way.
M.M. Betcke Numerical Optimisation
Newton for modified KKT (KKT:CENT)
First, eliminate λ using λi = −1/(tfi (x)) from the (KKT:CENT)
system
∇f (x) +
m∑
i=1
1
−tfi (x)
∇fi (x)︸ ︷︷ ︸
= 1
t
∇φ(x)
+ATν = 0, Ax = b.
To find the Newton step for the solution of the nonlinear equations
above, we form the Taylor expansion for the nonlinear term
∇f (x + v) +
1
t
∇φ(x + v)
≈ ∇f (x) +
1
t
∇φ(x)︸ ︷︷ ︸
=:g
+
(
∇2f (x) +
1
t
∇2φ(x)
)
v︸ ︷︷ ︸
=:Hv
.
M.M. Betcke Numerical Optimisation
Replace the nonlinear term with this linear approximation
Hv + ATν = −g , Av = 0
and observe that the Newton step ∆xn, νn for (CENT) satisfies
tH∆xn + A
Tνn = −tg , A∆xn = 0.
Comparing to the Newton step for (KKT:CENT) yields
v = ∆xn, ν = (1/t)νn.
This shows that the Newton step for the centring problem (CENT)
can be interpreted (after scaling of the dual variable) as the
Newton step for solving the modified (KKT:CENT) system.
M.M. Betcke Numerical Optimisation
The barrier method
Require: Strictly feasible x s := x (0), t := t(0) > 0, µ > 1
Require: Tolerance � > 0
1: loop
2: Centering step:
obtain x?(t) by minimising (CENT) starting from x s
3: Update x s = x?(t)
4: if m/t < � then
5: break {stopping criterium �-sub optimal point}
6: end if
7: Increase t = µt
8: end loop
M.M. Betcke Numerical Optimisation
Barrier method: remarks
Centering step: can be solved by any methods for linearly
constraint minimisation, in particular Newton method.
Exact centering is not necessary since the central path has no
significance beyond that it leads to the solution of the original
problem (COP) as t →∞.
Inexact centering will still produce a convergent sequence,
however the λ?(t), ν?(t) are not exactly dual feasible (can be
corrected).
The difference in cost between the exact and good
approximate centering is marginal (few Newton steps), so
centering is usually assumed exact.
M.M. Betcke Numerical Optimisation
Choice of µ: trade of between the number of outer
(centering) and inner (Newton) iterations.
Small µ ≈ 1: good initial guess for Newton i.e. small number
of inner iterations closely following the central path but a large
number of outer iterations to reach desire accuracy �.
Larger µ: only a few outer iterations but with a large number
of inner iterations straying from the central path.
In practice for a large range of µ ∈ (3, 100) these effects
balance each other yielding approximately same total number
of Newton iterations. Values around 10-20 seem to work well.
For best worse-case bound on total Newton steps set µ close
to 1.
M.M. Betcke Numerical Optimisation
Choice of t(0): Trade of between the number of inner
iterations in the first step and number of outer iterations.
Choose so that m/t(0) ≈ f (x (0))− p?. For instance if a dual
feasible point λ, ν is known with the duality gap
η = f (x (0))− g(λ, ν), then we can set t(0) = m/η (the first
centering step will compute a pair with the same duality gap
as the initial primal and dual feasible points).
Choose t(0) as a minimiser of
inf
t,ν
‖t∇f (x (0)) +∇φ(x (0)) + ATν‖,
a measure of deviation of x (0) from x?(t) (least squares
problem for t, ν).
Infeasible Newton method for centering step. Choose
x (0) ∈ D, fi (x (0)) < 0, i = 1, . . . ,m but not necessarily
Ax (0) = b. Assuming the centering problem is strictly feasible,
a full Newton step is taken at some point during the first
centering step and thereafter the iterates are primal feasible
and the algorithm coincides with the standard barrier method.
M.M. Betcke Numerical Optimisation
Computing a strictly feasible point
The barrier method requires a strictly feasible point x (0). When
such a point is not known, the barrier method is preceded by a
preliminary stage called phase I to compute a strictly feasible point
(or to find that the constraints are infeasible).
Consider the set of inequalities and equalities
fi (x) ≤ 0, i = 1, . . . ,m, Ax = b (FEAS)
Assume we have a point x (0) ∈
∏m
i=1 dom fi and Ax
(0) = b i.e. the
inequalities are possibly not satisfied at x (0).
M.M. Betcke Numerical Optimisation
Phase I: max
Goal: find a strictly feasible solution of equalities and inequalities:
min s (PH1:MAX)
subject to fi (x) ≤ s, i = 1, . . . ,m
Ax = b
s: bound on the the maximum infeasibility of the inequalities. The
goal is to drive this maximum below 0.
The problem (PH1:MAX) is always strictly feasible. Thus we can
initialise with x = x (0) and for s with any number larger than
maxi=1,....m fi (x
(0)) and apply the barrier method.
M.M. Betcke Numerical Optimisation
Let p?I denote the optimal value for (PH1:MAX).
p?I < 0: (FEAS) has a strictly feasible solution.
If (x , s) is feasible for (PH1:MAX) with s < 0, then x satisfies
fi (x) < 0.
We do not need to solve (PH1:MAX) with high accuracy, we
can terminate when s < 0.
p?I > 0: (FEAS) are infeasible.
We do not need to solve (PH1:MAX) with high accuracy, we
can terminate when a dual feasible point is found with
positive objective, which proves that p?I > 0.
If p?I = 0 and the minimum is attained at x
? and s? = 0, then
the set of inequalities is feasible, but not strictly feasible. If
p? = 0 and the minimum is not attained, the inequalities are
infeasible.
M.M. Betcke Numerical Optimisation
Phase I: sum
min 1Ts (PH1:SUM)
subject to fi (x) ≤ s, i = 1, . . . ,m
Ax = b
s ≥ 0
For a fixed x , the optimal value of si is max{fi (x), 0}. Thus
we are minimising a sum of infeasibilities.
The optimal value is 0 and achieved iff the original set of
equalities and inequalities is feasible.
When the system of equalities and inequalities is infeasible,
often the solution violates only a small number of constraints
i.e. we identified a large feasible subset. This is more
informative than finding that m inequalities together are
mutually infeasible.
M.M. Betcke Numerical Optimisation
Termination near phase II central path
Assume x (0) ∈ D ∩
∏m
i=1 domfi with Ax
(0) = b.
Modified phase I optimisation problem
min s
subject to fi (x) ≤ s, i = 1, . . . ,m
f (x) ≤ M
Ax = b
with M > max{f (x (0)), p?}.
Central path for this modified problem (x?(t̄), s?(t̄)), t̄ > 0
m∑
i=1
1
s − fi (x)
= t̄,
1
M − f (x)
∇f (x) +
m∑
i=1
1
s − fi (x)
∇fi (x) + ATν = 0.
If (x , s) with s = 0 is on this central path, it is also on the central
path for (COP) if the latter is strictly feasible (s = 0)
t∇f (x) +
m∑
i=1
1
−fi (x)
∇fi (x) + ATν = 0
with t = 1/(M − f (x)).
M.M. Betcke Numerical Optimisation
Phase I via infeasible Newton
We expresse (COP) in equivalent form
min f (x)
subject to fi (x) ≤ s, i = 1, . . . ,m
Ax = b, s = 0.
Start the barrier method using infeasible Newton method so solve
min tf (x)−
m∑
i=1
log(s − fi (x))
subject to Ax = b, s = 0,
which can be initialised with any x ∈ D and any s > max
i
fi (x)︸ ︷︷ ︸
infeasibility
.
Provided the problem is strictly feasible, the infeasible Newton will
eventually take an undamped step an thereafter we will have s = 0
i.e. x strictly feasible.
M.M. Betcke Numerical Optimisation
Finding a point in the domain D
The same trick can be applied if a point in D ∩
∏m
i=1 domfi
(domain of the function and inequality constraints) is not known.
Apply infeasible Newton to
min tf (x + z0)−
m∑
i=1
log(s − fi (x + zi ))
subject to Ax = b, s = 0, z0 = 0, z1 = 0, . . . , zm = 0,
with initialisation zi : x + zi ∈ dom fi .
Disadvantage: no good stopping criterion for infeasible problems;
the residual simply fails to converge to 0.
M.M. Betcke Numerical Optimisation
Characteristic performance
Typically the cost of solving a set of convex inequalities and
linear equalities using the barrier method is modest, and
approximately constant, as long as the problem is not very
close to the boundary between feasibility and infeasibility.
When the problem is very close to the boundary, the number
of Newton steps required to find a strictly feasible point or
produce a certificate of infeasibility grows.
When the problem is exactly on the boundary between strictly
feasible and infeasible, for example, feasible but not strictly
feasible, the cost becomes infinite.
Typically the infeasible start Newton method works very well
provided the inequalities are feasible, and not very close to the
boundary between feasible and infeasible.
When the feasible set is just barely nonempty, a phase I
method is far better choice. Phase I method gracefully
handles the infeasible case; the infeasible start Newton
method, in contrast, simply fails to converge.
M.M. Betcke Numerical Optimisation
Primal-dual interior point method
Primal-dual interior point method is similar to barrier method with
key differences:
There is only one loop or iteration, i.e., there is no distinction
between inner and outer iterations as in the barrier method.
At each iteration, both the primal and dual variables are
updated.
The search directions in a primal-dual interior-point method
are obtained from Newton’s method, applied to modified KKT
equations (i.e., the optimality conditions for the logarithmic
barrier centering problem). The primal-dual search directions
are similar to, but not quite the same as, the search directions
that arise in the barrier method.
In a primal-dual interior-point method, the primal and dual
iterates are not necessarily feasible.
Usually more efficient than barrier methods, and do not
require strict feasibility.
M.M. Betcke Numerical Optimisation
Primal-dual search direction
As in barrier method we start from (KKT:CENT) which we rewrite
in the form
0 = rt(x , λ, ν) =
∇f (x) + J(x)Tλ+ ATν− diag(λ)F (x)− (1/t)1
Ax − b
=:
rdualrcent
rprim
,
with t > 0, F : Rn → Rm and its Jacobian
F (x) =
f1(x)…
fm(x)
, J(x) = DF (x) =
∇f1(x)
T
…
∇fm(x)T
.
If x , λ, ν satisfy rt(x , λ, ν) = 0 (and fi (x) < 0), then
x = x?(t), λ = λ?(t), ν = ν?(t). In particular, x is primal feasible,
and λ, ν are dual feasible, with duality gap m/t.
M.M. Betcke Numerical Optimisation
Newton step for solution of rt(x , λ, ν) = 0 at y = (x , λ, ν) a
primal-dual strictly feasible point F (x) < 0, λ > 0.
Difference to barrier method: we do not eliminate λ before taking
the Newton step
rt(y + ∆y) ≈ rt(y) + Drt(y)∆y = 0,
where ∆y = (∆x ,∆λ,∆ν) is the primal-dual search direction.
Written in terms of x , λ, ν:
∇2f (x) +
∑m
i=1 λi∇
2fi (x) J(x)
T AT
− diag(λ)J(x) − diag(F (x)) 0
A 0 0
∆x∆λ
∆ν
= −
rdualrcent
rprim
.
(PD:N)
M.M. Betcke Numerical Optimisation
Comparison of primal-dual and barrier search directions
Eliminate ∆λpd from (PD:N):
From the second block
∆λpd = − diag(F (x))−1 diag(λ)J(x)∆xpd + diag(F (x))−1rcent
and substitute into the first block[
Hpd A
T
A 0
] [
∆xpd
∆νpd
]
= −
[
rdual + J(x)
T diag(f (x))−1rcent
rpri
]
= −
[
∇f (x) + 1
t
∑m
i=1
1
−fi (x)
∇fi (x) + ATν
rpri
]
,
where
Hpd = ∇2f (x) +
m∑
i=1
λi∇2fi (x) +
m∑
i=1
λi
−fi (x)
∇fi (x)∇fi (x)T.
M.M. Betcke Numerical Optimisation
Compare to the Newton step in the barrier method (in the
infeasible form)[
Hbar A
T
A 0
] [
∆xbar
νbar
]
= −
[
t∇f (x) +
∑m
i=1
1
−fi (x)
∇fi (x)
rpri
]
,
where
Hbar = t∇2f (x) +
m∑
i=1
1
−fi (x)
∇2fi (x) +
m∑
i=1
1
f 2i (x)
∇fi (x)∇fi (x)T.
Multiplying first block by 1/t and changing variables
∆νbar = (1/t)νbar − ν[
1
t
Hbar A
T
A 0
] [
∆xbar
∆νbar
]
= −
[
∇f (x) + 1
t
∑m
i=1
1
−fi (x)
∇fi (x) + ATν
rpri
]
,
M.M. Betcke Numerical Optimisation
The right hand sides are identical.
The only difference are
Hpd = ∇2f (x) +
m∑
i=1
λi∇2fi (x) +
m∑
i=1
λi
−fi (x)
∇fi (x)∇fi (x)T.
1
t
Hbar = ∇2f (x) +
m∑
i=1
1
−tfi (x)
∇2fi (x) +
m∑
i=1
1
tf 2i (x)
∇fi (x)∇fi (x)T.
When x , λ, ν satisfy −fi (x)λi = 1/t, the coefficient matrices (and
hence directions) coincide.
M.M. Betcke Numerical Optimisation
The surrogate duality gap
In the primal-dual interior point methods, the iterates
x (k), λ(k), ν(k) are not necessarily feasible, except in the limit
as the algorithm converges.
Hence, cannot easily evaluate duality gap η(k) in the kth step,
as we do in the outer loop of the barrier method.
Instead, we define the surrogate duality gap, for any x that
satisfied F (x) < 0 and λ ≥ 0 as
η(x , λ) = −F (x)Tλ.
The surrogate gap is the duality gap if x were primal feasible
and λ, µ were dual feasible i.e. if rprim = 0, rdual = 0. Note
that value of t corresponds to the surrogate duality gap
η ≈ m/t → t = m/η.
M.M. Betcke Numerical Optimisation
Primal-dual interior point
Require: x that satisfies F (x) < 0, λ > 0
Require: µ > 1
Require: Tolerances �feas > 0, � > 0
1: repeat
2: Determine t: t := µm/η
3: Compute primal-dual search direction ∆ypd
4: Line search: determine step length s > 0 and set
y := y + s∆ypd
5: until ‖rprim‖ ≤ �feas, ‖rdual‖ ≤ �feas and η ≤ �
M.M. Betcke Numerical Optimisation
Remarks
The parameter t is set to a factor µm/η, which is the value of
t associated with the current surrogate duality gap η. If
x , λ, ν were central, with parameter t (and therefore with
duality gap m/t), then we would increase t by the factor µ (as
in the barrier method).
Values of the parameter µ on the order of 10 appear to work
well.
The primal-dual interior-point algorithm terminates when x is
primal feasible and λ, ν are dual feasible (within the tolerance
�feas) and the surrogate gap is smaller than the tolerance �.
Since the primal-dual interior-point method often has faster
than linear convergence, it is common to choose �feas, � small.
M.M. Betcke Numerical Optimisation
The line search in the primal-dual interior point method is a
standard backtracking line search, based on the norm of the
residual, and modified to ensure that λ > 0 and F (x) < 0.
Start with smax = sup{s ∈ [0, 1] : λ+ s∆λ ≥ 0}, multiply by
ρ ∈ (0, 1) until F (x + s∆xpd) < 0. Continue multiplying until
we have
‖rt(x+s∆xpd, λ+s∆λpd, x+s∆νpd)‖ ≤ (1−αs)‖rt(x , λ, ν)‖.
Common choices for backtracking parameters are same as for
Newton method α in the range 0.01 to 0.1 and ρ 0.3 to 0.8.
One iteration of the primal-dual interior-point algorithm is the
same as one step of the infeasible Newton method, applied to
solving rt(x , λ, ν) = 0, but modified to ensure λ > 0 and
F (x) < 0 (or, equivalently, with domrt restricted to λ > 0 and
F (x) < 0). The same arguments used in the proof of
convergence of the infeasible start Newton method show that
the line search for the primal-dual method always terminates
in a finite number of steps.
M.M. Betcke Numerical Optimisation