Numerical Optimisation: Background
Marta M. Betcke
m.betcke@ucl.ac.uk,
Kiko Rullan
f.rullan@cs.ucl.ac.uk
Department of Computer Science,
Centre for Medical Image Computing,
Centre for Inverse Problems
University College London
Lecture 1
Mathematical optimisation problem
min_{x ∈ R^n} f(x)
subject to
c_i(x) = 0,  i ∈ E
c_i(x) ≥ 0,  i ∈ I
f : R^n → R: objective function
c_i : R^n → R: constraint functions; i ∈ E equality constraints, i ∈ I inequality constraints
x ∈ R^n: optimisation variable
The optimal solution x* has the smallest value of f among all x that satisfy the constraints.
Example: geodesics
Geodesics are the shortest paths between two points along a surface.
Figure: https://academo.org/demos/geodesics/
A very short and incomplete early history
Source http://www.mitrikitti.fi/opthist.html
Antiquity: geometrical optimisation problems
300 BC Euclid considers the minimal distance between a point and a line, and proves that a square has the greatest area among all rectangles with a given perimeter.
Before the calculus of variations: isolated optimisation problems
1615 J. Kepler: optimal dimensions of a wine barrel (the shape whose volume varies least with respect to the barrel parameters).[1] He also encountered an early version of the secretary problem (an optimal stopping problem) when looking for a new wife.
1636 P. Fermat shows that at an extreme point the derivative of a function vanishes. In 1657 he shows that light travels between two points in minimal time.
[1] http://www.maa.org/press/periodicals/convergence/kepler-the-volume-of-a-wine-barrel-solving-the-problem-of-maxima-wine-barrel-design
A very short and incomplete early history cont.
Source http://www.mitrikitti.fi/opthist.html
Calculus of Variations
I. Newton (1660s) and G.W. von Leibniz (1670s) create the mathematical analysis that forms the basis of the calculus of variations (CoV); some isolated finite-dimensional optimisation problems are also considered.
1696 Johann and Jacob Bernoulli study the brachistochrone problem; the calculus of variations is born.
1740 L. Euler's publication begins the research on the general theory of the calculus of variations.
A very short and incomplete early history cont.
Source http://www.mitrikitti.fi/opthist.html
Least squares
1806 A.-M. Legendre presents the least squares method, which J.C.F. Gauss also claimed to have invented. Legendre made contributions to the CoV, too.
Linear Programming
1826 J.B.J. Fourier formulates an LP problem for solving problems arising in mechanics and probability theory.
1939 L.V. Kantorovich presents an LP model and an algorithm for solving it. In 1975 Kantorovich and T.C. Koopmans receive the Nobel memorial prize in economics for their contributions to LP.
1947 G. Dantzig, working for the US Air Force, presents the simplex method for solving LP problems; von Neumann establishes the duality theory for LP problems.
Example: transportation problem
2 factories, F_i
12 retail outlets, R_j
each factory F_i can produce up to a_i tonnes of a certain compound per week
each retail outlet R_j has a weekly demand of b_j tonnes of the compound
the cost of shipping one tonne of the compound from F_i to R_j is c_ij
Goal: determine the optimal amount to ship from each factory to each outlet so that demand is satisfied at minimal cost.
Example: transportation problem cont.
min ∑_{ij} c_ij x_ij
subject to
∑_{j=1}^{12} x_ij ≤ a_i,  i = 1, 2
∑_{i=1}^{2} x_ij ≥ b_j,  j = 1, ..., 12
x_ij ≥ 0,  i = 1, 2,  j = 1, ..., 12.
This is a linear programming problem because the objective function and all constraints are linear in x_ij.
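Such LP problems can be solved with off-the-shelf software. Below is a minimal sketch using scipy.optimize.linprog; the capacities, demands and costs are made-up illustrative values, not data from the lecture.

```python
# Minimal sketch: transportation problem as an LP with scipy.optimize.linprog.
# Capacities a, demands b and costs c are made-up illustrative values.
import numpy as np
from scipy.optimize import linprog

nf, nr = 2, 12                          # factories F_i, retail outlets R_j
rng = np.random.default_rng(0)
a = np.array([40.0, 50.0])              # weekly capacities a_i (tonnes)
b = rng.uniform(2.0, 6.0, nr)           # weekly demands b_j (tonnes)
c = rng.uniform(1.0, 10.0, (nf, nr))    # shipping costs c_ij per tonne

# Flatten x_ij into a length nf*nr vector and express both constraint
# families in the A_ub @ x <= b_ub form expected by linprog.
A_supply = np.kron(np.eye(nf), np.ones(nr))    # sum_j x_ij <= a_i
A_demand = -np.kron(np.ones(nf), np.eye(nr))   # sum_i x_ij >= b_j
A_ub = np.vstack([A_supply, A_demand])
b_ub = np.concatenate([a, -b])

res = linprog(c.ravel(), A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
x_opt = res.x.reshape(nf, nr)           # optimal tonnes shipped F_i -> R_j
print(res.fun)                          # minimal total shipping cost
```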
Example: nonlinear optimisation
min f(x, y)
subject to
−y + 2x − 1/2 ≥ 0
y − (1/4)x² + 1/2 ≥ 0.
Figure: feasible region of the two constraints in the (x, y)-plane.
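A problem of this form can be handed to a general-purpose constrained solver. The sketch below uses SciPy's SLSQP method; since the slide leaves f unspecified, a made-up quadratic objective f(x, y) = (x − 1)² + (y − 1)² stands in for it.

```python
# Minimal sketch: the constrained problem above solved with SciPy's SLSQP.
# The objective below is a made-up stand-in, since the slide leaves f generic.
import numpy as np
from scipy.optimize import minimize

f = lambda z: (z[0] - 1.0)**2 + (z[1] - 1.0)**2    # hypothetical objective

cons = [  # SciPy encodes inequality constraints as g(z) >= 0
    {"type": "ineq", "fun": lambda z: -z[1] + 2.0 * z[0] - 0.5},
    {"type": "ineq", "fun": lambda z: z[1] - 0.25 * z[0]**2 + 0.5},
]

res = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=cons)
print(res.x, res.fun)    # constrained minimiser and objective value
```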
Convexity
A set S ⊂ R^n is convex if for any two points x, y ∈ S the line segment connecting them lies entirely in S:
αx + (1 − α)y ∈ S, ∀α ∈ [0, 1].
Examples:
Figure: (a) unit balls {x ∈ R^n : ‖x‖_p ≤ 1} for p = 1, 2, 3, ∞ (convex for p ≥ 1); (b) polyhedron {x ∈ R^n : Ax = b, Cx ≤ d}.
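The definition can be probed numerically: sample points αx + (1 − α)y on the segment and test set membership at each sample. A minimal sketch (illustrative, not from the lecture) for p-norm balls:

```python
# Minimal sketch: test convexity of a set by sampling the segment
# alpha*x + (1 - alpha)*y and checking membership at each sample.
import numpy as np

def segment_in_set(x, y, member, n=101):
    """True if every sampled point of the segment [x, y] lies in the set."""
    return all(member(a * x + (1 - a) * y) for a in np.linspace(0, 1, n))

in_pball = lambda z, p: np.linalg.norm(z, ord=p) <= 1.0
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])

print(segment_in_set(x, y, lambda z: in_pball(z, 2)))    # True: 2-ball convex
print(segment_in_set(x, y, lambda z: in_pball(z, 0.5)))  # False: p < 1 fails
```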
Convexity
A function f is convex if
its domain S is a convex set, and
for any two points x, y ∈ S,
f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y), ∀α ∈ [0, 1].
A function f is strictly convex if for x ≠ y
f(αx + (1 − α)y) < αf(x) + (1 − α)f(y), ∀α ∈ (0, 1).
A function f is concave if −f is convex.
Examples:
linear function f(x) = cᵀx + α, where c ∈ R^n, α ∈ R
convex quadratic function f(x) = xᵀHx, where H ∈ R^{n×n} is symmetric positive (semi)definite
Classification of optimisation problems
convex vs non-convex
smooth vs non-smooth
constrained vs unconstrained
linear vs quadratic vs nonlinear
small vs large scale
local vs global
stochastic vs deterministic
discrete vs continuous
Unconstrained minimisation
Let f : R^n → R be a smooth function for which we can evaluate f and its derivatives at any given point x ∈ Ω ⊆ R^n.
Unconstrained optimisation problem
min_{x ∈ Ω ⊆ R^n} f(x).    (1)
A point x* is a global minimiser if f(x*) ≤ f(x), ∀x ∈ Ω ⊆ R^n.
A point x* is a local minimiser if ∃N(x*) : f(x*) ≤ f(x), ∀x ∈ N(x*), where N(y) denotes a neighbourhood of y (an open set which contains y).
Unconstrained minimisation
A point x* is a strict (or strong) local minimiser if ∃N(x*) : f(x*) < f(x), ∀x ∈ N(x*), x ≠ x*.
Examples:
f(x) = 2: every point is a (weak) local minimiser
f(x) = (x − 2)⁴: x* = 2 is a strict local minimiser (also a global one)
f(x) = cos(x): x* = π + 2kπ, k ∈ Z, are all strict local minimisers (but not strict global minimisers on R)
Unconstrained minimisation
A point x* is an isolated local minimiser if ∃N(x*) such that x* is the only local minimiser in N(x*).
Some strict local minimisers are not isolated, e.g. f(x) = x⁴ cos(1/x) + 2x⁴ with f(0) = 0 has a strict local minimiser at x* = 0, but there are further strict local minimisers (albeit with f(x_j) ≥ x_j⁴ > 0 = f(x*)) at nearby points x_j → 0, j → ∞.
However, all isolated local minimisers are strict.
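The accumulation of minimisers near x* = 0 can be observed numerically. A small sketch (illustrative only) that locates grid-local minima of f(x) = x⁴cos(1/x) + 2x⁴ on a fine grid:

```python
# Minimal sketch: locate discrete local minima of f(x) = x^4 cos(1/x) + 2x^4
# on a fine grid and observe them clustering towards the strict minimiser 0.
import numpy as np
from scipy.signal import argrelmin

x = np.linspace(0.01, 0.2, 200001)
fx = x**4 * np.cos(1.0 / x) + 2.0 * x**4

idx = argrelmin(fx)[0]       # indices of grid-local minima
print(x[idx][:5])            # minimisers accumulating near 0
print((fx[idx] > 0).all())   # True: all values exceed f(0) = 0
```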
Unconstrained minimisation
Difficulties with global minimisation:
f(x) = (cos(20πx) + 2)|x|
has a unique global minimiser x* = 0, but algorithms usually get trapped in one of the many local minima.
For convex functions, every local minimiser is also a global minimiser.
Figure: (a) f(x) = (cos(20πx) + 2)|x|; (b) convex functions |x|, exp(|x|) − 1, x².
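This trapping is easy to reproduce: a derivative-free local method started at different points returns different minima. A minimal sketch using SciPy's Nelder-Mead (the starting points are arbitrary choices):

```python
# Minimal sketch: a local method applied to f(x) = (cos(20*pi*x) + 2)|x|
# returns different minima depending on the (arbitrarily chosen) start.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (np.cos(20 * np.pi * x[0]) + 2.0) * abs(x[0])

for x0 in (0.02, 0.31, 0.72):
    res = minimize(f, x0=[x0], method="Nelder-Mead")
    print(f"start {x0:4.2f} -> x = {res.x[0]:7.4f}, f = {res.fun:.4f}")
# Only the start near 0 reaches the global minimiser x* = 0; the others
# stall in one of the many local minima.
```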
Taylor theorem
Let f : R^n → R be continuously differentiable. Then for p ∈ R^n we have
f(x + p) = f(x) + ∇f(x + tp)ᵀp
for some t ∈ (0, 1).
If moreover f is twice continuously differentiable, we also have
∇f(x + p) = ∇f(x) + ∫₀¹ ∇²f(x + tp)p dt
and
f(x + p) = f(x) + ∇f(x)ᵀp + ½ pᵀ∇²f(x + tp)p
for some t ∈ (0, 1).
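The first identity can be verified numerically via the intermediate value theorem: f(x + p) must lie between the extremes of t ↦ f(x) + ∇f(x + tp)ᵀp over [0, 1]. A sketch with a made-up smooth function:

```python
# Minimal sketch: check f(x+p) = f(x) + grad f(x + t p)^T p for some t in (0,1)
# on a made-up smooth function, via the intermediate value theorem.
import numpy as np

f = lambda x: 0.5 * (x @ x) + np.cos(x[0])
grad = lambda x: x + np.array([-np.sin(x[0]), 0.0])

x = np.array([0.3, -1.2])
p = np.array([0.05, 0.02])

ts = np.linspace(0.0, 1.0, 1001)
vals = np.array([f(x) + grad(x + t * p) @ p for t in ts])
print(vals.min() <= f(x + p) <= vals.max())   # True: some t achieves equality
```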
Theorem [1st order necessary condition]
Let f : R^n → R be continuously differentiable in an open neighbourhood of a local minimiser x*. Then ∇f(x*) = 0.
Proof: [by contradiction]
Suppose that ∇f(x*) ≠ 0 and define p = −∇f(x*). Note that pᵀ∇f(x*) = −‖∇f(x*)‖₂² < 0. Furthermore, as ∇f is continuous near x*, there exists T > 0 such that
pᵀ∇f(x* + tp) < 0, t ∈ [0, T].
By Taylor's theorem, for any t̄ ∈ (0, T] we have
f(x* + t̄p) = f(x*) + t̄ pᵀ∇f(x* + tp) for some t ∈ (0, t̄).
Hence f(x* + t̄p) < f(x*) for all t̄ ∈ (0, T]: we have found a direction leading away from x* along which f decreases, which contradicts x* being a local minimiser.
1st order necessary condition
We call x* a stationary point if ∇f(x*) = 0.
By Theorem [1st order necessary condition] any local minimiser is a stationary point. The converse is in general not true.
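A classical counterexample is a saddle point: the gradient vanishes, yet the point is not a local minimiser. A small numerical sketch for f(x, y) = x² − y²:

```python
# Minimal sketch: the origin is a stationary point of f(x, y) = x^2 - y^2
# but a saddle, not a minimiser.
import numpy as np

f = lambda z: z[0]**2 - z[1]**2
grad = lambda z: np.array([2.0 * z[0], -2.0 * z[1]])

z0 = np.zeros(2)
print(grad(z0))                       # [0. 0.] -> stationary
print(f(z0 + [0.0, 0.1]) < f(z0))     # True: f decreases along y, no minimiser
```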
Theorem [2nd order necessary condition]
If x* is a local minimiser of f and ∇²f exists and is continuous in an open neighbourhood of x*, then ∇f(x*) = 0 and ∇²f(x*) is positive semidefinite.
Proof: [by contradiction]
By Theorem [1st order necessary condition] we have ∇f(x*) = 0. Assume ∇²f(x*) is not positive semidefinite. Then there exists a vector p such that pᵀ∇²f(x*)p < 0, and because ∇²f is continuous near x*, there exists T > 0 such that pᵀ∇²f(x* + tp)p < 0 for all t ∈ [0, T].
By Taylor's theorem, we have for any t̄ ∈ (0, T] and some t ∈ (0, t̄)
f(x* + t̄p) = f(x*) + t̄ pᵀ∇f(x*) + ½ t̄² pᵀ∇²f(x* + tp)p < f(x*).
We have found a descent direction for f away from x*, which contradicts x* being a local minimiser.
Theorem [2nd order sufficient condition]
Let f : R^n → R with ∇²f continuous in an open neighbourhood of x*. If ∇f(x*) = 0 and ∇²f(x*) is positive definite, then x* is a strict local minimiser of f.
Proof:
Because the Hessian ∇²f is continuous and positive definite at x*, we can choose a radius r > 0 such that ∇²f(x) remains positive definite for all x in the open ball B₂(x*, r) = {y : ‖y − x*‖₂ < r}. For any nonzero vector p with ‖p‖₂ < r we have x* + p ∈ B₂(x*, r) and
f(x* + p) = f(x*) + pᵀ∇f(x*) + ½ pᵀ∇²f(x* + tp)p
= f(x*) + ½ pᵀ∇²f(x* + tp)p,
for some t ∈ (0, 1).
Furthermore, x* + tp ∈ B₂(x*, r), thus pᵀ∇²f(x* + tp)p > 0 and therefore f(x* + p) > f(x*).
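In practice the sufficient condition is checked by verifying stationarity and computing the Hessian eigenvalues. A minimal sketch for the made-up quadratic f(x) = x₁² + 2x₂²:

```python
# Minimal sketch: verify the 2nd order sufficient condition at a candidate
# point for the made-up function f(x) = x_1^2 + 2 x_2^2.
import numpy as np

grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])

x_star = np.zeros(2)
print(np.allclose(grad(x_star), 0.0))                 # stationarity holds
print((np.linalg.eigvalsh(hess(x_star)) > 0).all())   # Hessian pos. definite
# Both hold, so x* = 0 is a strict local minimiser.
```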
Necessary vs sufficient conditions
The 2nd order sufficient condition guarantees a stronger statement than the necessary conditions (a strict local minimiser).
A strict local minimiser may fail to satisfy the sufficient conditions:
f(x) = x⁴,  f′(x) = 4x³,  f′′(x) = 12x².
x* = 0 is a strict local minimiser while f′′(x*) = 0; thus it satisfies the necessary but not the sufficient conditions.
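The same can be seen numerically for this example:

```python
# Minimal sketch: for f(x) = x^4 the sufficient condition fails at 0
# (f''(0) = 0), yet 0 is still a strict (global) minimiser.
f = lambda x: x**4
d2f = lambda x: 12.0 * x**2

print(d2f(0.0))                                              # 0.0
print(all(f(h) > f(0.0) for h in (-0.1, -0.01, 0.01, 0.1)))  # True
```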
Implications of convexity
If f : R^n → R is convex, any local minimiser x* is also a global minimiser of f. If f is also differentiable, then any stationary point x* is a global minimiser.
Proof:
Suppose x* is a local but not a global minimiser. Then ∃z ∈ R^n : f(z) < f(x*). For all x on the line segment joining x* and z, i.e.
L(x*, z) = {x : x = λz + (1 − λ)x*, λ ∈ (0, 1]},
by convexity of f we have
f(x) ≤ λf(z) + (1 − λ)f(x*) < f(x*).
Any neighbourhood N(x*) satisfies N(x*) ∩ L(x*, z) ≠ ∅, hence ∃x ∈ N(x*) : f(x) < f(x*), so x* is not a local minimiser, a contradiction.
Implications of convexity
Proof: cont.
For the second part, suppose that x* is not a global minimiser and choose z as before. By convexity of f it follows that
∇f(x*)ᵀ(z − x*) = d/dλ f(x* + λ(z − x*))|_{λ=0}
= lim_{λ→0} [f(x* + λ(z − x*)) − f(x*)] / λ
≤ lim_{λ→0} [λf(z) + (1 − λ)f(x*) − f(x*)] / λ
= f(z) − f(x*) < 0.
Hence ∇f(x*) ≠ 0 and x* is not a stationary point.