
Numerical Optimisation:
Trust region methods

Marta M. Betcke
m.betcke@ucl.ac.uk,

Kiko Rullan
f.rullan@cs.ucl.ac.uk

Department of Computer Science,
Centre for Medical Image Computing,

Centre for Inverse Problems
University College London

Lecture 4


Trust region: idea

Choose a region around the current iterate x_k in which we trust a model of f.

Choose a relatively easily solvable model which we trust to be an adequate representation of f in this region.

Compute the direction and step length which minimise the model within the trust region.

The size of the trust region is critical to effectiveness. If the region is too small, the algorithm makes little progress. If the region is too large, the minimiser of the model can be far away from the minimiser of f.

If the model is consistently reliable, the trust region may be increased.

If the step is not acceptable, reduce the size of the trust region and find a new minimiser. In general both the direction and the step length change when the trust region changes.


Trust region: model

Here we assume a quadratic model based on the Taylor expansion of f at x_k:

m_k(p) = f(x_k) + g_k^T p + (1/2) p^T B_k p,

where g_k = ∇f(x_k) and B_k is a symmetric approximation to the Hessian ∇²f(x_k).

In general we only assume symmetry and uniform boundedness of B_k. By Taylor's theorem, f(x_k + p) = f(x_k) + g_k^T p + (1/2) p^T ∇²f(x_k + tp) p for some t ∈ (0, 1), so the difference between m_k(p) and f(x_k + p) is O(‖p‖²).

The choice B_k = ∇²f(x_k) leads to trust region Newton methods, and the model accuracy improves to O(‖p‖³).
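As a running illustration, a minimal NumPy sketch of the quadratic model (the function name and interface are our own, not from the lecture):

```python
import numpy as np

def make_model(f_k, g_k, B_k):
    """Return the quadratic model m_k(p) = f_k + g_k^T p + 0.5 p^T B_k p."""
    def m(p):
        p = np.asarray(p, dtype=float)
        return f_k + g_k @ p + 0.5 * p @ (B_k @ p)
    return m
```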


In each step we solve

min_{p ∈ R^n} m_k(p) = f(x_k) + g_k^T p + (1/2) p^T B_k p, s.t. ‖p‖ ≤ ∆_k, (CM)

where ∆_k > 0 is the radius of the trust region. The constraint can equivalently be written p^T p ≤ ∆_k².

If B_k is positive definite, the minimiser of the unconstrained quadratic model m_k is p^B = −B_k^{-1} g_k. If ‖p^B‖ = ‖B_k^{-1} g_k‖ ≤ ∆_k, this is also the solution to the constrained problem and we call p^B a full step.

The solution in other cases is less straightforward, but can usually be obtained at moderate computational cost. In particular, only an approximate solution is necessary to obtain convergence and good practical behaviour.


Trust region vs line search

Figure: Nocedal Wright Fig 4.1


Choice of trust region radius ∆_k

Compare the actual reduction in the objective function to the predicted reduction, i.e. the reduction in the model m_k:

ρ_k = (f(x_k) − f(x_k + p_k)) / (m_k(0) − m_k(p_k)).

ρ_k < 0: f(x_k) < f(x_k + p_k) – reject the step, shrink the trust region and try again.

ρ_k > 0 but small – accept the step, shrink the trust region for the next iteration.

ρ_k > 0 but significantly smaller than 1 – accept the step and do not alter the trust region.

ρ_k ≈ 1: good agreement between f and m_k – accept the step and expand the trust region for the next iteration.


Algorithm: Trust region

1: Given ∆̂ > 0, ∆_0 ∈ (0, ∆̂) and η ∈ [0, 1/4)
2: for k = 0, 1, 2, . . . do
3:   Obtain p_k by (approximately) solving (CM)
4:   Evaluate ρ_k = (f(x_k) − f(x_k + p_k)) / (m_k(0) − m_k(p_k))
5:   if ρ_k < 1/4 then
6:     ∆_{k+1} = ∆_k/4
7:   else
8:     if ρ_k > 3/4 and ‖p_k‖ = ∆_k then
9:       ∆_{k+1} = min(2∆_k, ∆̂)
10:    else
11:      ∆_{k+1} = ∆_k
12:    end if
13:  end if
14:  if ρ_k > η then
15:    x_{k+1} = x_k + p_k
16:  else
17:    x_{k+1} = x_k
18:  end if
19: end for
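A direct transcription of the algorithm into NumPy. This is a sketch: the function and parameter names are our own, and `solve_cm(g, B, delta)` stands in for any routine that returns an approximate minimiser of (CM), e.g. the Cauchy point or dogleg steps sketched later in these notes.

```python
import numpy as np

def trust_region(f, grad, hess, x0, solve_cm,
                 delta_hat=1.0, delta0=0.5, eta=0.15,
                 gtol=1e-8, max_iter=200):
    """Sketch of the trust region algorithm above (eta in [0, 1/4))."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < gtol:
            break
        p = solve_cm(g, B, delta)
        predicted = -(g @ p + 0.5 * p @ (B @ p))   # m_k(0) - m_k(p_k)
        rho = (f(x) - f(x + p)) / predicted        # line 4
        if rho < 0.25:                             # lines 5-6
            delta = 0.25 * delta
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_hat)    # lines 8-9
        if rho > eta:                              # lines 14-15
            x = x + p
    return x
```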


Theorem [Moré, Sorensen]

p^⋆ is a global solution of the trust region problem (CM)

min_{p ∈ R^n} m_k(p) = f(x_k) + g_k^T p + (1/2) p^T B_k p, s.t. ‖p‖ ≤ ∆_k,

if and only if p^⋆ is feasible and there is a scalar λ ≥ 0 such that the following conditions are satisfied:

(B_k + λI) p^⋆ = −g_k, (1a)
λ(∆_k − ‖p^⋆‖) = 0, (1b)
B_k + λI is positive semidefinite. (1c)
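The characterisation is easy to verify numerically for a candidate pair (p^⋆, λ); a small sketch (the helper name and tolerances are our own):

```python
import numpy as np

def satisfies_conditions(p, lam, g, B, delta, tol=1e-8):
    """Check feasibility, lambda >= 0 and the conditions (1a)-(1c)."""
    I = np.eye(len(p))
    feasible = np.linalg.norm(p) <= delta + tol
    stationarity = np.allclose((B + lam * I) @ p, -g, atol=tol)       # (1a)
    complementarity = abs(lam * (delta - np.linalg.norm(p))) <= tol   # (1b)
    semidefinite = np.linalg.eigvalsh(B + lam * I).min() >= -tol      # (1c)
    return (feasible and lam >= 0 and stationarity
            and complementarity and semidefinite)
```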


Solution of (CM) for different radii

Figure: Nocedal Wright Fig 4.2 (note that p^⋆_3 and p^⋆_1 should be swapped)


Solution of (CM) for different radii

For ∆_1, ‖p^⋆‖ < ∆_1, hence λ = 0 and so B_k p^⋆ = −g_k with B_k positive semidefinite, from (1)(a,c).

For ∆_2, ∆_3 the solution lies on the boundary of the respective trust region, hence ‖p^⋆‖ = ∆ and λ ≥ 0. From (1a) we have λp^⋆ = −B_k p^⋆ − g_k = −∇m_k(p^⋆). Thus if λ > 0, p^⋆ is collinear with the negative gradient of m_k and normal to its contours.


Cauchy point

The Cauchy point p^C is the minimiser of m_k along the steepest descent direction −g_k, subject to the trust region bound.

Find p^s:

p^s = arg min_{p ∈ R^n} f(x_k) + g_k^T p, s.t. ‖p‖ ≤ ∆_k.

Calculate the scalar τ_k > 0:

τ_k = arg min_{τ ≥ 0} m_k(τ p^s), s.t. ‖τ p^s‖ ≤ ∆_k.

Set p^C = τ_k p^s.

The solution to the first problem can be written down explicitly, simply by going as far as allowed in the steepest descent direction:

p^s = −(∆_k/‖g_k‖) g_k.


To obtain τ_k we substitute p^s = −(∆_k/‖g_k‖) g_k into the second problem and obtain

m_k(τ p^s) = f(x_k) − τ (∆_k/‖g_k‖) g_k^T g_k + (1/2) τ² (∆_k²/‖g_k‖²) g_k^T B_k g_k,

with g_k^T g_k ≥ 0, subject to ‖τ (∆_k/‖g_k‖) g_k‖ ≤ ∆_k ⇔ τ ∈ [−1, 1].

We consider two cases:

g_k^T B_k g_k ≤ 0: m_k(τ p^s) decreases monotonically in τ whenever g_k ≠ 0. Hence, the minimum is attained for the largest τ ∈ [−1, 1], i.e. τ = 1.

g_k^T B_k g_k > 0: m_k(τ p^s) is a strictly convex quadratic function in τ, thus the minimum is either the unconstrained minimiser ‖g_k‖³/(∆_k g_k^T B_k g_k) whenever it lies in [−1, 1], or otherwise 1 (= arg min_{τ ∈ {−1, 1}} m_k(τ p^s)).

τ_k = { 1,                                      g_k^T B_k g_k ≤ 0,
        min(‖g_k‖³/(∆_k g_k^T B_k g_k), 1),     g_k^T B_k g_k > 0. }
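The case split translates directly into code; a minimal sketch:

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point p^C = -tau_k * (delta / ||g||) * g, with tau_k as above."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)        # stationary point, no descent direction
    gBg = g @ (B @ g)
    if gBg <= 0.0:
        tau = 1.0                      # model decreases all the way to the boundary
    else:
        tau = min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g
```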


Cauchy point for positive definite B_k

Sufficient reduction in the model means a reduction of at least a positive fraction of that achieved by the Cauchy point p^C.

Figure: Nocedal Wright Fig 4.3


Improvement on Cauchy point

The Cauchy point p^C provides sufficient reduction to yield global convergence.

The Cauchy point is cheap to compute.

The Cauchy point essentially corresponds to the steepest descent method with a particular choice of step length, and steepest descent performance can be very poor even with the optimal step length.

In the Cauchy point, the information in B_k is only used to compute the step length. Superlinear convergence can only be expected when B_k is used to compute both the descent direction and the step length.

A number of trust region methods compute the Cauchy point and then attempt to improve on it. Often, the full step p^B = −B_k^{-1} g_k is chosen whenever B_k is positive definite and ‖p^B‖ ≤ ∆_k; a minimal sketch of this strategy follows below. When B_k = ∇²f(x_k) or a quasi-Newton approximation, this strategy can be expected to yield superlinear convergence.
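The sketch referred to above, reusing the `cauchy_point` routine from earlier (the Cholesky factorisation doubles as a positive definiteness test; names are our own):

```python
import numpy as np

def improve_on_cauchy(g, B, delta):
    """Return the full step when admissible, else fall back to the Cauchy point."""
    try:
        np.linalg.cholesky(B)              # succeeds iff B is positive definite
        p_full = np.linalg.solve(B, -g)    # full step p^B = -B^{-1} g
        if np.linalg.norm(p_full) <= delta:
            return p_full
    except np.linalg.LinAlgError:
        pass                               # B not positive definite
    return cauchy_point(g, B, delta)
```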


The dogleg method

Assumption: B positive definite.

If ‖p^B‖ ≤ ∆_k, where p^B = −B^{-1} g_k, the solution is just the unconstrained minimiser:

p^⋆ = p^B, ‖p^B‖ ≤ ∆_k.

On the other hand, if ∆_k is small relative to ‖p^B‖, the restriction ‖p‖ ≤ ∆_k ensures that the quadratic term in m_k has little effect on the solution of (CM) and could be omitted, i.e.

p^⋆ ≈ −(∆_k/‖g_k‖) g_k, ∆_k ≪ ‖p^B‖.

For intermediate values of ∆_k, the solution p^⋆(∆_k) typically follows a curved trajectory (Fig. 4.4 Nocedal, Wright).


Figure: Nocedal Wright Fig 4.4


The dogleg method replaces the curved trajectory with a path consisting of two line segments.

The first line segment runs from the origin to the minimiser of m_k along the steepest descent direction,

p^U = −(g_k^T g_k / g_k^T B g_k) g_k.

The second line segment runs from p^U to p^B (the unconstrained minimiser, or full step).

Formally, the trajectory can be written as

p̃(τ) = { τ p^U,                       0 ≤ τ ≤ 1,
          p^U + (τ − 1)(p^B − p^U),    1 ≤ τ ≤ 2. }

The dogleg method chooses p to minimise the model m_k along this path, subject to the trust region bound.

The minimum along the dogleg can be found easily because
(i) ‖p̃(τ)‖ is an increasing function of τ,
(ii) m_k(p̃(τ)) is a decreasing function of τ.

Proof: For τ ∈ [0, 1] both follow from the definition of p^U. For τ ∈ [1, 2] they can be shown by computing the derivative and showing that it is nonnegative for (i) and nonpositive for (ii).

Intuition:
(i) The length of p̃(τ) could only decrease with τ if p̃ turned back at τ = 1, i.e. if the vector p^B − p^U made an angle larger than π/2 with p^U, which is not possible for the steepest descent solution.
(ii) m_k(p̃(2)) is the minimum of a strictly convex function, hence m_k(p̃(1)) > m_k(p̃(2)) and the function decreases for τ ∈ [1, 2].


As a consequence, the path p̃(τ) intersects the trust region boundary at exactly one point if ‖p^B‖ ≥ ∆_k, and the intersection point can be computed by solving the scalar quadratic equation

‖p^U + (τ − 1)(p^B − p^U)‖² = ∆_k².
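Putting the pieces together, a sketch of the dogleg step (B assumed positive definite, as above; the positive root of the quadratic picks the boundary crossing with τ ∈ [1, 2]):

```python
import numpy as np

def dogleg(g, B, delta):
    """Dogleg step for the model min g^T p + 0.5 p^T B p, ||p|| <= delta."""
    p_b = np.linalg.solve(B, -g)                 # full step p^B
    if np.linalg.norm(p_b) <= delta:
        return p_b                               # unconstrained minimiser is feasible
    p_u = -(g @ g) / (g @ (B @ g)) * g           # minimiser along steepest descent
    if np.linalg.norm(p_u) >= delta:
        return (delta / np.linalg.norm(p_u)) * p_u   # boundary hit on first segment
    # second segment: solve ||p_u + s (p_b - p_u)||^2 = delta^2 for s = tau - 1 in [0, 1]
    d = p_b - p_u
    a, b, c = d @ d, 2.0 * (p_u @ d), p_u @ p_u - delta**2
    s = (-b + np.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)   # positive root
    return p_u + s * d
```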

If the exact Hessian ∇²f(x_k) is available and positive definite, we set B = ∇²f(x_k) and the resulting procedure is a Newton dogleg method. If ∇²f(x_k) is not positive definite, we could use one of the modified Hessians, and close to the solution we recover the Newton step. However, the somewhat arbitrary perturbation introduced by the modification can interfere with the benefits of the trust region method. In fact, the trust region introduces its own modification via (1)(a,c), thus the dogleg method is most appropriate when B is positive definite.


2D subspace minimisation

An extension of the dogleg method:

min_p m_k(p) = f(x_k) + g_k^T p + (1/2) p^T B p, s.t. ‖p‖ ≤ ∆_k, p ∈ span[g, B^{-1}g].

The obtained minimiser is an improvement on the dogleg solution, since p̃ ∈ span[g, B^{-1}g]. Furthermore, the reduction in the model m_k is often close to that achieved by solving the full problem (CM) (not restricted to the subspace). The subspace span[g, B^{-1}g] is a good one in which to look for a minimiser of a quadratic model (Taylor theorem).

This subspace minimisation strategy can be modified for indefinite B.
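A brute-force sketch of the subspace strategy for positive definite B (practical implementations instead reduce the boundary case to a small root-finding problem; the grid search here is only illustrative, and the names are our own):

```python
import numpy as np

def two_d_subspace_step(g, B, delta, n_grid=720):
    """Minimise g^T p + 0.5 p^T B p over span[g, B^{-1} g], ||p|| <= delta."""
    # orthonormal basis of the 2D subspace and the reduced model data
    Q, _ = np.linalg.qr(np.column_stack([g, np.linalg.solve(B, g)]))
    g2, B2 = Q.T @ g, Q.T @ B @ Q
    best_y, best_m = np.zeros(2), 0.0            # y = 0 gives model value 0
    def consider(y):
        nonlocal best_y, best_m
        m = g2 @ y + 0.5 * y @ (B2 @ y)
        if m < best_m:
            best_y, best_m = y, m
    # interior candidate: unconstrained minimiser of the reduced model
    y_int = np.linalg.solve(B2, -g2)
    if np.linalg.norm(y_int) <= delta:
        consider(y_int)
    # boundary candidates on the circle of radius delta
    for t in np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False):
        consider(delta * np.array([np.cos(t), np.sin(t)]))
    return Q @ best_y
```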


Cauchy point reduction of the model m_k

The Cauchy point p^C satisfies the sufficient reduction condition

m_k(0) − m_k(p) ≥ c_1 ‖g_k‖ min(∆_k, ‖g_k‖/‖B_k‖) (SR)

with c_1 = 1/2.

Proof: Use the definition of the Cauchy point p^C and check the inequality case by case.
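For completeness, a sketch of the case analysis (following Nocedal & Wright, Lemma 4.3), writing the reduction as m_k(0) − m_k(p^C) = τ_k ∆_k‖g_k‖ − (1/2) τ_k² (∆_k²/‖g_k‖²) g_k^T B_k g_k:

If g_k^T B_k g_k ≤ 0 (so τ_k = 1):
m_k(0) − m_k(p^C) ≥ ∆_k‖g_k‖ ≥ ‖g_k‖ min(∆_k, ‖g_k‖/‖B_k‖).

If g_k^T B_k g_k > 0 and τ_k = ‖g_k‖³/(∆_k g_k^T B_k g_k) ≤ 1 (using g_k^T B_k g_k ≤ ‖B_k‖‖g_k‖²):
m_k(0) − m_k(p^C) = (1/2) ‖g_k‖⁴/(g_k^T B_k g_k) ≥ (1/2) ‖g_k‖²/‖B_k‖ ≥ (1/2) ‖g_k‖ min(∆_k, ‖g_k‖/‖B_k‖).

If g_k^T B_k g_k > 0 and τ_k = 1 (then g_k^T B_k g_k ≤ ‖g_k‖³/∆_k):
m_k(0) − m_k(p^C) ≥ ∆_k‖g_k‖ − (1/2) ∆_k‖g_k‖ = (1/2) ∆_k‖g_k‖ ≥ (1/2) ‖g_k‖ min(∆_k, ‖g_k‖/‖B_k‖).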

If a vector p with ‖p‖ ≤ ∆_k satisfies

m_k(0) − m_k(p) ≥ c_2 (m_k(0) − m_k(p^C)),

then it satisfies (SR) with c_1 = c_2/2:

m_k(0) − m_k(p) ≥ c_2 (m_k(0) − m_k(p^C)) ≥ (1/2) c_2 ‖g_k‖ min(∆_k, ‖g_k‖/‖B_k‖).

In particular, if p is the exact solution p^⋆ of (CM), then it satisfies (SR) with c_1 = 1/2. Note that both the dogleg and 2D-subspace minimisation algorithms satisfy (SR) with c_1 = 1/2, because they both produce approximate solutions p for which m_k(p) ≤ m_k(p^C).


Global convergence

Let ‖B_k‖ ≤ β for some constant β > 0, let f be bounded below on the level set S = {x : f(x) ≤ f(x_0)} and Lipschitz continuously differentiable in a neighbourhood N(S, R_0) of S for some R_0 > 0, and let all the approximate solutions p_k of (CM) satisfy the inequality (SR) for some c_1 > 0 as well as ‖p_k‖ ≤ γ∆_k for some γ ≥ 1 (a slight relaxation of the trust region). We then have:

for η = 0 in Algorithm: Trust region,

lim inf_{k→∞} ‖g_k‖ = 0;

for η ∈ (0, 1/4) in Algorithm: Trust region,

lim_{k→∞} g_k = 0.


Superlinear local convergence

Let f be twice Lipschitz continuously differentiable in a neighbourhood of a point x^⋆ at which the second order sufficient conditions are satisfied. Suppose that the sequence {x_k} converges to x^⋆ and that, for all k sufficiently large, the trust region algorithm based on (CM) with B_k = ∇²f(x_k) chooses steps p_k that satisfy the Cauchy point based sufficient reduction criterion (SR) and are asymptotically similar to the Newton steps p_k^N whenever ‖p_k^N‖ ≤ ∆_k/2, i.e.

‖p_k − p_k^N‖ = o(‖p_k^N‖).

Then the trust region bound ∆_k becomes inactive for all k sufficiently large and the sequence {x_k} converges superlinearly to x^⋆.
