Forst, Wilhelm; Hoffmann, Dieter: Optimization — Theory and Practice. Springer, New York, 2010. ISBN 9780387789774, 9780387789767.
1 Introduction

1.1 Examples of Optimization Problems
    Overdetermined Systems of Linear Equations
    Nonlinear Least Squares Problems
    Chebyshev Approximation
    Facility Location Problems
    Standard Form of Optimization Problems
    Penalty Methods
1.2 Historical Overview
    Optimization Problems in Antiquity
    First Heroic Era: Development of Calculus
    Second Heroic Era: Discovery of Simplex Algorithm
    Leonid Khachiyan's Breakthrough
Exercises
The general theory has naturally developed out of the study of special problems. It is therefore useful to get a first impression by looking at the 'classic' problems. We will have a first look at some elementary examples to get an idea of the kinds of problems which will be stated more precisely and treated in more depth later on. Consequently, we will often not go into too much detail in this introductory chapter.

Section 1.2 gives a historical survey of this relatively young discipline and points out that mathematics has at all times gained significant stimulus from the study of optimization problems.
1.1 Examples of Optimization Problems
Overdetermined Systems of Linear Equations
Let m, n ∈ ℕ with m > n, A ∈ ℝ^{m×n} and b ∈ ℝ^m. We consider the overdetermined linear system

    Ax = b,  x ∈ ℝⁿ.    (1)
In general it has no solution. However, if A is square (m = n) and nonsingular, it is well known that the equation Ax = b has the unique solution x = A⁻¹b.
The overdetermined case often appears in practical applications, for instance, if we have more observations or measurements than independent parameters. In this case, it is useful to minimize the 'residual vector'

    r := r(x) := b − Ax

in a sense yet to be made precise. In most cases we are satisfied with the minimization of ∥Ax − b∥ with respect to a given norm ∥·∥ on ℝ^m, and often state the problem in the shortened form

    ∥Ax − b∥ → min over x

or the like.
So one considers, for instance, the following alternative problems for (1):

\[
\text{(i)} \quad \|Ax - b\|_2 \longrightarrow \min_x
\]
\[
\text{(ii)} \quad \|Ax - b\|_\infty = \max_{1 \le \mu \le m} \Big| \sum_{\nu=1}^{n} a_{\mu\nu} x_\nu - b_\mu \Big| \longrightarrow \min_x
\]
or, more generally than (i),
\[
\text{(iii)} \quad \|Ax - b\|_p = \Big( \sum_{\mu=1}^{m} \Big| \sum_{\nu=1}^{n} a_{\mu\nu} x_\nu - b_\mu \Big|^p \Big)^{1/p} \longrightarrow \min_x \qquad (1 \le p < \infty).
\]
Ad (i): Linear Gauß Approximation Problem
We consider the functions f_μ defined by

\[
f_\mu(x) := \sum_{\nu=1}^{n} a_{\mu\nu} x_\nu - b_\mu \quad \text{for } x \in \mathbb{R}^n \text{ and } \mu = 1, \dots, m.
\]

Then our problem is equivalent to the linear least squares problem

\[
F(x) := \frac{1}{2} \sum_{\mu=1}^{m} f_\mu(x)^2 = \frac{1}{2} \|Ax - b\|_2^2 \longrightarrow \min_x,
\]

whose objective function is differentiable. Necessary and sufficient optimality conditions are given by the Gauss normal equations (cf. exercise 2)

    AᵀAx = Aᵀb.
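As a quick cross-check, here is a minimal Matlab sketch of ours (not part of the book) using the data of Example 1 below: the normal equations solved directly, compared with Matlab's backslash operator.

% Minimal sketch (not from the book): least squares via the normal
% equations A'Ax = A'b, compared with the QR-based backslash operator.
A = [1 1; 1 -1; 1 2; 2 1; 3 1];     % data of Example 1 below
b = [3 1 7 8 6]';
x_ne = (A'*A) \ (A'*b)              % solves the Gauss normal equations
x_qr = A \ b                        % least squares solution via QR
norm(A*x_ne - b)                    % approximation error, approx. 3.302

Both variants return x ≈ (1.899, 2.089)ᵀ = (150/79, 165/79)ᵀ; for ill-conditioned A the backslash (QR) route is numerically preferable to explicitly forming AᵀA.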
Ad (ii): Linear Chebyshev Approximation Problem
This problem (initially with a nondifferentiable objective function) is a special minimax problem and therefore equivalent to the ‘linear program’
\[
\eta \longrightarrow \min_{x \in \mathbb{R}^n, \, \eta \in \mathbb{R}}
\]

with respect to the constraints

\[
\Big| \sum_{\nu=1}^{n} a_{\mu\nu} x_\nu - b_\mu \Big| \le \eta \qquad (\mu = 1, \dots, m)
\]

or alternatively

\[
-\eta \le \sum_{\nu=1}^{n} a_{\mu\nu} x_\nu - b_\mu \le \eta \qquad (\mu = 1, \dots, m).
\]

Example 1
Consider the following overdetermined system of linear equations A x = b :
> restart: with(linalg): x := vector(2):
A := matrix(5,2,[1,1,1,-1,1,2,2,1,3,1]); b := vector([3,1,7,8,6]);
\[
A := \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 2 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}, \qquad b := [3, 1, 7, 8, 6]
\]
The Gauß approximation problem has the following normal equations:
> B := evalm(transpose(A)&*A): c := evalm(transpose(A)&*b): geneqns(B,x,c);
{16x₁ + 7x₂ = 45, 7x₁ + 8x₂ = 30}
These can be solved, for example, with the Maple command linsolve:

> x := linsolve(B,c);

x := [150/79, 165/79]

The command leastsqrs makes it even simpler:
> leastsqrs(A,b), evalf(%,4);
[150/79, 165/79], [1.899, 2.089]

We get the following approximation error:

> evalm(A&*x-b); norm(%,2); evalf(%,4);

[78/79, −94/79, −73/79, −167/79, 141/79], (1/79)·√68019, 3.302
It is also possible to solve this problem with the help of Matlab. To do that, we utilize the function lsqlin from the Optimization Toolbox and demonstrate the use of some of the options which this command offers. In interactive mode the following four command lines have to be entered successively:
A = [1, 1, 1, 2, 3; 1,-1, 2, 1, 1]';
b = [3, 1, 7, 8, 6]';
options = optimset('Display','iter','LargeScale','on');
[x,resnorm,residual,exitflag,output,lambda] = ...
    lsqlin(A,b,[],[],[],[],[],[],[],options)
The last command line does not fit into one line. We therefore continue it on the next line after having typed in "...". Distinctly simpler, but also less informative, is the Matlab command x = A \ b. Since we have already discussed in detail how to solve this example with Maple, we will abstain from giving the corresponding Matlab output.
If we apply the maximum norm instead of the euclidean norm, the linear Chebyshev approximation problem is equivalent to a linear optimization problem whose objective function and 'restrictions' can be calculated with Maple in the following way:
> ObjFunc := eta;
x := vector(2): convert(evalm(A&*x-b),list); Restr := convert(map(z->(-eta<=z,z<=eta),%),set);
ObjFunc := η
[x1 +x2 −3, x1 −x2 −1, x1 +2x2 −7, 2x1 +x2 −8, 3x1 +x2 −6]
Restr :={−η≤x1 +x2 −3≤η, −η≤x1 −x2 −1≤η,
−η≤x1 +2x2 −7≤η, −η≤2x1 +x2 −8≤η, −η≤3x1 +x2 −6≤η}
We apply the simplex algorithm which is available under Maple:

> with(simplex): minimize(ObjFunc,Restr); assign(%):

{x₂ = 21/8, η = 15/8, x₁ = 7/4}

We verify that η is the approximation error:

> evalm(A&*x-b); 'eta' = norm(%); # maximum norm!

[11/8, −15/8, 0, −15/8, 15/8], η = 15/8
The linear optimization problem we have just looked at can be solved with the Matlab function linprog, for example, by writing the following sequence of commands in a Matlab script called Cheb_Linprog.m:
f = [0; 0; 1];
A = [1, 1, 1, 2, 3,-1,-1,-1,-2,-3;
     1,-1, 2, 1, 1,-1, 1,-2,-1,-1;
    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1]';
b = [3 1 7 8 6 -3 -1 -7 -8 -6]';
disp('Medium Scale, Simplex:');
options = optimset('Display','iter','LargeScale','off','Simplex','on');
[x,fval,exitflag,output,lambda] = linprog(f,A,b,[],[],[],[],[],options)
This sequence of commands can then be executed in the Command Window with the command Cheb_Linprog.
As already mentioned, the linear Chebyshev approximation problem is a special minimax problem. These kinds of problems can be solved with the Matlab function fminimax in the following way:
x0 = [0; 0]; % Starting guess
disp('Minimize absolute values:');
options = optimset('GradObj','on','Display','iter', ...
    'MinAbsMax',5); % Minimize absolute values
[x,fval,maxfval,exitflag,output] = ...
    fminimax(@ObjFunc,x0,[],[],[],[],[],[],[],options)
fminimax requires an m-file for the objective function and its gradient:
function [F,G] = ObjFunc(x)
A = [1 1; 1 -1; 1 2; 2 1; 3 1]; b = [3 1 7 8 6]';
F = A*x-b;
if nargout > 1
    G = A';
end
▹
Nonlinear Least Squares Problems
More generally, we consider optimization problems of the form

\[
F(x) := \frac{1}{2} \sum_{\mu=1}^{m} f_\mu(x)^2 \longrightarrow \min_x.
\]
To simplify matters, we assume that the given functions fμ: Rn −→ R (μ = 1, . . . , m) are twice differentiable on Rn. Problems of this kind appear, for example, when dealing with curve or data fitting. In addition, systems of nonlinear equations can also be reduced to this form and can then be solved by methods of nonlinear optimization.
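To make this reduction concrete, the following minimal Matlab sketch (our addition, not from the book) applies a plain, undamped Gauss–Newton iteration to the nonlinear system of Example 2 below; the starting point is an assumption.

% Sketch (not from the book): Gauss-Newton iteration for the nonlinear
% system f1 = x1^3 - x2 - 1 = 0, f2 = x1^2 - x2 = 0 of Example 2.
f = @(x) [x(1)^3 - x(2) - 1; x(1)^2 - x(2)];   % residual vector
J = @(x) [3*x(1)^2, -1; 2*x(1), -1];           % Jacobian of f
x = [2; 2];                                    % assumed starting point
for k = 1:25
    d = -J(x) \ f(x);         % Gauss-Newton correction
    x = x + d;
    if norm(d) < 1e-12, break; end
end
x                             % approx. (1.4656, 2.1479)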
Example 2
We wish to solve the following simple system of nonlinear equations:

    x₁³ − x₂ − 1 = 0,  x₁² − x₂ = 0
> restart: Digits := 5: f_1 := x_1^3-x_2-1; f_2 := x_1^2-x_2;
f₁ := x₁³ − x₂ − 1,  f₂ := x₁² − x₂
We calculate its solution numerically using the Maple command fsolve:
> fsolve({f_1,f_2},{x_1,x_2});
{x1 = 1.4656, x2 = 2.1479}
The following figure shows that the nonlinear system has a unique solution:
> with(plots): implicitplot({f_1,f_2},x_1=-1..3,x_2=-2..4);
[Figure: implicit plots of the curves f₁ = 0 and f₂ = 0, which intersect in exactly one point]
Next, we determine the local extreme points of the objective function corresponding to the underlying least squares problem:
> F := 1/2*(f_1^2+f_2^2);
D1F := diff(F,x_1); D2F := diff(F,x_2); solve({D1F,D2F},{x_1,x_2}):
  B := proc(z) local u;
    # This Boolean function selects the real stationary points
    subs(z,[x_1,x_2]); u := map(evalc@Im,%);
    is(u[1]=0 and u[2]=0)
  end proc:
  sol := allvalues([%%]): rsol := select(B,sol): 'rsol' = evalf(rsol);
\[
F := \frac{1}{2}(x_1^3 - x_2 - 1)^2 + \frac{1}{2}(x_1^2 - x_2)^2
\]
\[
D1F := 3(x_1^3 - x_2 - 1)x_1^2 + 2(x_1^2 - x_2)x_1, \qquad D2F := -x_1^3 + 2x_2 + 1 - x_1^2
\]

rsol = [[x₂ = −0.50000, x₁ = 0.], [x₂ = −0.12963, x₁ = 0.66667], [x₁ = 1.4655, x₂ = 2.1477]]
Obviously F has three (real) stationary points¹, of which rsol[3] is the solution to the system of nonlinear equations:
> subs(rsol[3],F): ’F’ = simplify(%);
F = 0

For the other two solutions we get:

> subs(rsol[1],F), subs(rsol[2],F);

1/4, 961/2916
We evaluate the Hessian to decide what kind of stationary points these are:
> with(linalg): H := hessian(F,[x_1,x_2]):
seq(subs(rsol[k],eval(H)),k=1..2),subs(evalf(rsol[3]),eval(H));
\[
\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \qquad
\begin{bmatrix} \frac{65}{27} & -\frac{8}{3} \\[2pt] -\frac{8}{3} & 2 \end{bmatrix}, \qquad
\begin{bmatrix} 50.101 & -9.3741 \\ -9.3741 & 2 \end{bmatrix}
\]
> definite(%[1],positive_def),definite(%[2],positive_semidef), definite(%[2],negative_semidef),definite(%[3],positive_def);
1 It is well known that these zeros of the first derivative are candidates for local extreme points.
Chapter 1
true, false, false, true

rsol[1] and rsol[3] give local minima, and rsol[2] is a saddle point because the Hessian is indefinite. We see that not every local minimum of F is a solution to the given nonlinear system. By means of the contour lines of F we finally visualize the geometric details:

> Points := pointplot([seq(subs(evalf(rsol[k]),[x_1,x_2]),k=1..3)],
      symbol=circle,symbolsize=17,color=black):
  Levels := contourplot(F,x_1=-1.5..2,x_2=-1.5..5,axes=box,
      levels=[0.1*k $ k=1..10],grid=[100,100]):
  display(Points, Levels);

[Figure: contour plot of F with the three stationary points marked]

▹

Chebyshev Approximation
We explain the mathematical problem by means of the following simple example:
\[
\max_{0 \le t \le 1} | e^t - a - bt | \longrightarrow \min_{a,b}
\]
Consequently, we are looking for the linear polynomial function which is the best approximation of the exponential function in the interval [0, 1] with respect to the maximum norm. This problem is equivalent to
\[
\eta \longrightarrow \min_{a,b,\eta}
\]

subject to the constraints

\[
-\eta \le e^t - a - bt \le \eta \quad \text{for all } t \in [0,1].
\]
Here infinitely many constraints occur, while the objective function and the constraint functions depend only on a finite number of variables. This type of problem is called a semi-infinite (linear) optimization problem. It will not be discussed any further in later chapters.
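A common practical workaround, however, is to discretize t on a finite grid, which turns the semi-infinite problem into an ordinary linear program. The sketch below is our own Matlab illustration (not the book's method); the grid size is an assumption.

% Sketch (not from the book): discretized Chebyshev approximation of
% exp on [0,1] as a finite LP in the variables [a; b; eta].
t = linspace(0, 1, 201)';                  % assumed grid on [0,1]
et = exp(t);
f = [0; 0; 1];                             % minimize eta
A = [-ones(size(t)), -t, -ones(size(t));   %  e^t - a - b*t <= eta
      ones(size(t)),  t, -ones(size(t))];  % -(e^t - a - b*t) <= eta
c = [-et; et];
sol = linprog(f, A, c);                    % Optimization Toolbox
a = sol(1), b = sol(2), eta = sol(3)       % approx. 0.8941, 1.7183, 0.1059

With a fine grid the result should agree well with the exact values computed below.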
Maple provides the command numapprox[minimax] to solve it, which we will apply as a 'black box':
> restart: with(numapprox):
p := minimax(exp,0..1,1,1,’eta’); ’eta’ = eta;
p := x → 0.8941114810 + 1.718281828 x , η = 0.105978313
> plot(exp-p,0..1,color=blue,thickness=2,tickmarks=[3,3],
      title="Error Function");
[Figure "Error Function": plot of eᵗ − p(t) on [0, 1], oscillating between −η and η]
The error function alternately attains its maximal deviation η at 0, at a point τ ∈ (0, 1), and at 1. Therefore, we get the following equations for a, b, τ and η:

> eta := 'eta': f := exp: p := t -> a+b*t:
  eq1 := f(0)-p(0) = eta; eq2 := f(tau)-p(tau) = -eta;
  eq3 := f(1)-p(1) = eta; eq4 := D(f-p)(tau) = 0;

eq1 := 1 − a = η,  eq2 := e^τ − a − bτ = −η,
eq3 := e − a − b = η,  eq4 := e^τ − b = 0
Maple returns the following rounded values of the exact solution:

> solve({eq1,eq2,eq3,eq4},{a,b,tau,eta}): evalf(%);

{η = 0.1059334158, b = 1.718281828, a = 0.8940665842, τ = 0.5413248543}
Comparing the results, we see that numapprox[minimax] computes the coefficient a and the maximal deviation η only with low accuracy.
Facility Location Problems
This kind of problem occurs in various applications. To illustrate this, we consider two planar location problems. Here, it is very important that the distance measure and the optimality criterion are suitable for the given problem.
Example 3 Optimal Location of an Electricity Transformer
To simplify matters, assume that the electricity supply (for instance, in a thinly populated region) is provided by connecting each customer directly with the transformer. We are looking for the location of the transformer which minimizes the total net length. To measure the distance, we use the euclidean norm, and so the objective function has the form
\[
d(x,y) := \sum_{\mu=1}^{m} \sqrt{(x - x_\mu)^2 + (y - y_\mu)^2} \longrightarrow \min_{x,y},
\]

where the coordinates (x, y) describe the sought location of the transformer and (x_μ, y_μ) the given locations of the customers.
> restart: Digits := 5: with(plots): x := vector(2):
Locations of the customers:

> Points := [[0,0],[5,-1],[4,6],[1,3]];

Points := [[0, 0], [5, −1], [4, 6], [1, 3]]

Objective function:

> d := add(sqrt((x[1]-z[1])^2+(x[2]-z[2])^2),z=Points):
  P1 := op(map(z->z[1],Points)): # abscissas of the points
  P2 := op(map(z->z[2],Points)): # ordinates of the points
  r1 := x[1]=min(P1)..max(P1);
  r2 := x[2]=min(P2)..max(P2); # defines graphic window

r1 := x₁ = 0..5,  r2 := x₂ = −1..6

> p1 := plot3d(d,r1,r2,style=patchcontour,axes=box):
  display(p1,orientation=[-50,25]); display(p1,orientation=[-90,0]);
[Figure: surface plot and top view (contour plot) of the objective function d]
> p2 := pointplot(Points,symbol=circle,symbolsize=18,axes=frame): p3 := contourplot(d,r1,r2,levels=30,grid=[70,70],color=gray): display(p2,p3,color=black);
In contrast to the foregoing figure, we now get a real 2D graphic. Clicking on the surmised minimum causes Maple to return a rough approximation of the optimal location; we forego its reproduction in this book. The exact minimum can be calculated by analytical means:
> with(linalg): g := grad(d,x):
#solve(convert(g,set),{x[1],x[2]}); # symbolic calculation fails! Solution := fsolve(convert(g,set),{x[1],x[2]});
Solution := {x1 = 1.6000, x2 = 2.4000}
> subs(Solution,eval(g)); # insert computed results
[0., 0.00005]
These decimal values look familiar, so we guess the exact solution without the help of solve:
> Solution := {x[1]=8/5,x[2]=12/5}:
subs(Solution,eval(g)): simplify(%); # symbolic calculation!
[0, 0]
By pure chance we have guessed the exact solution!
> H := hessian(d,x):
subs(Solution,eval(H)): simplify(%); evalf(%); definite(%,positive_def);
\[
\begin{bmatrix}
\frac{75}{676}\sqrt{13} + \frac{25}{51}\sqrt{2} & -\frac{25}{338}\sqrt{13} + \frac{25}{51}\sqrt{2} \\[4pt]
-\frac{25}{338}\sqrt{13} + \frac{25}{51}\sqrt{2} & \frac{25}{507}\sqrt{13} + \frac{25}{51}\sqrt{2}
\end{bmatrix}, \qquad
\begin{bmatrix} 1.0933 & 0.42656 \\ 0.42656 & 0.87103 \end{bmatrix}, \qquad \text{true}
\]
So the Hessian is positive definite at (8/5, 12/5). As d is strictly convex, the unique minimum lies there.
> U := [8/5,12/5]:
subs({x[1]=U[1],x[2]=U[2]},d): simplify(%); evalf(%);
2√13 + 4√2,  12.868
> p4 := plot([seq([z,U],z=Points)],color=blue,thickness=2):
p5 := plot([U],style=point,symbol=circle,symbolsize=24,color=blue): display(p2,p3,p4,p5,axes=frame);
[Figure: contour plot of d with the four customer locations, the optimal location and the connecting lines]
We see that the optimal location of the electricity transformer is the intersection point of the two diagonals. ▹
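Before moving on, here is a small numerical cross-check of Example 3 in Matlab, a sketch of ours and not part of the book's session, using the derivative-free function fminsearch:

% Sketch (not from the book): numerical check of Example 3.
P = [0 0; 5 -1; 4 6; 1 3];                  % customer locations
d = @(u) sum(sqrt(sum((P - u(:)').^2, 2))); % total line length
u = fminsearch(d, [0; 0])                   % approx. (1.6, 2.4)
d(u)                                        % approx. 12.868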
Example 4 Optimal Location of a Rescue Helicopter
Now, we are looking for the optimal location of a rescue helicopter. As in the foregoing case, the distance is measured in a straight line. The helicopter should reach its destination in minimal time. Therefore, we now have to minimize the maximum distance and get the objective function
\[
d_{\max}(x,y) := \max_{1 \le \mu \le m} \sqrt{(x - x_\mu)^2 + (y - y_\mu)^2} \longrightarrow \min_{x,y}.
\]
Coordinates of the destinations:
> restart: with(plots): x := vector(2):
Points := [[0,0],[5,-1],[4,6],[1,3]];
r1 := x[1]=-1.5..7.5: r2 := x[2] = -2..7: # drawing window
Points := [[0, 0], [5, −1], [4, 6], [1, 3]]
Objective function:
> d_max := max(seq(sqrt((x[1]-z[1])^2+(x[2]-z[2])^2),z=Points)):
p1 := plot3d(d_max,r1,r2,style=patchcontour,axes=box,grid=[70,70]): display(p1,orientation=[-40,20]); display(p1,orientation=[-90,0]);
[Figure: surface plot and top view (contour plot) of d_max]
This minimax problem can be solved more easily if we transform it (with r := d_max) into the following equivalent optimization problem:

\[
\frac{1}{2} r^2 \longrightarrow \min_{x,y,r}
\]

subject to the constraints

\[
(x - x_\mu)^2 + (y - y_\mu)^2 \le r^2 \qquad (1 \le \mu \le m).
\]

Substituting ρ := ½ (r² − x² − y²), we get a simple quadratic optimization problem:

\[
\frac{1}{2}(x^2 + y^2) + \rho \longrightarrow \min_{x,y,\rho}
\]

subject to the (linear) constraints

\[
x_\mu x + y_\mu y + \rho \ge \frac{1}{2}(x_\mu^2 + y_\mu^2) \qquad (1 \le \mu \le m).
\]
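For readers who want to verify this right away, the quadratic program can be fed, for instance, to quadprog from the Matlab Optimization Toolbox. The following sketch is our illustration (the variable order [x; y; ρ] is our choice), not part of the book:

% Sketch (not from the book): helicopter location as a QP in [x; y; rho].
P = [0 0; 5 -1; 4 6; 1 3];          % destinations
H = diag([1 1 0]);                  % objective (x^2 + y^2)/2 + rho
f = [0; 0; 1];
A = -[P, ones(size(P,1),1)];        % x_mu*x + y_mu*y + rho >= (x_mu^2+y_mu^2)/2
c = -sum(P.^2, 2)/2;
sol = quadprog(H, f, A, c);
x = sol(1), y = sol(2)              % approx. (52/17, 39/17)
r = sqrt(x^2 + y^2 + 2*sol(3))      % radius, approx. 3.8235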
Later on, we will get to know solution algorithms for these types of problems, which provide the following coordinates of the optimal helicopter location:
> M := [52/17, 39/17]: # coordinates of the optimal location
  r := evalf(subs({x[1]=M[1],x[2]=M[2]},d_max));
r := 3.8235
p2 := pointplot(Points,symbol=circle,symbolsize=18,color=black): p3 := contourplot(d_max,r1,r2,levels=30,grid=[70,70],color=gray): p4 := plot([seq([z,M],z=Points)],color=blue):
with(plottools): p5 := disk(M,0.1,color=black):
p6 := circle(evalf(M),r,color=blue,thickness=3): display(p2,p3,p4,p5,p6,axes=frame,scaling=constrained);
[Figure: contour plot of d_max with the destinations, the optimal location M and the minimum circumscribed circle]
In addition, one sees, for example with the help of the Karush–Kuhn–Tucker conditions from chapter 2 (cf. exercise 14), that the location M is optimal if three points lie on the boundary of the blue (minimum) circumscribed circle and form the vertices of an acute triangle. Naturally, it is possible that only two points lie on the circle. In this case, however, they must lie diametrically opposite each other. ▹
Standard Form of Optimization Problems
The previous examples also served to demonstrate that optimization problems from quite different areas of application have common roots. Therefore, it is useful to classify them according to mathematical criteria. In this book, we consider optimization problems which are based on the following standard form:
\[
f(x) \longrightarrow \min_{x \in \mathbb{R}^n}
\]

subject to the constraints

g_i(x) ≤ 0 for i ∈ I := {1,…,m},
h_j(x) = 0 for j ∈ E := {1,…,p}.
Here 'I' stands for 'Inequality', 'E' for 'Equality'. As already mentioned, the function to be minimized is called the objective function. We write the constraints, or restrictions, in homogeneous form and distinguish between inequality and equality constraints. The maximization of an objective function f is equivalent to the minimization of −f, and inequality restrictions of the type g_i(x) ≥ 0 are equivalent to −g_i(x) ≤ 0. Generally, we assume that the objective function as well as the restriction functions are defined on a nonempty subset of ℝⁿ and are at least continuous there.
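In Matlab, this standard form maps directly onto the interface of fmincon from the Optimization Toolbox, which expects the inequalities as c(x) ≤ 0 and the equalities as ceq(x) = 0. The toy problem below is our own illustration, not from the book:

% Sketch (not from the book): the standard form in fmincon's interface.
fobj = @(x) (x(1)-1)^2 + (x(2)-2)^2;         % assumed objective
con  = @(x) deal(x(1)^2 + x(2)^2 - 4, ...    % g(x) <= 0 (disk)
                 x(1) + x(2) - 1);           % h(x)  = 0 (line)
x = fmincon(fobj, [0; 0], [], [], [], [], [], [], con)
% returns approx. (0, 1), the projection of (1, 2) onto the line
% x1 + x2 = 1, which also satisfies the disk constraint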
Unconstrained optimization problems play an important role. Our discussion of them in chapter 3 will occupy considerable space because they come up in various applications and are consequently important as problems in their own right. There exists quite a range of methods to solve them. Numerical methods for unconstrained optimization problems usually only compute a stationary point, that is, a point where the first-order necessary conditions are fulfilled. Only with the help of additional investigations like the second-order optimality condition is it possible to decide whether the stationary point is a local extreme point or a saddle point. Global minima can only be determined in special cases.
Minimax problems of the form

\[
\max_{1 \le \mu \le m} f_\mu(x) \longrightarrow \min_{x \in \mathbb{R}^n}
\]

are a special kind of unconstrained optimization problem with a nondifferentiable objective function even if every f_μ is differentiable. These are — similar to the Chebyshev approximation — equivalent to a restricted problem:

\[
\eta \longrightarrow \min_{x \in \mathbb{R}^n, \, \eta \in \mathbb{R}}
\]

subject to the constraints

f_μ(x) ≤ η  (μ = 1,…,m).
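As a sketch (ours, with made-up functions f_μ), the epigraph reformulation can be passed directly to fmincon by treating η as an additional variable:

% Sketch (not from the book): minimax via the epigraph reformulation,
% variables z = [x1; x2; eta]; minimize eta s.t. f_mu(x) - eta <= 0.
fmu = @(x) [x(1)^2 + x(2); (x(1)-2)^2; -x(1) + x(2)^2];  % assumed f_mu
obj = @(z) z(3);
con = @(z) deal(fmu(z(1:2)) - z(3), []);                 % f_mu(x) <= eta
z = fmincon(obj, [0; 0; 1], [], [], [], [], [], [], con)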
Optimization problems with equality constraints often occur as subproblems and can be reduced via elimination to unconstrained optimization problems as we will see later on.
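A one-line illustration of such an elimination (our example, not the book's): to minimize f(x₁, x₂) = x₁² + x₂² subject to x₁ + x₂ = 1, solve the constraint for x₂ = 1 − x₁ and minimize the unconstrained function φ(x₁) := x₁² + (1 − x₁)². From φ′(x₁) = 4x₁ − 2 = 0 we obtain x₁ = 1/2 and hence the solution (1/2, 1/2)ᵀ.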
With restricted problems, we are especially interested in linear constraints g_i(x) = a_iᵀx − b_i for i ∈ I and h_j(x) = â_jᵀx − b̂_j for j ∈ E. In chapter 4 we will discuss in detail linear optimization — here we have f(x) = cᵀx — and quadratic optimization, that is, f(x) = ½ xᵀCx + cᵀx with a symmetric positive semidefinite matrix C ∈ ℝ^{n×n}. Both occur, for example, as subproblems in SQP² methods.
2 The abbreviation stands for “Sequential Quadratic Programming”.
Penalty Methods
We will conclude this introductory section with the presentation of the classic penalty method from the early days of numerical optimization. At that time, problems with nonlinear constraints could only be solved by reducing them to unconstrained problems by adding the constraints as ‘penalty terms’ to the objective function. Here, we will only consider the simple case of equality constraints and quadratic penalty terms. The optimization problem
f(x) −→ min x∈Rn
subject to the constraints
hj(x)=0 for j=1,…,p
is transformed into the unconstrained optimization problem

\[
F(x) := f(x) + \frac{\lambda}{2} \sum_{j=1}^{p} h_j(x)^2 \longrightarrow \min_{x \in \mathbb{R}^n}
\]

with a positive penalty parameter λ. Enlarging λ leads to a greater 'punishment' of unfulfilled constraints. Thereby we hope that the 'unconstrained minimizer' x*(λ) will fulfill the constraints with growing accuracy and at the same time f(x*(λ)) will better approximate the minimum we are looking for.
Example 5
Consider the problem

\[
f(x_1, x_2) := x_1 x_2^2 \longrightarrow \min_{x \in \mathbb{R}^2}
\]

subject to the constraint

\[
h(x_1, x_2) := x_1^2 + x_2^2 - 2 = 0,
\]

which has two global minima at (x₁, x₂)ᵀ with x₁ = −√(2/3), x₂ = ±2/√3, and a local minimum at x₁ = √2, x₂ = 0. At the moment, we content ourselves with the following 'graphical inspection':
> restart: with(plots): f := x[1]*x[2]^2: h := x[1]^2+x[2]^2-2:
F := subs({x[1]=r*cos(t),x[2]=r*sin(t)},[x[1],x[2],f]): plot(subs(r=sqrt(2),F[3]),t=0..2*Pi,color=blue,thickness=2);
p1 := plot3d(F,t=0..2*Pi,r=0.1..sqrt(2),style=wireframe,grid=[25,25],
color=blue): r := sqrt(2):
p2 := spacecurve(F,t=0..2*Pi,color=blue,thickness=2):
p3 := spacecurve([F[1],F[2],0],t=0..2*Pi,color=black,thickness=2): display(p1,p2,p3,orientation=[-144,61],axes=box);
[Figures: "Graph of the objective function" along the constraint circle, plotted over t ∈ [0, 2π] (left); the surface x₃ = f(x₁,x₂) with the constraint circle and the corresponding space curve (right)]
The figure on the right shows the constraint circle in the x1x2 -plane, along with the surface x3 = f(x1,x2). The intersection of the cylinder h(x1,x2) = 0 and the surface x3 = f(x1,x2) gives the space curve shown by the thick blue line. Later on, we will be able to analyze and prove this in more detail with the help of the optimality conditions presented in chapter 2. Let us look at the penalty function:
> F := f+lambda/2*h^2;
F := x₁x₂² + ½ λ (x₁² + x₂² − 2)²
We get the necessary optimality conditions:
> DF1 := diff(F,x[1]) = 0; DF2 := factor(diff(F,x[2])) = 0;
DF1 := x₂² + 2λ(x₁² + x₂² − 2)x₁ = 0,  DF2 := 2x₂ (λx₂² + x₁ + λx₁² − 2λ) = 0
If x2 = 0, the first equation is simplified to:
> subs(x[2]=0,DF1); solve(%,x[1]): Sol := map(z->[z,0],[%]);
2λ(x₁² − 2)x₁ = 0,  Sol := [[0, 0], [√2, 0], [−√2, 0]]
In these three points, F has the following Hessians:
> with(linalg): H := hessian(F,[x[1],x[2]]): seq(subs({x[1]=z[1],x[2]=z[2]},eval(H)),z=Sol);
\[
\begin{bmatrix} -4\lambda & 0 \\ 0 & -4\lambda \end{bmatrix}, \qquad
\begin{bmatrix} 8\lambda & 0 \\ 0 & 2\sqrt{2} \end{bmatrix}, \qquad
\begin{bmatrix} 8\lambda & 0 \\ 0 & -2\sqrt{2} \end{bmatrix}
\]
Obviously, the penalty function has a local minimum only at (√2, 0)ᵀ. For large λ, the corresponding Hessian has the condition number 2√2 λ and is therefore ill-conditioned.
Now consider the remaining case in which the second factor of DF2 is equal to zero:
> DF2a := collect(op([1,3],DF2),lambda) = 0; expand(DF1-2*x[1]*DF2a): Eq[1] := x[2]^2 = solve(%,x[2]^2);
DF2a := λ(x₁² + x₂² − 2) + x₁ = 0,  Eq₁ := x₂² = 2x₁²

Substituting Eq₁ into DF2a leads to the following quadratic equation:
> Eq[2] := subs(Eq[1],DF2a);
Eq[3] := x[1]^2 = solve(Eq[2],x[1]^2);
Eq₂ := λ(3x₁² − 2) + x₁ = 0,  Eq₃ := x₁² = (2λ − x₁)/(3λ)
Before looking at the solutions in more detail, we will prepare the Hessian for analyzing whether it is positive definite or not:
> diff(F,x[1]$2): subs(Eq[1],%): DF11 := factor(%); diff(F,x[2]$2): subs(Eq[1],%): DF22 := expand(%-2*lhs(Eq[2]));
DF11 := 2λ(5x₁² − 2),  DF22 := 8λx₁²
> unprotect(Trace): Trace := expand(subs(Eq[3],DF11+DF22));
Trace := 8λ − 6x₁
> DF11*DF22-diff(F,x[1],x[2])^2: expand(%): subs(Eq[1],%): factor(%);
Determinant := 8*x[1]^2*expand(subs(Eq[3],op(3,%)));
8x₁²(6λ²x₁² − 4λ² − 1 − 4λx₁),  Determinant := 8x₁²(−6λx₁ − 1)
> Qsol := solve(Eq[2],x[1]);
\[
Qsol := \frac{-1 + \sqrt{1 + 24\lambda^2}}{6\lambda}, \; \frac{-1 - \sqrt{1 + 24\lambda^2}}{6\lambda}
\]

> subs(x[1]=Qsol[1],Determinant): 'Determinant' = factor(%);

\[
Determinant = -\frac{2\,(-1 + \sqrt{1 + 24\lambda^2})^2 \sqrt{1 + 24\lambda^2}}{9\lambda^2}
\]
> Qsol[2]^2-2/3: 'x[1]^2'-2/3 = simplify(%);

\[
x_1^2 - \frac{2}{3} = \frac{1 + \sqrt{1 + 24\lambda^2}}{18\lambda^2}
\]
Looking at the determinant, it is obvious that the first of the two solutions Qsol[1] is not possible. For x₁ = Qsol[2] we have x₁ < −√(2/3). Therefore, we get two minimizers, and because of x₁² + x₂² = 3x₁² > 2 these points do not lie within the circle. For λ → ∞, however, these minimizers converge to the global minimizers of the restricted optimization problem. For large λ their numerical calculation requires very good initial values because, as we will see at once, the Hessians of the penalty function are ill-conditioned at the (approximating) minimizers:
> subs(x[1]=Qsol[2],mu^2-Trace*mu+Determinant): Mu := solve(%,mu):
  mu[max]/mu[min] = asympt(eval(Mu[1]/Mu[2]),lambda,1);

\[
\frac{\mu_{\max}}{\mu_{\min}} = \sqrt{6}\,\lambda + O\Big(\frac{1}{\lambda}\Big)
\]
In the following two graphics this fact becomes clear very quickly because the elliptic level curves around the local minimizers become more elongated for increasing penalty parameters:
[Figures: level curves of the penalty function for penalty parameter λ = 3 (left) and λ = 6 (right)]
In comparison, we give a Matlab script Penalty.m which calculates a minimizer of the penalty function for the parameters λ = 1, 3, 6 and 9 by means of the Matlab function fminsearch. Furthermore it plots the feasible region, the level curves of the objective function and of the penalty functions, and marks the exact and approximating minimizers.
% Script Penalty.m: Test of Penalty Method
clc; clear all; close all; N = 200;
X = linspace(-2,2,N); Y = X; T = linspace(0,2*pi,N);
[X,Y] = ndgrid(X,Y); Z = X.*Y.^2;
xx = sqrt(2).*cos(T); yy = sqrt(2).*sin(T);
x1 = -sqrt(2/3); y1 = sqrt(4/3);
f = @(x) x(1)*x(2)^2; h = @(x) x(1)^2+x(2)^2-2;
x = [2, 1]; % starting value of fminsearch
for lambda = [1, 3, 6, 9]
    Phi = Z+lambda/2*(X.^2+Y.^2-2).^2;
    phi = @(x) f(x)+lambda/2*h(x)^2;
    [x,fval] = fminsearch(phi,x)
    % In the second iteration (lambda=3) fminsearch uses the result of
    % the first iteration (i.e. lambda=1) as starting value, and so on.
    figure, clf
    subplot(2,1,1);
    contour(X,Y,Z,40); colorbar;
    hold on
    P = plot(xx,yy,'k'); set(P,'linewidth',2);
    PP = plot([sqrt(2),x1,x1],[0,y1,-y1],'ko');
    set(PP,'markersize',6,'markerfacecolor','k');
    axis([-2 2 -2 2]), axis equal, axis([-2 2 -2 2]), ylabel('y')
    title(['\lambda = ', num2str(lambda)]);
    subplot(2,1,2);
    contour(X,Y,Phi,160); colorbar;
    hold on
    P = plot(xx,yy,'k'); set(P,'linewidth',2);
    PP = plot([sqrt(2),x1,x1],[0,y1,-y1],'ko');
    set(PP,'markersize',6,'markerfacecolor','k');
    % Plot the minimum of phi
    PP = plot(x(1),x(2),'r+'); set(PP,'markersize',6,'linewidth',3);
    axis([-2 2 -2 2]), axis equal, axis([-2 2 -2 2])
    xlabel('x'), ylabel('y')
    waitforbuttonpress % to wait for a click on a figure
end
▹
Penalty-based methods as presented above are at best suitable for finding appropriate starting approximations for more efficient optimization methods. Later on, however, we will see that a modified approach — it leads to the so-called 'exact penalty functions' — is of quite practical interest.
1.2 Historical Overview
In this book we almost exclusively deal with continuous finite-dimensional optimization problems. This historical overview, however, considers not only the finite-dimensional but also the infinite-dimensional case (that is, Calculus of Variations and Optimal Control), because their developments have for large parts run in parallel and the theories are almost inextricably interwoven with each other.
Optimization Problems in Antiquity
The history of optimization begins, like so many other stories, with the 'ancient' Greeks. In his Aeneid Virgil recorded the legendary story of the Phoenician queen Dido who — after her flight from Tyre — allegedly landed on the coast of Numidia in North Africa in 814 BC.
With seeming modesty she asked the ruler there for as much land as could be enclosed with a bull’s hide. This bull’s hide, however, she cut into many thin strips so that she was able to enclose the whole area around the harbor bay of what was later to become the city of Carthage.
Hence “Dido’s Problem” is to find the closed curve of fixed length which encloses the maximum area. In the language of modern mathematics it is the classical isoperimetric problem.
Dido probably knew its solution, although not in today’s strict sense, since the necessary terms and concepts were only devised over 2000 years later. This was probably also the reason why only simple extremal problems of geometrical nature were examined in antiquity.
Zenodorus (200–140 BC), for example, dealt with a variant of the isoperimetric problem and proved that of all polygons with n vertices and equal perimeter the regular polygon has the greatest area. The following optimization problem came from the famous mathematician Heron of Alexandria (second half of the first century AD). He postulated that beams of light always take the shortest path. Only in 1657 did Pierre de Fermat (1601–1665) formulate the correct version of this law of nature, stating that beams of light cross inhomogeneous media in minimum time. Heron's idea, however, was remarkable insofar as he was the first to formulate an extremal principle for a phenomenon of nature.
First Heroic Era: Development of Calculus
For a long time each extremal problem was solved individually using specific methods. In the 17th century people realized the necessity of developing more general methods in the form of a calculus. As the name suggests, this was the motivation for the creation of "Calculus". The necessary condition f′(x) = 0 for extrema was devised by Fermat, who discovered it in 1629 (allegedly only for polynomials). Isaac Newton (1643–1727) also knew Fermat's method — in a more general form. He dealt with determining maxima and minima in his work Methods of Series and Fluxions, which had been completed in 1671 but was published only in 1736.
In 1684 Gottfried Wilhelm Leibniz (1646–1716) published a paper in the Acta Eruditorum in which he used the necessary condition f′(x) = 0 as well as the second-order derivative to distinguish between maxima and minima.
Very surprisingly, the obvious generalization to functions of several variables took a long time. It was only in 1755 that the necessary optimality condition ∇f(x) = 0 was published by Leonhard Euler (1707–1783) in his book Institutiones Calculi Differentialis. The Lagrange multiplier rule for problems with equality constraints appeared only in 1797 in Lagrange's Théorie des fonctions analytiques. Instead, the main focus was on problems in which an integral functional like

\[
J(y) = \int_a^b f(x, y, y')\,dx
\]

is to be maximized or minimized in a suitable function space. The isoperimetric problem is a problem of this kind. Further extremal problems — mainly motivated by physics — were studied by Newton, Leibniz, Jacob Bernoulli (1654–1705) and John Bernoulli (1667–1748) as well as Euler. In his book Methodus Inveniendi Lineas Curvas Maximi Minimive Proprietate Gaudentes, sive Solutio Problematis Isoperimetrici Latissimo Sensu Accepti, published in 1744, however, Euler developed first formulations of a general theory and for that used the Euler method named after him. From the necessary optimality conditions of the discretized problems he deduced the famous "Euler equation"

\[
\frac{\partial f}{\partial y} = \frac{d}{dx} \frac{\partial f}{\partial y'}
\]

via passage to the limit for an extremal function y. By doing this, he confirmed known results and was able to solve numerous more general problems.

Joseph-Louis Lagrange (1736–1813) greatly simplified Euler's method. Instead of the transition to the approximating open polygon, he embedded the extremal function y(x) in a family of functions y(x, ε) := y(x) + ε η(x) and demanded the vanishing of the "first variation"

\[
\delta J := \frac{d}{d\varepsilon} J(y + \varepsilon\eta)\Big|_{\varepsilon=0}.
\]

Following this expression, Euler coined the term "Calculus of Variations" for this new and important branch of Analysis.
In Euler's works there are already first formulations for variational problems with differential equation constraints. For these kinds of problems Lagrange devised the multiplier method named after him, the final version of which appeared in his Méchanique analytique in 1788. It took over 100 years until Lagrange's suggestion was 'properly' understood and until there was a mathematically 'clean' proof.
Calculus of Variations provided important impulses for the emergence of new branches of mathematics such as Functional Analysis and Convex Analysis. These in turn provided methods with which it was possible to also solve extremal problems from economics and engineering. First papers on the mathematical modeling of economic problems were published at the end of the thirties by the later Nobel Prize winners Leonid V. Kantorovich (1912–1986) and Tjalling C. Koopmans (1910–1985). In 1951 Harold W. Kuhn (born 1925) and Albert W. Tucker (1905–1995) extended the classical Euler–Lagrange multiplier concept to finite-dimensional extremal problems with inequality constraints. Later it became known that William Karush (1917–1997) and Fritz John (1910–1994) had already achieved similar results in 1939 [Kar] and 1948 [John].
From the mid-fifties onwards, the Theory of Optimal Control evolved from the classical Calculus of Variations and other sources like dynamic programming, control and feedback control systems. Pioneering discoveries by Lev S. Pontryagin (1908–1988) and his students V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko led to the maximum principle, and thus the theory had its breakthrough as an autonomous branch of mathematics.
The development of more powerful computers contributed to the fact that it was now possible to solve problems from applications which had so far been impossible to solve numerically because of their complex structure.
The most important and most widely used numerical methods of optimization were devised after the Second World War. Before that only some individual methods were known, about which sometimes only vague information is provided in the literature. The oldest method is probably the least squares fit, which Carl Friedrich Gauss (1777–1855) allegedly already knew in 1794 [Gau] and which he used for astronomical and geodetic problems.
A particular success for him was the rediscovery of the planetoid Ceres in late 1801. After the discovery during January 1801 by the Italian astronomer Giuseppe Piazzi it had been observed only for a short period of time. From very few data Gauss calculated its orbit using linear regression. Franz Xaver von Zach found Ceres on the night of December 7, just where Gauss predicted it would be.
In 1805 Adrien-Marie Legendre (1752–1833) presented a printed version of the least squares method for the first time. Gauss, who published his results only in 1809, claimed priority for it.
Chapter 1
24 Introduction
An important predecessor of numerical optimization methods and prototype of a descent method with step size choice is the gradient method. It was proposed by Augustin-Louis Cauchy (1789–1857) [Cau] in 1847 to solve systems of nonlinear equations by reducing them to linear regression problems of the form

\[
F(x) = \frac{1}{2} \sum_{i=1}^{m} f_i(x)^2 \longrightarrow \min_{x \in \mathbb{R}^n}
\]

as described in section 1.1. Here the iterations are carried out according to

\[
x^{(k+1)} := x^{(k)} + \lambda_k d_k
\]

— starting from an approximation x^{(k)} and taking the direction of steepest descent d_k := −∇F(x^{(k)}); the step size λ_k is the smallest positive value λ which gives a local minimum of F on the half line {x^{(k)} + λd_k | λ ≥ 0}.

However, it seems that the gradient method did not gain general acceptance — probably because the rate of convergence slows down immensely after a few iterations. In a paper from 1944 Levenberg [Lev] regarded the Gauß–Newton method as the standard method for solving nonlinear regression problems. This method is obtained by the linearization

\[
f(x) \approx f(x^{(k)}) + J_k (x - x^{(k)})
\]

in a neighborhood U(x^{(k)}, Δ_k), where J_k := f′(x^{(k)}) is the Jacobian. The solution d_k to the linear regression problem

\[
\| f(x^{(k)}) + J_k d \|_2 \longrightarrow \min_{d \in \mathbb{R}^n}
\]

gives a new approximation x^{(k+1)} := x^{(k)} + d_k. Strictly speaking this only makes sense if ∥d_k∥ ≤ Δ_k holds. In the case of ∥d_k∥ > Δ_k, Levenberg proposed considering the restricted linear regression problem

\[
\| f(x^{(k)}) + J_k d \|_2 \longrightarrow \min_{d \in \mathbb{R}^n, \, \|d\|_2 = \Delta_k}
\]

instead. Then there exist exactly one Lagrange multiplier λ_k > 0 and one correction vector d_k with

\[
(J_k^T J_k + \lambda_k I)\, d_k = -J_k^T f(x^{(k)}) \quad \text{and} \quad \|d_k\|_2 = \Delta_k.
\]

One can see that for λ_k ≈ 0 a damped Gauß–Newton correction and for large λ_k a damped gradient correction is carried out. Hence, this new method 'interpolates' between the Gauß–Newton method and the gradient method
by combining the usually very good local convergence properties of the Gauß–Newton method with the descent properties of the gradient method. This flexibility is especially advantageous if the iteration runs through areas where the Hessian of the objective function — or its approximation — is not (yet) sufficiently positive definite.
Levenberg unfortunately does not make any suggestions for determining λ_k in a numerically efficient way and controlling the size of Δ_k. Marquardt [Mar] proposed first "ad hoc ideas" in a paper from 1963. Working independently of Levenberg, he had pursued a similar approach. This was only properly understood when the findings of Levenberg and Marquardt were studied in the context of "trust region methods", which were originally proposed by Goldfeld, Quandt and Trotter [GQT] in 1966. In contrast to descent methods with choice of step size, one here chooses the step size — or an upper bound — first, and then the descent direction.
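To make the interplay concrete, here is a minimal Matlab sketch of a Levenberg–Marquardt-type iteration. It is our illustration of the idea, not Levenberg's or Marquardt's original procedure; the residual function (taken from Example 2 of section 1.1) and the simple doubling/halving rule for λ_k are assumptions.

% Sketch (not from the book): damped Gauss-Newton steps
% (J'J + lambda*I) d = -J'f with a simple update rule for lambda.
fres = @(x) [x(1)^3 - x(2) - 1; x(1)^2 - x(2)];  % residuals (Example 2)
Jac  = @(x) [3*x(1)^2, -1; 2*x(1), -1];          % Jacobian
x = [2; 2]; lambda = 1;                          % assumed start and damping
for k = 1:50
    J = Jac(x); r = fres(x);
    d = -(J'*J + lambda*eye(2)) \ (J'*r);        % damped correction
    if norm(fres(x + d)) < norm(r)               % success: accept, damp less
        x = x + d; lambda = lambda/2;
    else                                         % failure: reject, damp more
        lambda = 2*lambda;
    end
    if norm(d) < 1e-12, break; end
end
x       % approx. (1.4656, 2.1479)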
When the first computers appeared in the middle of the 20th century, nonlinear regression problems were of central importance. Levenberg's approach, however, remained unnoticed for a long time. Rather, at first the interest centered around methods with a gradient-like descent direction. Besides Cauchy's classical gradient method these were iterative methods which minimized with alternating coordinates and which worked cyclically like the Gauss–Seidel method or with relaxation control like the Gauss–Southwell method [Sou]. In the latter case the minimization was done with respect to the coordinate with the steepest descent.
A Quasi-Newton method proposed by Davidon (1959) turned out to be revolutionary. It determined the descent directions dk = −Hkgk from the gradient gk using symmetric positive definite matrices Hk.
Davidon's paper on this topic remained unpublished at first. In 1963 a revised version of his algorithm was published by Fletcher and Powell. It is usually called the Davidon–Fletcher–Powell method (DFP method). Davidon's original work was published only 30 years later in [Dav].
Davidon himself called his method the "variable metric method", pointing out that the one-dimensional minimization was done in the direction of the minimum of an approximation of the objective function. For a quadratic objective function

\[
f(x) = b^T x + \frac{1}{2} x^T A x
\]

with a symmetric positive definite Hessian A, the inner product ⟨u, v⟩_A := vᵀAu gives a norm ∥u∥_A := √(⟨u, u⟩_A) and thereby a metric. If one chooses d_k as the steepest descent direction with respect to this norm, that means, as the solution to

\[
\min_{d \ne 0} \frac{d^T g_k}{\|d\|_A},
\]

it holds that d_k = −λA⁻¹g_k with a suitable λ > 0. If A⁻¹ is known, the quadratic function can be minimized in one step. In the general case one works with changing positive definite matrices which are used as a tool to determine the new descent direction. Hence, the metric changes in each step — therefore the name of this method. In addition it is based on the idea of iteratively determining a good approximation of the inverse of the Hessian.

We thereby get an interesting connection with the conjugate gradient method by Hestenes and Stiefel (1952), since in the quadratic case the following holds for the descent directions:

\[
d_i^T A d_j = 0 \quad \text{for } i \ne j.
\]
This ensures that the optimality of the former search directions remains and that the method — at least in this special case — terminates at the minimal point after a finite number of iterations.
Besides these kinds of minimization methods there were various other methods which were commonly used. These were mostly solely based on the evaluation of functions and most of them did not have a sound theoretical basis. They were plain search methods. They were easy to implement, worked sufficiently reliably and fulfilled the expectations of their users in many applications. However, with higher-dimensional problems their rate of convergence was very slow — with quickly increasing costs. Therefore, due to their lack of efficiency most of these methods have gone out of use. However, very popular even today is the Nelder and Mead polytope method from 1965 which will be presented in chapter 3.
Second Heroic Era: Discovery of Simplex Algorithm
The simplex method can be regarded as the first efficient method for solving a restricted optimization problem — here in the special case of a linear objective function and linear (inequality) constraints. This algorithm, developed by George Dantzig (1914–2005) in 1947, greatly influenced the development of mathematical optimization. In problems of this kind the extrema are attained on the boundary of the feasible region — and in at least one 'vertex'. The simplex method utilizes this fact by iteratively switching from one vertex to the next and thereby — if we are minimizing — reducing the objective function until an optimal vertex is reached.
The representation of the simplex algorithm in tableau form, which had been dominant for a long time, as well as the fact that the problem was called ‘linear program’ indicate that it has its roots in the time of manual computation and mechanical calculating machines. More adequate forms of representation like the active set method only gradually gained acceptance. This also holds for first solution strategies for quadratic optimization problems which were devised
by Beale and Wolfe in 1959. These can be regarded as modifications of the simplex algorithm.
A survey by Colville (1968) gives a very good overview of the algorithms available in the sixties to solve nonlinearly constrained optimization problems. Surprisingly, the reduced gradient method by Abadie and Carpentier (1965, 1969) was among the best methods. First formulations of these kinds of projected gradient methods — at first for linear, later on also for nonlinear constraints — were devised by Rosen (1960/61) and Wolfe (1962). The idea to linearize the nonlinear constraints does not only appear in the reduced gradient method. The SLP method (SLP = sequential linear programming) by Griffith and Stewart (1961) solves nonlinear optimization problems by using a sequence of linear approximations — mostly using first-order Taylor series expansion. A disadvantage of this approach is that the solution to each LP-problem is attained in a vertex of the feasible region of the (linearized) constraints, which is rather improbable for the nonlinear initial problem.
Wilson (1963) was the first to propose the use of quadratic subproblems — with a quadratic objective function and linear constraints. There one uses a second-order Taylor series expansion of the Lagrange function. One works with exact Hessians of the Lagrange function and uses the Lagrange multipliers of the preceding iteration step as its approximate values. If d_k is a solution of the quadratic subproblem, then x^(k+1) := x^(k) + d_k gives the new approximation; note that at first there is no 'line search'.
Similar to the classical Newton method, one can prove local quadratic convergence for this method; in the literature it is called the Lagrange–Newton method. Beginning with Han (1976), Powell and Fletcher, Wilson's approach was taken up and developed further by numerous authors. The aim was to obtain a method with global convergence properties and — in analogy with Quasi-Newton methods — a simplified approximation of the Hessians. Furthermore, the (mostly nonfeasible) approximations x^(k) were to reduce the objective function as well as their distance to the feasible region.
To measure and control the descent the so-called ‘exact penalty functions’ or ‘merit functions’ were introduced. This led to a modified step size control where the solutions of the quadratic subproblems were used as the descent directions of the penalty functions. The methods obtained in this way were called SQP methods.
Leonid Khachiyan’s Breakthrough
The development of the last 30 years has been greatly influenced by the aftermath of a 'scientific earthquake' which was triggered in 1979 by the findings of the Russian mathematician Khachiyan (1952–2005) and in 1984 by those of the Indian-born mathematician Karmarkar. The New York Times, which profiled Khachiyan's achievement in a November 1979 article entitled "Soviet Mathematician Is Obscure No More", called him "the mystery author of a new mathematical theorem that has rocked the world of computer analysis."
At first it only affected linear optimization and the up to that time unchallenged dominance of the simplex method. This method was seriously questioned for the first time ever when in 1972 Klee and Minty found examples in which the simplex algorithm ran through all vertices of the feasible region. This confirmed that the 'worst case complexity' depended exponentially on the dimension of the problem. Afterwards people began searching for LP-algorithms with polynomial complexity.
Based on Shor’s ellipsoid method, it was Khachiyan who found the first algorithm of this kind. When we speak of the ‘ellipsoid method’ today, we usually refer to the ‘Russian algorithm’ by Khachiyan. In many applications, however, it turned out to be less efficient than the simplex method. In 1984 Karmarkar achieved the breakthrough when he announced a polynomial algorithm which he claimed to be fifty times faster than the simplex algorithm. This announcement was a bit of an exaggeration but it stimulated very fruitful research activities.
Gill, Murray, Saunders and Wright proved the equivalence between Karmarkar's method and the classical logarithmic barrier methods, in particular when applied to linear optimization. Logarithmic barrier methods are methods which — unlike the example of an exterior penalty method from section 1.1 — solve restricted problems by transforming a penalty or barrier term into a parameterized family of unconstrained optimization problems whose minimizers lie in the interior of the feasible region. First approaches to this method date back to Frisch (1954). In the sixties Fiacco and McCormick devised from that the so-called interior-point methods. Their book [Fi/Co] contains a detailed description of classical barrier methods and is regarded as the standard reference work. A disadvantage was that the Hessians of the barrier functions were ill-conditioned at the approximate minimizers. This is usually seen as the reason for large rounding errors. Probably this flaw was the reason why people lost interest in these methods. Now, due to the reawakened interest, the special problem structure was studied again and it was shown that the rounding errors are less problematic if the implementation is thorough enough. Efficient interior-point methods have in the meantime been applied to larger classes of nonlinear optimization problems and are still topics of current research.
Exercises
1. In Gauss's Footsteps (cf. [Hai])

A newly discovered planetoid orbiting the sun was seen at 10 different positions before it disappeared from view. The Cartesian coordinates (x_j, y_j), j = 1,…,10, of these positions, represented in a fitted coordinate system in the orbit plane, are given in the following chart:
xj : −1.024940, −0.949898, −0.866114, −0.773392, −0.671372, −0.559524, −0.437067, −0.302909, −0.155493, −0.007464
yj : −0.389269, −0.322894, −0.265256, −0.216557, −0.177152, −0.147582, −0.128618, −0.121353, −0.127348, −0.148885
Our aim now is to determine the orbit of this object on the basis of these observations in order to be able to predict where it will be visible again. We assume the orbit to be defined by an ellipse of the form

    x² = ay² + bxy + cx + dy + e.

This leads to an overdetermined system of linear equations with the unknown coefficients a, b, c, d and e, which is to be solved by means of the method of least squares. Do the same for the parabolic approach

    x² = dy + e.

Which of the two trajectories is more likely?
2. Gauss Normal Equations

Let m, n ∈ ℕ with m > n, A ∈ ℝ^{m×n} and b ∈ ℝ^m. Consider the mapping φ given by

    φ(x) := ∥Ax − b∥₂ for x ∈ ℝⁿ

and show: a vector u ∈ ℝⁿ gives the minimum of φ if and only if AᵀAu = Aᵀb.
3. Overdetermined System of Linear Equations (cf. Example 1, page 3)

a) Formulate the linear ∥·∥₁-approximation problem ∥Ax − b∥₁ → min over x as an equivalent linear optimization problem, similar to the Chebyshev approximation problem.

b) Calculate a solution to the above linear optimization problem with Maple or the like.
c) Prove that the convex hull of the points (3, 2), (2, 1), (3/2, 3/2), (1, 3) yields solutions to the ∥·∥₁-approximation problem. Are there any other solutions?

4. a) The minimum of the function f defined by f(x) := x₁² + x₂² − x₁x₂ − 3x₁ for x ∈ ℝ² can instantly be calculated through differentiation. Another way is to use the method of alternate directions: the function f is first minimized with respect to x₁, then with respect to x₂, then again with respect to x₁ and so on. Use x⁽⁰⁾ := (0, 0)ᵀ as the starting value and show that the sequence (x⁽ᵏ⁾) obtained with the method of alternate directions converges to the minimal point x* = (2, 1)ᵀ.

b) Let

    f(x) = max{ |x₁ + 2x₂ − 7|, |2x₁ + x₂ − 5| }.

• Visualize the function f by plotting its contour lines.
• Minimize f similarly to the previous exercise, starting with x⁽⁰⁾ := (0, 0)ᵀ. Why don't you get the minimizer x* = (1, 3)ᵀ with f(x*) = 0?

5. a) Let m ∈ ℕ. Assume that the points P₁,…,P_m and S₁,…,S_m in ℝ² are given, and we are looking for a transformation consisting of a rotation and a translation that maps the scatterplot of (P₁,…,P_m) as closely as possible to the scatterplot of (S₁,…,S_m). That means: we are looking for a, b ∈ ℝ and φ ∈ [0, 2π) such that the value

\[
d(a, b, \varphi) := \sum_{j=1}^{m} \| f_{a,b,\varphi}(P_j) - S_j \|^2
\]

with

    f_{a,b,φ}(x, y) := (a, b) + (x cos φ − y sin φ, x sin φ + y cos φ)

is minimized. Solve the problem analytically.

b) Calculate the values of a, b and φ for the special case m = 4 with

    P_j: (20, 20), (40, 40), (120, 40), (20, 60) and S_j: (20, 15), (33, 40), (125, 40), (20, 65),

and with

    P_j: (20, 20), (40, 40), (120, 40), (20, 60) and S_j: (16, 16), (42, 30), (115, −23), (42, 58).
6.
Little Luca has gotten 2 dollars from his grandfather. He takes the money and runs to the kiosk around the corner to buy sweets. His favorites are licorice twists (8 cents each) and jelly babies (5 cents each). Luca wants to buy as many sweets as possible. From experience, however, he knows
that he cannot eat more than 20 jelly babies. How many pieces of each type of candy should he buy in order to get as many pieces as possible for his 2 dollars, in view of the fact that not more than 20 jelly babies should be bought (as he does not want to buy more than he can consume immediately)?
a) Formulate the above situation as an optimization problem. Define an objective function f and maximize it subject to the constraints g(s₁, s₂) ≤ 0. (g can be a vector function; then the inequality is understood componentwise!) s₁ and s₂ denote the numbers of licorice twists and jelly babies.
b) Visualize the feasible region F, that is, the set of all pairs (s1 , s2 ) such that g(s1,s2) ≤ 0 holds. Visualize f: F −→ N and determine its maximum.
c) Is the optimal point unique?
7. The floor plan below shows the arrangement of the offices of a company with a staff of thirty people including the head of the company and two secretaries: On the plan the number in each office denotes the number of staff members working in this room.
The company has bought a copier and now the “sixty-four-dollar question” is: Where should the copier go?
The best spot is defined as the one that keeps traffic in the hallways to a minimum. The following facts are known about the “copying customs” of the staff members:
• Each staff member (except the head and the secretaries) uses the copier equally often. All the staff members do their own copying (that is, if there are four people in one office, each of them takes his own pile to the copier).
• Both secretaries copy 5 times as much as the other staff members; the head almost never uses the copier.
• There is nobody who walks from the server room or conference room to the copier.
Since the copier cannot be placed in any of the offices, it has to be put somewhere in the hallway, where the following restrictions have to be considered:
• The copier may not be placed directly at the main entrance.
• The copier should not be placed in the middle of the hallway. Consequently, one cannot put it between two opposing doors.
• The copier should not be placed directly in front of the glass wall of the conference room, so as not to disturb (customer) meetings.
Altogether we thus obtain four “taboo zones”. Their exact position and layout can be seen in the floorplan:
[Floor plan: offices along a hallway with the number of staff members in each room (1–4 per office), the head's office, two secretaries' offices S1 and S2, a server room, a conference room with a glass wall, the main entrance, the four "taboo zones" and a 2 m scale bar]
Hint: In practice it is almost impossible to solve this problem without any simplifications. In this case it holds that:
• It does not matter on which side of the hallway the copier is placed. Therefore the hallways may be treated as one-dimensional objects.
• The distance ‘desk–copier’ is measured from the door of the respective office and not from the actual desk in the office.
• You might need additional simplifications.
Implement the above problem and determine the optimal location for the copier. What would be the optimal solution if there were no “taboo zones”?
8. Everybody knows that in an advertisement a product can always do everything. Mathematical software is no exception. We want to investigate such promises in practice. The problem comes from the area of electric power supply:
A power plant has to provide power to meet the (estimated) power demand given in the following chart.
Expected Demand of Power

    Period:  12pm–6am | 6am–9am | 9am–3pm | 3pm–6pm | 6pm–12pm
    Demand:  15 GW    | 30 GW   | 25 GW   | 40 GW   | 27 GW
There are three types of generators available, 10 type 1, 10 type 2 and 12 type 3 generators. Each generator type has a minimal and maximal capacity; the production has to be somewhere in between (or else the generator has to be shut off). The running of a generator with minimal capacity costs a certain amount of money (dollars per hour). With each unit above minimal capacity there arise additional costs (dollars per hour) (cf. chart). Costs also arise every time a generator is switched on.
Technical information and costs for the different generator types:

    m_i, M_i: minimal and maximal capacity
    e_i: costs per hour (minimal capacity)
    c_i: costs per hour and per megawatt above minimal capacity
    f_i: costs for switching on the generator
In addition to meeting the (estimated) power demands given in the chart an immediate increase by 15% must always be possible. This must be achieved without switching on any additional generators or exceeding the maximal capacity.
Let nij be the number of generators of type i which are in use in the j-th part of the day, i = 1,2,3 and j = 1,2,…,5, and sij the number of generators that are switched on at the beginning of the j-th part of the day.
The total power supply of type i generators in the j-th part of the day is denoted by xij.
             m_i        M_i        e_i     c_i    f_i
    Type 1   850 MW     4000 MW    2000    4      4000
    Type 2   1250 MW    1750 MW    5200    2.6    2000
    Type 3   1500 MW    4000 MW    6000    6      1000
The costs can be described by the following function K:

\[
K = \sum_{i=1}^{3} \sum_{j=1}^{5} c_i z_j (x_{ij} - m_i n_{ij}) + \sum_{i=1}^{3} \sum_{j=1}^{5} e_i z_j n_{ij} + \sum_{i=1}^{3} \sum_{j=1}^{5} f_i s_{ij}
\]

The x_{ij} are nonnegative real numbers, the n_{ij} and s_{ij} are nonnegative integers. z_j denotes the number of hours in the j-th part of the day (which can be obtained from the above chart).
a) Which simplifications are "hidden" in the cost function?

b) Formulate the constraints!

c) Determine the n_{ij}, s_{ij} and x_{ij} such that the total costs are as low as possible! You might not find the global minimum but only a useful suggestion. You can try a software of your choice (for example the Matlab Optimization Toolbox or Maple). Your solution has to meet all the constraints.

9. Portfolio optimization (cf. [Bar1], p. 1 ff)

We have a sum of money to split between three investment possibilities which offer rates of return r₁, r₂ and r₃. If x₁, x₂ and x₃ represent the portions of total investment, we expect an overall return

    R = r₁x₁ + r₂x₂ + r₃x₃.

If the management charge associated with the j-th possibility is c_j x_j, then the total cost of our investment is c₁x₁ + c₂x₂ + c₃x₃. For an aimed return R we want to pay the least charges to achieve this. Then we have to solve the problem:

    c₁x₁ + c₂x₂ + c₃x₃ → min
    x₁ + x₂ + x₃ = 1
    r₁x₁ + r₂x₂ + r₃x₃ = R
    x₁, x₂, x₃ ≥ 0

Consider the corresponding quadratic penalty problem

    Φ_λ(x) := f(x) + λP(x)

for x = (x₁, x₂, x₃)ᵀ ∈ ℝ³, h₁(x) := x₁ + x₂ + x₃ − 1, h₂(x) := r₁x₁ + r₂x₂ + r₃x₃ − R and

\[
P(x) := h_1(x)^2 + h_2(x)^2 + \sum_{j=1}^{3} \psi(x_j)^2
\quad \text{with} \quad
\psi(u) := \max(-u, 0) = \begin{cases} 0, & \text{if } u \ge 0 \\ -u, & \text{if } u < 0, \end{cases}
\]

where negative investments x_j are penalized by a high management charge λψ(x_j)². Use the values R = 1.25 together with

    c₁ = 10, c₂ = 9, c₃ = 14 and r₁ = 1.2, r₂ = 1.1, r₃ = 1.4.

Calculate the minimum of Φ_λ for the parameter values λ = 10³, 10⁶, 10⁹, for instance by means of the Matlab function fminsearch (cf. the m-file Penalty.m at the end of section 1.1). How does the solution change in the cases where c₃ = 12 and c₃ = 11?