Unconstrained Optimization
▶ Optimization problem: given f : R^n → R, find x∗ ∈ R^n such that
  x∗ = argmin_x f(x)
▶ Global minimum and local minimum
▶ Optimality conditions for a local minimum x∗:
  Necessary condition: ∇f(x∗) = 0
  Sufficient condition: ∇f(x∗) = 0 and Hf(x∗) = ∇²f(x∗) is positive definite
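A quick check of these conditions (a simple example added for illustration): for f(x) = x1² + 2x2², the gradient ∇f(x) = (2x1, 4x2)^T vanishes only at x∗ = (0, 0)^T, and Hf(x∗) = diag(2, 4) is positive definite, so x∗ is the minimizer.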
Newton’s method
▶ Second-order Taylor approximation of f at the k-th iterate xk:
  f(x) ≈ f(xk) + ∇f(xk)^T (x − xk) + (1/2)(x − xk)^T Hf(xk)(x − xk)
▶ Differentiating with respect to x and setting the result equal to zero yields the (k+1)-th iterate, namely Newton’s method:
  xk+1 = xk − [Hf(xk)]^{-1} ∇f(xk).
▶ Newton’s method converges quadratically when the starting point x0 is sufficiently close to a minimum.
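A minimal MATLAB sketch of the Newton iteration above, applied to the Rosenbrock function with a hand-coded gradient and Hessian; the test function, starting point, tolerance, and iteration cap are illustrative assumptions, not taken from the slides.

% Newton's method: xk+1 = xk - inv(Hf(xk)) * grad f(xk)
f     = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;   % Rosenbrock function (assumed test problem)
gradf = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];
hessf = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1); -400*x(1), 200];
xk = [-1.2; 1];                          % starting point (assumed)
for k = 1:50
    g = gradf(xk);
    if norm(g) <= 1e-10, break; end      % stop when the gradient is (nearly) zero
    xk = xk - hessf(xk) \ g;             % solve Hf(xk) * d = grad f(xk) and step
end
xk                                       % converges to the minimizer [1; 1]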
Gradient descent optimization
▶ Directional derivative of f at x in the direction u:
  Du f(x) = lim_{h→0} (1/h) [f(x + hu) − f(x)] = u^T ∇f(x)
  It measures the rate of change of f at x in the direction u (a numerical check of this identity appears after this list).
▶ To minimize f(x), we would like to find the direction in which f decreases the fastest. Using the directional derivative,
  min_u u^T ∇f(x) = min_u ‖u‖₂ ‖∇f(x)‖₂ cos θ = −‖∇f(x)‖₂²,
  where θ is the angle between u and ∇f(x); the minimum is attained at u = −∇f(x).
▶ u = −∇f(x) is called the steepest descent direction.
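A quick finite-difference sanity check of the identity Du f(x) = u^T ∇f(x); the test function, point, direction, and step size h below are illustrative assumptions.

% Compare (1/h)[f(x+hu) - f(x)] with u'*gradf(x) for a small h
f     = @(x) 0.5*x(1)^2 + 3*x(2)^2;      % simple smooth test function
gradf = @(x) [x(1); 6*x(2)];
x = [1; -2];
u = [3; 4]/5;                            % unit direction
h = 1e-6;
fd  = (f(x + h*u) - f(x))/h;             % finite-difference directional derivative
ana = u'*gradf(x);                       % u^T * grad f(x)
disp([fd, ana])                          % the two values agree to about 1e-5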
▶ The steepest descent algorithm:
  xk+1 = xk − τ ∇f(xk),
  where τ is called the stepsize or “learning rate”; it can be chosen as follows (a minimal MATLAB sketch with a fixed stepsize follows this list):
  1. exact line search: min_τ f(xk − τ ∇f(xk)), or
  2. τ = a small constant, or
  3. evaluate f(xk − τ ∇f(xk)) for several different values of τ and choose the one that results in the smallest objective function value.
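A minimal MATLAB sketch of steepest descent with a fixed stepsize (option 2 above); the objective, starting point, τ, tolerance, and iteration cap are illustrative assumptions.

% Steepest descent with a fixed stepsize
gradf = @(x) [x(1); 6*x(2)];             % gradient of the assumed objective f(x) = 0.5*x1^2 + 3*x2^2
x   = [4; -1];                           % starting point (assumed)
tau = 0.1;                               % fixed stepsize (assumed)
tol = 1e-8;
for k = 1:10000
    g = gradf(x);
    if norm(g) <= tol, break; end
    x = x - tau*g;                       % xk+1 = xk - tau * grad f(xk)
end
x                                        % approaches the minimizer [0; 0]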
Solving LS by gradient-descent
▶ Let A ∈ R^{m×n} and b = (bi) ∈ R^m.
▶ The least squares problem:
  min_x f(x) = min_x (1/2) ‖Ax − b‖₂² = min_x (1/2) Σ_{i=1}^m fi(x)²,
  where fi(x) = A(i, :) x − bi is the i-th component of Ax − b.
▶ Gradient: ∇f(x) = A^T A x − A^T b = A^T (Ax − b)
▶ The method of gradient descent:
  ▶ Set the stepsize τ and tolerance δ to small positive numbers.
  ▶ While ‖A^T A x − A^T b‖₂ > δ do
      x ← x − τ (A^T A x − A^T b)
MATLAB demo code: lsbygd.m
>> …
>> r = A'*(A*x - b);          % gradient A'(Ax - b)
>> xp = x - tau*r;            % candidate steepest descent step
>> res(k) = norm(r);          % record the gradient (residual) norm
>> if res(k) <= tol, ... end
>> …
>> x = xp;
>> …
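The excerpt above is fragmentary, so here is a self-contained sketch of the same loop on random data; the problem size, stepsize, tolerance, and iteration cap are assumptions and the actual lsbygd.m may differ.

% Least squares by gradient descent on illustrative random data
m = 50; n = 5;
A = randn(m, n); b = randn(m, 1);
x   = zeros(n, 1);
tau = 1/norm(A)^2;            % stepsize; 1/sigma_max(A)^2 keeps the iteration stable
tol = 1e-8;
for k = 1:5000
    r = A'*(A*x - b);         % gradient A'(Ax - b)
    if norm(r) <= tol, break; end
    x = x - tau*r;            % steepest descent step
end
norm(x - A\b)                 % x should be close to the direct LS solution A\b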
Connection with root finding
Solving a nonlinear system of equations
  f1(x1, x2, . . . , xn) = 0
  f2(x1, x2, . . . , xn) = 0
  ...
  fn(x1, x2, . . . , xn) = 0
is equivalent to solving the optimization problem
  min_x g(x) = g(x1, x2, . . . , xn) = Σ_{i=1}^n (fi(x1, x2, . . . , xn))²,
whose minimum value is 0 and is attained exactly at the solutions of the system (when a solution exists).
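A small MATLAB sketch of this connection: the 2-by-2 system x1² + x2² − 4 = 0, x1 − x2 = 0 is solved by running gradient descent on g(x) = f1(x)² + f2(x)², using ∇g(x) = 2 J(x)^T F(x), where J is the Jacobian of F = (f1, f2)^T. The system, starting guess, stepsize, and iteration cap are illustrative assumptions.

% Root finding via minimization of g(x) = sum(F(x).^2)
F = @(x) [x(1)^2 + x(2)^2 - 4; x(1) - x(2)];   % the two equations f1 = 0, f2 = 0
J = @(x) [2*x(1), 2*x(2); 1, -1];              % Jacobian of F
gradg = @(x) 2*J(x)'*F(x);                     % gradient of g
x   = [1; 0.5];                                % starting guess (assumed)
tau = 0.05;                                    % stepsize (assumed)
for k = 1:5000
    d = gradg(x);
    if norm(d) <= 1e-10, break; end
    x = x - tau*d;
end
x                                              % approaches [sqrt(2); sqrt(2)], a root of the system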