Lecture Notes to Accompany
Scientific Computing
An Introductory Survey
Second Edition
by Michael T. Heath
Chapter 5
Nonlinear Equations
Copyright © 2001. Reproduction permitted only for noncommercial, educational use in conjunction with the book.
Nonlinear Equations
Given function f, we seek value x for which f(x) = 0
Solution x is root of equation, or zero of function f
Problem known as root finding or zero finding
Two important cases:
1. Single nonlinear equation in one unknown, f : R → R
Solution is scalar x for which f(x) = 0
2. System of n coupled nonlinear equations in n unknowns, f : R^n → R^n
Solution is vector x for which all components of f are zero simultaneously
Examples: Nonlinear Equations
Example of nonlinear equation in one dimension:

x^2 − 4 sin(x) = 0,
for which x = 1.9 is one approximate solution
Example of system of nonlinear equations in two dimensions:

x1^2 − x2 + 0.25 = 0,
−x1 + x2^2 + 0.25 = 0,
for which x = [ 0.5 0.5 ]T is solution vector
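A quick numerical check of this solution vector (an illustrative Python snippet, not part of the text):

x1, x2 = 0.5, 0.5
print(x1**2 - x2 + 0.25)   # 0.0
print(-x1 + x2**2 + 0.25)  # 0.0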
Existence and Uniqueness
Existence and uniqueness of solutions more complicated for nonlinear equations than for linear equations
For function f : R → R, bracket is interval [a, b] for which sign of f differs at endpoints
If f continuous and sign(f(a)) ≠ sign(f(b)), then Intermediate Value Theorem implies there is x∗ ∈ [a, b] such that f(x∗) = 0
No simple analog for n dimensions
Examples
Nonlinear equations can have any number of solutions:
• exp(x) + 1 = 0 has no solution

• exp(−x) − x = 0 has one solution

• x^2 − 4 sin(x) = 0 has two solutions

• x^3 − 6x^2 + 11x − 6 = 0 has three solutions

• sin(x) = 0 has infinitely many solutions
Example

x1^2 − x2 + γ = 0
−x1 + x2^2 + γ = 0

[Figure: the two curves plotted for γ = −1.0, −0.5, 0.25, and 0.5; the number of intersections, and hence the number of solutions, varies with γ]
Multiple Root
Nonlinear equation may have multiple root, where both function and derivative are zero, i.e., f(x) = 0 and f′(x) = 0
Examples:

x^2 − 2x + 1 = 0

and

x^3 − 3x^2 + 3x − 1 = 0
[Figure: graphs of the two examples, each touching the x-axis with zero slope at the multiple root x = 1]
Sensitivity and Conditioning
Conditioning of root-finding problem is opposite to that for evaluating function

Absolute condition number of root-finding problem for root x∗ of f : R → R is 1/|f′(x∗)|

Root ill-conditioned if tangent line nearly horizontal; in particular, multiple root ill-conditioned

Absolute condition number of root-finding problem for root x∗ of f : R^n → R^n is ∥Jf^−1(x∗)∥, where Jf is Jacobian matrix of f, {Jf(x)}ij = ∂fi(x)/∂xj

Root ill-conditioned if Jacobian matrix nearly singular
Sensitivity and Conditioning

[Figure: a well-conditioned root, where f crosses the axis steeply, and an ill-conditioned root, where f is nearly flat near the crossing]
Sensitivity and Conditioning
What do we mean by approximate solution x̂ to nonlinear system:

∥f(x̂)∥ ≈ 0 or ∥x̂ − x∗∥ ≈ 0?

First corresponds to “small residual,” second measures closeness to (usually unknown) true solution x∗

Solution criteria not necessarily “small” simultaneously

Small residual implies accurate solution only if problem well-conditioned
Convergence Rate
For general iterative methods, define error at iteration k by

ek = xk − x∗,

where xk is approximate solution and x∗ is true solution

For methods that maintain interval known to contain solution, rather than specific approximate value for solution, take error to be length of interval containing solution

Sequence converges with rate r if

lim (k → ∞) ∥ek+1∥ / ∥ek∥^r = C

for some finite nonzero constant C
Convergence Rate, continued
Some particular cases of interest:

r = 1: linear (C < 1)
r > 1: superlinear
r = 2: quadratic

Convergence rate    Digits gained per iteration
linear              constant
superlinear         increasing
quadratic           double
Interval Bisection Method
Bisection method begins with initial bracket and repeatedly halves its length until solution has been isolated as accurately as desired
while ((b − a) > tol) do
    m = a + (b − a)/2
    if sign(f(a)) = sign(f(m)) then
        a = m
    else
        b = m
    end
end
[Figure: one bisection step, with bracket endpoints a and b and midpoint m]
Example: Bisection Method
f(x) = x^2 − 4 sin(x) = 0
a          f(a)        b          f(b)
1.000000   −2.365884   3.000000    8.435520
1.000000   −2.365884   2.000000    0.362810
1.500000   −1.739980   2.000000    0.362810
1.750000   −0.873444   2.000000    0.362810
1.875000   −0.300718   2.000000    0.362810
1.875000   −0.300718   1.937500    0.019849
1.906250   −0.143255   1.937500    0.019849
1.921875   −0.062406   1.937500    0.019849
1.929688   −0.021454   1.937500    0.019849
1.933594   −0.000846   1.937500    0.019849
1.933594   −0.000846   1.935547    0.009491
1.933594   −0.000846   1.934570    0.004320
1.933594   −0.000846   1.934082    0.001736
1.933594   −0.000846   1.933838    0.000445
1.933716   −0.000201   1.933838    0.000445
Bisection Method, continued
Bisection method makes no use of magnitudes of function values, only their signs
Bisection certain to converge, but does so slowly
At each iteration, length of interval containing solution reduced by half, so method linearly convergent, with r = 1 and C = 0.5
One bit of accuracy gained in approximate solution for each iteration of bisection

Given starting interval [a, b], length of interval after k iterations is (b − a)/2^k, so achieving error tolerance of tol requires

log2((b − a)/tol)

iterations, regardless of function f involved
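A short Python sketch of bisection on the example f(x) = x^2 − 4 sin(x) (the function and tolerance here are illustrative choices, not from the text):

import math

def bisect(f, a, b, tol=1e-6):
    # Assumes [a, b] is a bracket: sign of f differs at the endpoints
    if math.copysign(1, f(a)) == math.copysign(1, f(b)):
        raise ValueError("not a bracket")
    while (b - a) > tol:
        m = a + (b - a) / 2                  # midpoint of current bracket
        if math.copysign(1, f(a)) == math.copysign(1, f(m)):
            a = m                            # root lies in [m, b]
        else:
            b = m                            # root lies in [a, m]
    return a + (b - a) / 2

print(bisect(lambda x: x**2 - 4 * math.sin(x), 1.0, 3.0))   # about 1.933754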
Fixed-Point Problems
Fixed point of given function g: R → R is value x such that
x = g(x)
Many iterative methods for solving nonlinear equations use iteration scheme of form

xk+1 = g(xk),

where g is function whose fixed points are solutions for f(x) = 0

Scheme called fixed-point iteration or functional iteration, since function g applied repeatedly to initial starting value x0

For given equation f(x) = 0, may be many equivalent fixed-point problems x = g(x) with different choices for g
Example: Fixed-Point Problems
If
f(x) = x^2 − x − 2,
then fixed points of each of functions
g(x) = x^2 − 2

g(x) = √(x + 2)

g(x) = 1 + 2/x

g(x) = (x^2 + 2)/(2x − 1)
are solutions to equation f(x) = 0
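A small Python sketch (illustrative only) applying fixed-point iteration to these choices of g; all four have the root x = 2 of f as a fixed point:

import math

def fixed_point(g, x0, iters=20):
    # Apply xk+1 = g(xk) starting from x0
    x = x0
    for _ in range(iters):
        x = g(x)
    return x

print(fixed_point(lambda x: math.sqrt(x + 2), 1.0))         # -> 2, |g'(2)| = 1/4
print(fixed_point(lambda x: 1 + 2 / x, 1.0))                # -> 2, |g'(2)| = 1/2
print(fixed_point(lambda x: (x**2 + 2) / (2*x - 1), 1.0))   # -> 2 rapidly, g'(2) = 0
# g(x) = x**2 - 2 diverges from most starting points, since |g'(2)| = 4 > 1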
Example: Fixed-Point Problems
[Figure: graphs of y = x together with y = x^2 − 2, y = √(x + 2), y = 1 + 2/x, and y = (x^2 + 2)/(2x − 1); the fixed points are the intersections with y = x]
Example: Fixed-Point Iteration
[Figure: fixed-point iteration for g(x) = x^2 − 2, which diverges, since |g′(x∗)| > 1 at both fixed points]
Example: Fixed-Point Iteration
[Figure: fixed-point iteration for g(x) = √(x + 2), which converges to the fixed point x∗ = 2, where |g′(x∗)| = 1/4]
Example: Fixed-Point Iteration
[Figure: fixed-point iteration for g(x) = 1 + 2/x, which converges to the fixed point x∗ = 2, where |g′(x∗)| = 1/2]
Example: Fixed-Point Iteration
[Figure: fixed-point iteration for g(x) = (x^2 + 2)/(2x − 1), which converges rapidly to the fixed point x∗ = 2, where g′(x∗) = 0]
Convergence of Fixed-Point Iteration
If x∗ = g(x∗) and |g′(x∗)| < 1, then there is interval containing x∗ such that iteration
xk+1 = g(xk)
converges to x∗ if started within that interval
If |g′(x∗)| > 1, then iterative scheme diverges

Asymptotic convergence rate of fixed-point iteration usually linear, with constant C = |g′(x∗)|

But if g′(x∗) = 0, then convergence rate at least quadratic
Newton’s Method
Truncated Taylor series
f(x + h) ≈ f(x) + f′(x)h

is linear function of h approximating f near x

Replace nonlinear function f by this linear function, whose zero is h = −f(x)/f′(x)

Zeros of original function and linear approximation not identical, so repeat process, giving Newton’s method

xk+1 = xk − f(xk)/f′(xk)
Newton’s Method, continued
Newton’s method approximates nonlinear function f near xk by tangent line to f at xk

[Figure: one Newton step, following the tangent line at xk down to its zero xk+1]
Example: Newton’s Method
Use Newton’s method to find root of

f(x) = x^2 − 4 sin(x) = 0
Derivative is

f′(x) = 2x − 4 cos(x),

so iteration scheme is

xk+1 = xk − (xk^2 − 4 sin(xk)) / (2xk − 4 cos(xk))
Taking x0 = 3 as starting value, we obtain
x          f(x)       f′(x)       h
3.000000   8.435520   9.959970   −0.846942
2.153058   1.294772   6.505771   −0.199019
1.954039   0.108438   5.403795   −0.020067
1.933972   0.001152   5.288919   −0.000218
1.933754   0.000000   5.287670    0.000000
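The iteration is easy to reproduce in Python (an illustrative sketch, with a hypothetical stopping tolerance):

import math

def newton(f, fprime, x, tol=1e-10, max_iter=50):
    # Newton's method: step to the zero of the tangent line at each iterate
    for _ in range(max_iter):
        h = -f(x) / fprime(x)   # Newton step
        x = x + h
        if abs(h) < tol:
            break
    return x

f = lambda x: x**2 - 4 * math.sin(x)
fprime = lambda x: 2 * x - 4 * math.cos(x)
print(newton(f, fprime, 3.0))   # about 1.933754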
Convergence of Newton’s Method
Newton’s method transforms nonlinear equation f(x) = 0 into fixed-point problem x = g(x), where

g(x) = x − f(x)/f′(x),

and hence

g′(x) = f(x) f′′(x) / (f′(x))^2

If x∗ is simple root (i.e., f(x∗) = 0 and f′(x∗) ≠ 0), then g′(x∗) = 0

Convergence rate of Newton’s method for simple root is quadratic, r = 2

But must start close enough to root to converge
Newton’s Method, continued
For multiple root, Newton’s method linearly convergent (with constant C = 1−(1/m), where m is multiplicity)
Both cases illustrated below
k    xk for f(x) = x^2 − 1    xk for f(x) = x^2 − 2x + 1
0 2.0 2.0
1 1.25 1.5
2 1.025 1.25
3 1.0003 1.125
4 1.00000005 1.0625
5 1.0 1.03125
Secant Method
Newton’s method requires evaluation of both function and derivative at each iteration
Evaluating derivative may be inconvenient or expensive, so replace it by finite difference approximation using two successive iterates:

xk+1 = xk − f(xk) (xk − xk−1) / (f(xk) − f(xk−1))

Secant method normally superlinearly convergent, with r ≈ 1.618
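A minimal Python sketch of the secant iteration (names and stopping rule are illustrative):

import math

def secant(f, x0, x1, tol=1e-10, max_iter=50):
    # Secant method: Newton's method with f' replaced by a finite
    # difference through the two most recent iterates
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        h = -f1 * (x1 - x0) / (f1 - f0)   # secant step
        x0, f0 = x1, f1
        x1 = x1 + h
        f1 = f(x1)
        if abs(h) < tol:
            break
    return x1

print(secant(lambda x: x**2 - 4 * math.sin(x), 1.0, 3.0))   # about 1.933754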
Secant Method, continued
Secant method approximates nonlinear function f by secant line through previous two iterates

[Figure: one secant step, through the points at xk−1 and xk to the secant line’s zero xk+1]
Example: Secant Method
Use secant method to find root of

f(x) = x^2 − 4 sin(x) = 0
Taking x0 = 1 and x1 = 3 as starting guesses, we obtain
x          f(x)         h
1.000000   −2.365884
3.000000    8.435520   −1.561930
1.438070   −1.896774    0.286735
1.724805   −0.977706    0.305029
2.029833    0.534305   −0.107789
1.922044   −0.061523    0.011130
1.933174   −0.003064    0.000583
1.933757    0.000019   −0.000004
1.933754    0.000000    0.000000
Higher-Degree Interpolation
Secant method uses linear interpolation to approximate function whose zero is sought
Higher convergence rate can be obtained by using higher-degree polynomial interpolation
Quadratic interpolation (Muller’s method) has convergence rate of r ≈ 1.839, for example
Unfortunately,
• interpolating polynomial may not have real roots
• roots may not be easy to compute
• choice of root to use as next iterate may not be obvious
Inverse Interpolation
Good alternative is inverse interpolation, where xk are interpolated as function of yk = f(xk) by polynomial p(y), so next approximate solution is p(0)
Most commonly used for root finding is inverse quadratic interpolation
[Figure: three data points with a regular quadratic fit in x and an inverse quadratic fit in y; the next iterate is the inverse fit evaluated at y = 0]
Inverse Quadratic Interpolation
Given approximate solution values a, b, c, with function values fa, fb, fc, next approximate solution found by fitting quadratic polynomial to a, b, c as function of fa, fb, fc, then evaluating polynomial at 0

We compute

u = fb/fc,  v = fb/fa,  w = fa/fc,
p = v(w(u − w)(c − b) − (1 − u)(b − a)),
q = (w − 1)(u − 1)(v − 1),

then new approximate solution is b + p/q

Convergence rate normally r ≈ 1.839
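In Python, one step is direct to code; the loop below recycles the three points in a way that reproduces the iterates in the example that follows (the bookkeeping is one reasonable choice, not prescribed by the text):

import math

def iqi_step(a, b, c, fa, fb, fc):
    # Fit x as a quadratic in y through (fa,a), (fb,b), (fc,c); evaluate at y = 0
    u, v, w = fb / fc, fb / fa, fa / fc
    p = v * (w * (u - w) * (c - b) - (1 - u) * (b - a))
    q = (w - 1) * (u - 1) * (v - 1)
    return b + p / q

f = lambda x: x**2 - 4 * math.sin(x)
a, b, c = 1.0, 2.0, 3.0
for _ in range(5):
    x = iqi_step(a, b, c, f(a), f(b), f(c))
    a, b, c = b, x, a
print(b)   # about 1.933754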
Example: Inverse Quadratic Interpolation
Use inverse quadratic interpolation to find root of
f(x) = x^2 − 4 sin(x) = 0
Taking x = 1, 2, and 3 as starting values, we obtain
x          f(x)         h
1.000000   −2.365884
2.000000    0.362810
3.000000    8.435520
1.886318   −0.244343   −0.113682
1.939558    0.030786    0.053240
1.933742   −0.000060   −0.005815
1.933754    0.000000    0.000011
1.933754    0.000000    0.000000
Linear Fractional Interpolation
Interpolation using rational fraction of form

φ(x) = (x − u) / (vx − w)

especially useful for finding zeros of functions having horizontal or vertical asymptotes

φ has zero at x = u, vertical asymptote at x = w/v, and horizontal asymptote at y = 1/v

Given approximate solution values a, b, c, with function values fa, fb, fc, next approximate solution is c + h, where

h = (a − c)(b − c)(fa − fb) fc / [ (a − c)(fc − fb) fa − (b − c)(fc − fa) fb ]

Convergence rate normally r ≈ 1.839, same as for quadratic interpolation (inverse or regular)
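A Python sketch of the iteration (the recycling of the three points is one choice that reproduces, up to rounding, the x values in the example that follows):

import math

def lf_step(a, b, c, fa, fb, fc):
    # Increment h for linear fractional interpolation; next iterate is c + h
    num = (a - c) * (b - c) * (fa - fb) * fc
    den = (a - c) * (fc - fb) * fa - (b - c) * (fc - fa) * fb
    return num / den

f = lambda x: x**2 - 4 * math.sin(x)
a, b, c = 1.0, 2.0, 3.0
for _ in range(4):
    x = c + lf_step(a, b, c, f(a), f(b), f(c))
    a, b, c = b, c, x
print(c)   # about 1.933754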
Example: Linear Fractional Interpolation
Use linear fractional interpolation to find root of
f(x) = x^2 − 4 sin(x) = 0
Taking x = 1, 2, and 3 as starting values, we obtain

x          f(x)         h
1.000000   −2.365884
2.000000    0.362810
3.000000    8.435520
1.906953   −0.139647   −1.093047
1.933351   −0.002131    0.026398
1.933756    0.000013    0.000406
1.933754    0.000000   −0.000003
Safeguarded Methods
Rapidly convergent methods for solving nonlinear equations may not converge unless started close to solution, but safe methods are slow

Hybrid methods combine features of both types of methods to achieve both speed and reliability
Use rapidly convergent method, but maintain bracket around solution
If next approximate solution given by fast method falls outside bracketing interval, perform one iteration of safe method, such as bisection
Safeguarded Methods, continued
Fast method can then be tried again on smaller interval with greater chance of success
Ultimately, convergence rate of fast method should prevail
Hybrid approach seldom does worse than safe method, and usually does much better
Popular combination is bisection and inverse quadratic interpolation, for which no derivatives required
Zeros of Polynomials
For polynomial p(x) of degree n, one may want to find all n of its zeros, which may be complex even if coefficients are real
Several approaches available
• Use root-finding method such as Newton’s or Muller’s method to find one root, deflate it out, and repeat

• Form companion matrix of polynomial and use eigenvalue routine to compute all its eigenvalues (see the sketch below)

• Use method designed specifically for finding all roots of polynomial, such as Jenkins-Traub
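A brief numpy sketch of the companion matrix approach (the helper function is illustrative; numpy’s own roots routine works essentially this way):

import numpy as np

def poly_roots(coeffs):
    # Roots of monic p(x) = x^n + c[n-1] x^(n-1) + ... + c[1] x + c[0],
    # computed as eigenvalues of the companion matrix
    n = len(coeffs)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)        # ones on the subdiagonal
    C[:, -1] = -np.asarray(coeffs)    # last column is -c[0], ..., -c[n-1]
    return np.linalg.eigvals(C)

# p(x) = x^3 - 6 x^2 + 11 x - 6 = (x - 1)(x - 2)(x - 3)
print(poly_roots([-6.0, 11.0, -6.0]))   # roots 1, 2, 3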
Systems of Nonlinear Equations
Solving systems of nonlinear equations much more difficult than scalar case because
• Wider variety of behavior possible, so determining existence and number of solutions or good starting guess much more complex
• No simple way, in general, to guarantee convergence to desired solution or to bracket solution to produce absolutely safe method
• Computational overhead increases rapidly with dimension of problem
Fixed-Point Iteration
Fixed-point problem for g : R^n → R^n is to find vector x such that

x = g(x)

Corresponding fixed-point iteration is

xk+1 = g(xk),

given starting vector x0

If

ρ(G(x∗)) < 1,

where ρ is spectral radius and G(x) is Jacobian matrix of g evaluated at x, then fixed-point iteration converges if started close enough to solution

Smaller spectral radius yields faster convergence rate

If G(x∗) = O, then convergence rate quadratic
Newton’s Method
In n dimensions, Newton’s method has form

xk+1 = xk − J(xk)^−1 f(xk),

where J(x) is Jacobian matrix of f, {J(x)}ij = ∂fi(x)/∂xj

In practice, do not explicitly invert J(xk); instead solve linear system

J(xk) sk = −f(xk)

for Newton step sk, then take as next iterate xk+1 = xk + sk
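A numpy sketch of the n-dimensional iteration, applied to the example on the next slide (the stopping rule is an illustrative choice):

import numpy as np

def newton_system(f, J, x, tol=1e-10, max_iter=20):
    # Solve J(xk) sk = -f(xk), then update xk+1 = xk + sk
    for _ in range(max_iter):
        s = np.linalg.solve(J(x), -f(x))   # Newton step
        x = x + s
        if np.linalg.norm(s) < tol:
            break
    return x

f = lambda x: np.array([x[0] + 2*x[1] - 2, x[0]**2 + 4*x[1]**2 - 4])
J = lambda x: np.array([[1.0, 2.0], [2*x[0], 8*x[1]]])
print(newton_system(f, J, np.array([1.0, 2.0])))   # converges to [0, 1]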
Example: Newton’s Method
Use Newton’s method to solve nonlinear system

f(x) = [ x1 + 2x2 − 2     ] = 0
       [ x1^2 + 4x2^2 − 4 ]

Jacobian matrix is

Jf(x) = [ 1    2   ]
        [ 2x1  8x2 ]

If we take x0 = [1 2]^T, then f(x0) = [3 13]^T and

Jf(x0) = [ 1   2  ]
         [ 2   16 ]

Solving system

[ 1   2  ] s0 = [ −3  ]
[ 2   16 ]      [ −13 ]

gives s0 = [−1.83 −0.58]^T, and hence

x1 = x0 + s0 = [−0.83 1.42]^T
Example Continued
f(x1) = [0 4.72]^T,  Jf(x1) = [ 1      2    ]
                              [ −1.67  11.3 ]

Solving system

[ 1      2    ] s1 = [ 0     ]
[ −1.67  11.3 ]      [ −4.72 ]

gives s1 = [0.64 −0.32]^T, and hence

x2 = x1 + s1 = [−0.19 1.10]^T

f(x2) = [0 0.83]^T,  Jf(x2) = [ 1      2    ]
                              [ −0.38  8.76 ]

Iterations continue until convergence to solution x∗ = [0 1]^T
Convergence of Newton’s Method
Differentiating corresponding fixed-point operator

g(x) = x − J(x)^−1 f(x)

and evaluating at solution x∗ gives

G(x∗) = I − ( J(x∗)^−1 J(x∗) + Σ_{i=1}^{n} fi(x∗) Hi(x∗) ) = O,

where Hi(x) is component matrix of derivative of J(x)^−1
Convergence rate of Newton’s method for nonlinear systems is normally quadratic, provided Jacobian matrix J(x∗) is nonsingular
Must be started close enough to solution to converge
Cost of Newton’s Method
Cost per iteration of Newton’s method for dense problem in n dimensions is substantial
• Computing Jacobian matrix costs n^2 scalar function evaluations

• Solving linear system costs O(n^3) operations
Secant Updating Methods
Secant updating methods reduce cost of Newton’s method by

• Using function values at successive iterates to build approximate Jacobian, avoiding explicit evaluation of derivatives

• Updating factorization of approximate Jacobian rather than refactoring it each iteration

Most secant updating methods have superlinear but not quadratic convergence rate; often cost less overall than Newton’s method
Broyden’s Method
Broyden’s method is typical secant updating method
x0 = initial guess
B0 = initial Jacobian approximation
for k = 0, 1, 2, ...
    Solve Bk sk = −f(xk) for sk
    xk+1 = xk + sk
    yk = f(xk+1) − f(xk)
    Bk+1 = Bk + ((yk − Bk sk) sk^T) / (sk^T sk)
end
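The same algorithm in numpy (this sketch updates B directly rather than its factorization, so each iteration still costs O(n^3); the cheaper variant is discussed next):

import numpy as np

def broyden(f, x, B, tol=1e-10, max_iter=50):
    # Secant updating: rank-one update keeps B consistent with the secant equation
    fx = f(x)
    for _ in range(max_iter):
        s = np.linalg.solve(B, -fx)                 # quasi-Newton step
        x = x + s
        fx_new = f(x)
        y = fx_new - fx
        B = B + np.outer(y - B @ s, s) / (s @ s)    # Broyden update
        fx = fx_new
        if np.linalg.norm(s) < tol:
            break
    return x

f = lambda x: np.array([x[0] + 2*x[1] - 2, x[0]**2 + 4*x[1]**2 - 4])
print(broyden(f, np.array([1.0, 2.0]), np.array([[1.0, 2.0], [2.0, 16.0]])))  # -> [0, 1]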
Broyden’s Method, continued
Motivation for formula for Bk+1 is to make least change to Bk subject to satisfying secant equation
Bk+1(xk+1 − xk) = f(xk+1) − f(xk)
In practice, factorization of Bk updated instead of updating Bk directly, so total cost per iteration is only O(n^2)
Example: Broyden’s Method
Use Broyden’s method to solve nonlinear system

f(x) = [ x1 + 2x2 − 2     ] = 0
       [ x1^2 + 4x2^2 − 4 ]

If x0 = [1 2]^T, then f(x0) = [3 13]^T, and we choose

B0 = Jf(x0) = [ 1   2  ]
              [ 2   16 ]

Solving system

[ 1   2  ] s0 = [ −3  ]
[ 2   16 ]      [ −13 ]

gives s0 = [−1.83 −0.58]^T, and hence

x1 = x0 + s0 = [−0.83 1.42]^T
Example Continued
f(x1) = [0 4.72]^T,  y0 = [−3 −8.28]^T

From updating formula, we obtain

B1 = [ 1   2  ] + [  0      0    ] = [ 1      2    ]
     [ 2   16 ]   [ −2.34  −0.74 ]   [ −0.34  15.3 ]

Solving system

[ 1      2    ] s1 = [ 0     ]
[ −0.34  15.3 ]      [ −4.72 ]

gives s1 = [0.59 −0.30]^T, and hence

x2 = x1 + s1 = [−0.24 1.12]^T
Example Continued
f(x2) = [0 1.08]^T,  y1 = [0 −3.64]^T

From updating formula, we obtain

B2 = [ 1      2    ] + [ 0      0    ] = [ 1     2    ]
     [ −0.34  15.3 ]   [ 1.46  −0.73 ]   [ 1.12  14.5 ]

Iterations continue until convergence to solution x∗ = [0 1]^T
Robust Newton-Like Methods
Newton’s method and its variants may fail to converge when started far from solution
Safeguards can enlarge region of convergence of Newton-like methods
Simplest precaution is damped Newton method, in which new iterate is
xk+1 = xk + αksk,
where sk is Newton (or Newton-like) step and αk is scalar parameter chosen to ensure progress toward solution
Parameter αk reduces Newton step when it is too large, but αk = 1 suffices near solution and still yields fast asymptotic convergence rate
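One simple way to choose αk is backtracking: halve the step until the residual norm decreases. A sketch under that assumption (the notes do not prescribe a particular rule for αk):

import numpy as np

def damped_newton(f, J, x, tol=1e-10, max_iter=50):
    # Damped Newton: scale the Newton step by alpha until ||f|| decreases
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        s = np.linalg.solve(J(x), -fx)   # full Newton step
        alpha = 1.0
        while np.linalg.norm(f(x + alpha * s)) >= np.linalg.norm(fx) and alpha > 1e-4:
            alpha /= 2                   # backtrack
        x = x + alpha * s
    return x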
Trust-Region Methods
Another approach is to maintain estimate of trust region where Taylor series approximation, upon which Newton’s method is based, is sufficiently accurate for resulting computed step to be reliable

Adjusting size of trust region to constrain step size when necessary usually enables progress toward solution even when starting far away, yet still permits rapid convergence once near solution
Unlike damped Newton method, trust region method may modify direction as well as length of Newton step
More details on this approach in Chapter 6