Lecture Notes to Accompany
Scientific Computing
An Introductory Survey
Second Edition
by Michael T. Heath
Chapter 1
Scientific Computing
Copyright ⃝c 2001. Reproduction permitted only for noncommercial, educational use in conjunction with the book.
1
Scientific Computing
What is scientific computing?
Design and analysis of algorithms for solving mathematical problems in science and engi- neering numerically
Traditionally called numerical analysis Distinguishing features:
• continuous quantities
• effects of approximations
2
Scientific Computing
Why scientific computing? Simulation of physical phenomena Virtual prototyping of products
3
Well-Posed Problem
Problem well-posed if solution
• exists
• is unique
• depends continuously on problem data
Solution may still be sensitive to input data Algorithm should not make sensitivity worse
4
General Strategy
Replace difficult problem by easier one having same or closely related solution
• infinite −→ finite
• differential −→ algebraic • nonlinear −→ linear
• complicated −→ simple
Solution obtained may only approximate that of original problem
5
Sources of Approximation
Before computation:
• modeling
• empirical measurements • previous computations
During computation:
• truncation or discretization
• rounding
Accuracy of final result reflects all these
Uncertainty in input may be amplified by prob- lem
Perturbations during computation may be am- plified by algorithm
6
Example: Approximations
Computing surface area of Earth using formula
A = 4πr2 involves several approximations:
• Earth modeled as sphere, idealizing its true shape
• Value for radius based on empirical mea- surements and previous computations
• Value for π requires truncating infinite pro- cess
• Values for input data and results of arith- metic operations rounded in computer
7
Absolute Error and Relative Error
Absolute error = approx value − true value Relative error = absolute error
true value
Equivalently,
Approx value = (true value)(1 + rel. error)
True value usually unknown, so estimate or bound error rather than compute it exactly
Relative error often taken relative to approxi- mate value, rather than (unknown) true value
8
Data Error and Computational Error
Typical problem: compute value of function f : R → R for given argument
x = true value of input, f(x) = desired result
xˆ = approximate (inexact) input
fˆ= approximate function computed ˆ
Total error = f(xˆ) − f(x) = ˆ
(f(xˆ) − f(xˆ)) + (f(xˆ) − f(x)) = computational error + propagated data error
Algorithm has no effect on propagated data error
9
Truncation Error and Rounding Error
Truncation error: difference between true re- sult (for actual input) and result produced by given algorithm using exact arithmetic
Due to approximations such as truncating in- finite series or terminating iterative sequence before convergence
Rounding error: difference between result pro- duced by given algorithm using exact arith- metic and result produced by same algorithm using limited precision arithmetic
Due to inexact representation of real numbers and arithmetic operations upon them
Computational error is sum of truncation error and rounding error, but one of these usually dominates
10
Example: Finite Difference Approx.
Error in finite difference approximation f′(x) ≈ f(x + h) − f(x)
h
exhibits tradeoff between rounding error and truncation error
Truncation error bounded by Mh/2, where M bounds |f′′(t)| for t near x
Rounding error bounded by 2ε/h, where error in function values bounded by ε
Total error minimized when h ≈ 2ε/M
Error increases for smaller h because of round- ing error and increases for larger h because of truncation error
11
Example: Finite Difference Approx.
2 10
0 10
−2 10
−4 10
−6 10
−8 10
−10 10
−12 10
−14 10
−16 10
−18 10
−16 −14 −12 −10 −8 −6 −4 −2 0 10 10 10 10 10 10 10 10 10
total error
truncation error rounding error
step size
12
error
Forward and Backward Error
Suppose we want to compute y = f(x), where f : R → R, but obtain approximate value yˆ
Forward error = ∆y = yˆ − y
Backward error = ∆x = xˆ − x, where f(xˆ) = yˆ
backward error
| ↓
. .
x •. . . . . . . .
•
.. .
.
y = f(x) |
forward error
↑ ..
↑
| .. ˆ .. f
f
. .
. .
xˆ. yˆ=f(x)=f(xˆ) • . • ˆ
13
.. .
|
f .. ↓ ..
. . .
. .. .
Example: Forward and Backward Error
√
As approximation to y = solute forward error
2, yˆ = 1.4 has ab-
|∆y| = |yˆ− y| = |1.4 − 1.41421…| ≈ 0.0142, or relative forward error about 1 percent
√
Since 1.96 = 1.4, absolute backward error is
|∆x| = |xˆ−x| = |1.96−2| = 0.04, or relative backward error 2 percent
14
Backward Error Analysis
Idea: approximate solution is exact solution to modified problem
How much must original problem change to give result actually obtained?
How much data error in input would explain all error in computed result?
Approximate solution good if exact solution to “nearby” problem
Backward error often easier to estimate than forward error
15
Example: Backward Error Analysis
To approximate cosine function f(x) = cos(x), truncating Taylor series after two terms gives
such that f(xˆ) = f(x)
2
yˆ = f ( x ) = 1 − x / 2
ˆ
2
ˆ
Forward error:
∆y = yˆ−y = f(x)−f(x) = 1−x /2−cos(x)
To determine backward error, need value xˆ ˆ
For cosine function,
xˆ = arccos(f(x)) = arccos(yˆ)
ˆ
16
Example, continuted
For x=1,
y = f(1) = cos(1) ≈ 0.5403,
2 yˆ=f(1)=1−1 /2=0.5,
ˆ
xˆ = arccos(yˆ) = arccos(0.5) ≈ 1.0472
Forward error:
∆y = yˆ − y ≈ 0.5 − 0.5403 = −0.0403,
Backward error:
∆x = xˆ−x ≈ 1.0472−1 = 0.0472
17
Sensitivity and Conditioning
Problem insensitive, or well-conditioned, if rel- ative change in input causes similar relative change in solution
Problem sensitive, or ill-conditioned, if relative change in solution can be much larger than that in input data
Condition number:
cond = |relative change in solution|
|relative change in input data| = |[f(xˆ)−f(x)]/f(x)| = |∆y/y|
|(xˆ − x)/x| |∆x/x|
Problem sensitive, or ill-conditioned, if cond ≫ 1
18
Condition Number
Condition number is “amplification factor” re- lating relative forward error to relative back- ward error:
relative relative
forward error = cond × backward error
Condition number usually not known exactly and may vary with input, so rough estimate or upper bound used for cond, yielding
relative relative
forward error cond × backward error 19
Example: Evaluating Function
Evaluating function f for approximate input xˆ = x + ∆x instead of true input x gives
Abs. forward err. = f(x+∆x)−f(x) ≈ f′(x)∆x,
Rel. forward err. = f(x + ∆x) − f(x) ≈ f′(x)∆x, f (x) f (x)
f ′(x)∆x/f (x) xf ′(x) cond≈ ∆x/x = f(x)
Relative error in function value can be much larger or smaller than that in input, depending on particular f and x
20
Example: Sensitivity
Tangent function for arguments near π/2: tan(1.57079) ≈ 1.58058 × 105
tan(1.57078) ≈ 6.12490 × 104 Relative change in output quarter million times
greater than relative change in input For x = 1.57079, cond ≈ 2.48275 × 105
21
Stability
Stability of algorithm analogous to condition- ing of problem
Algorithm stable if result relatively insensitive to perturbations during computation
From point of view of backward error analy- sis, algorithm stable if result produced is exact solution to nearby problem
For stable algorithm, effect of computational error no worse than effect of small data error in input
22
Accuracy
Accuracy refers to closeness of computed so- lution to true solution of problem
Stability alone does not guarantee accuracy
Accuracy depends on conditioning of problem as well as stability of algorithm
Inaccuracy can result from applying stable al- gorithm to ill-conditioned problem or unstable algorithm to well-conditioned problem
Applying stable algorithm to well-conditioned problem yields accurate solution
23
Floating-Point Numbers
Floating-point number system characterized by four integers:
β base or radix
p precision
[L,U] exponent range
Number x represented as
x = ± d0 + d1 + d2 + · · · + dp−1 βE,
β β2 βp−1 0≤di≤β−1, i=0,…,p−1, andL≤E≤U
where
d0d1 · · · dp−1 called mantissa E called exponent
d1d2 · · · dp−1 called fraction
24
Typical Floating-Point Systems
Most computers use binary (β = 2) arithmetic Parameters for typical floating-point systems
shown below
system βpLU
IEEE SP
IEEE DP
Cray
HP calculator IBM mainframe
2 24 2 53 2 48
10 12 16 6
−126 127 −1022 1023 −16383 16384 −499 499 −64 63
IEEE standard floating-point systems almost universally adopted for personal computers and workstations
25
Normalization
Floating-point system normalized if leading digit d0 always nonzero unless number represented is zero
In normalized system, mantissa m of nonzero floating-point number always satisfies
1≤m<β Reasons for normalization:
• representation of each number unique
• no digits wasted on leading zeros
• leading bit need not be stored (in binary system)
26
Properties of Floating-Point Systems
Floating-point number system finite and dis- crete
Number of normalized floating-point numbers: 2(β − 1)βp−1(U − L + 1) + 1
Smallest positive normalized number:
underflow level = UFL = βL Largest floating-point number:
overflow level = OFL = βU+1(1 − β−p) Floating-point numbers equally spaced only be-
tween powers of β
Not all real numbers exactly representable; those that are are called machine numbers
27
Example: Floating-Point System
Tick marks indicate all 25 numbers in floating- point system having β = 2, p = 3, L = −1, and U=1
. −4 −3 −2 −1 0 1 2 3 4
OFL = (1.11)2 × 21 = (3.5)10
UFL = (1.00)2 × 2−1 = (0.5)10
At sufficiently high magnification, all normal- ized floating-point systems look grainy and un- equally spaced like this
28
Rounding Rules
If real number x not exactly representable, then approximated by “nearby” floating-point num- ber fl(x)
Process called rounding, and error introduced called rounding error
Two commonly used rounding rules:
• chop: truncate base-β expansion of x after
(p−1)st digit; also called round toward zero
• round to nearest: fl(x) nearest floating- point number to x, using floating-point num- ber whose last stored digit is even in case of tie; also called round to even
Round to nearest most accurate, and is default rounding rule in IEEE systems
29
Machine Precision
Accuracy of floating-point system character- ized by unit roundoff, machine precision, or machine epsilon, denoted by εmach
With rounding by chopping, εmach = β1−p With rounding to nearest, εmach = 1β1−p
2
Alternative definition is smallest number ε such that fl(1+ε) > 1
Maximum relative error in representing real num- ber x in floating-point system given by
fl(x) − x
x ≤ εmach
30
Machine Precision, continued
For toy system illustrated earlier,
εmach = 0.25 with rounding by chopping εmach = 0.125 with rounding to nearest
For IEEE floating-point systems,
εmach = 2−24 ≈ 10−7 in single precision εmach = 2−53 ≈ 10−16 in double precision
IEEE single and double precision systems have about 7 and 16 decimal digits of precision
Though both are “small,” unit roundoff error εmach should not be confused with underflow level UFL
In all practical floating-point systems, 0
Cancellation
Subtraction between two p-digit numbers hav- ing same sign and similar magnitudes yields result with fewer than p digits, so it is usually exactly representable
Reason is that leading digits of two numbers cancel (i.e., their difference is zero)
Example:
1.92403×102−1.92275×102 = 1.28000×10−1,
which is correct, and exactly representable, but has only three significant digits
39
Cancellation, continued
Despite exactness of result, cancellation often implies serious loss of information
Operands often uncertain due to rounding or other previous errors, so relative uncertainty in difference may be large
Example: if ε is positive floating-point number slightly smaller than εmach,
(1+ε)−(1−ε) = 1−1 = 0
in floating-point arithmetic, which is correct for actual operands of final subtraction, but true result of overall computation, 2ε, has been completely lost
Subtraction itself not at fault: it merely signals loss of information that had already occurred
40
Cancellation, continued
Digits lost to cancellation are most significant, leading digits, whereas digits lost in rounding are least significant, trailing digits
Because of this effect, it is generally bad idea to compute any small quantity as difference of large quantities, since rounding error is likely to dominate result
For example, summing alternating series, such as
x x2x3
e =1+x+2!+3!+···
for x < 0, may give disastrous results due to catastrophic cancellation
41
Example: Cancellation
Total energy of helium atom is sum of kinetic and potential energies, which are computed
separately and have opposite signs, so cancellation
Year Kinetic Potential Total
suffer
1971 13.0 1977 12.76 1980 12.22 1985 12.28 1988 12.40
−14.0 −1.0 −14.02 −1.26 −14.35 −2.13 −14.65 −2.37 −14.84 −2.44
Although computed values for kinetic and po- tential energies changed by only 6% or less, resulting estimate for total energy changed by 144%
42
Example: Quadratic Formula
Two solutions of quadratic equation ax2 +bx+c = 0
given by
x = −b ± b2 − 4ac
2a
Naive use of formula can suffer overflow, or
underflow, or severe cancellation
Rescaling coefficients can help avoid overflow and harmful underflow
Cancellation between −b and square root can be avoided by computing one root using alter- native formula
2c x= 2
−b∓ b −4ac
Cancellation inside square root cannot be eas-
ily avoided without using higher precision 43
Example: Standard Deviation
Mean of sequence xi, i = 1,...,n, is given by 1 n
x ̄=n xi, i=1
and standard deviation by
1
1n2 σ = ( x i − x ̄ ) 2
n − 1 i=1 Mathematically equivalent formula
1 1n2
22
σ = n − 1 x i − n x ̄
i=1
avoids making two passes through data
Unfortunately, single cancellation error at end of one-pass formula is more damaging numeri- cally than all of cancellation errors in two-pass formula combined
44
Mathematical Software
High-quality mathematical software is available for solving most commonly occurring problems in scientific computing
Use of sophisticated, professionally written soft- ware has many advantages
We will seek to understand basic ideas of meth- ods on which such software is based, so that we can use software intelligently
We will gain hands-on experience in using such software to solve wide variety of computational problems
45
Desirable Qualities of Math Software
• Reliability
• Robustness
• Accuracy
• Efficiency
• Maintainability • Portability
• Usability
• Applicability
46
Sources of Math Software
FMM: From book by Forsythe/Malcolm/Moler HSL: Harwell Subroutine Library
IMSL: Internat. Math. & Stat. Libraries
KMN: From book by Kahaner/Moler/Nash NAG: Numerical Algorithms Group
Netlib: Free software available via Internet NR: From book Numerical Recipes
NUMAL: From Math. Centrum, Amsterdam SLATEC: From U.S. Government labs
SOL: Systems Optimization Lab, Stanford U. TOMS: ACM Trans. on Math. Software
47
Scientific Computing Environments
Interactive environments for scientific comput- ing provide
• powerful mathematical capabilities
• sophisticated graphics
• high-level programming language for rapid prototyping
MATLAB is popular example, available for most personal computers and workstations
Similar, “free” alternatives include octave, RLaB, and Scilab
Symbolic computing environments, such as Maple and Mathematica, also useful
48