Numerical Methods & Scientific Computing: lecture notes
Linear Systems
Error analysis
Week 8: aim to cover
Data fitting, linear least squares (Lecture 15)
Linear least squares (Lab 8)
QR factorization, SVD (Lecture 16)
Data fitting
A very common task:
given a set of data $\{x_i, y_i\}$, $i = 1, \dots, m$, with observational error, find a line $y = a + b x$ that 'fits' the data
$\Rightarrow$ we want $y_i = a + b x_i + \epsilon_i$
If $m = 2$, we can interpolate; what if $m > 2$?
Two possible approaches
1 from linear algebra, 1 from calculus (optimization). Write $y_i = f(x_i)$:
\[
\begin{aligned}
a + b x_1 &= y_1 \\
a + b x_2 &= y_2 \\
&\ \ \vdots \\
a + b x_m &= y_m
\end{aligned}
\]
Overdetermined linear system
\[
\begin{bmatrix}
1 & x_1 \\
1 & x_2 \\
1 & x_3 \\
\vdots & \vdots \\
1 & x_m
\end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_m \end{bmatrix}
\]
a linear system $A x = b$ with $A$ an $m \times 2$ rectangular matrix.
Since there are more rows than columns $\rightarrow$ an overdetermined system — unless $b$ is exceptional, there is no solution to such a system
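A minimal numpy sketch (not from the lecture; the data values are invented for illustration) of assembling this $m \times 2$ design matrix:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # m = 5 sample points (invented)
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])   # noisy observations (invented)

# Design matrix with columns [1, x_i] for the straight-line fit y = a + b*x
A = np.column_stack([np.ones_like(x), x])
print(A.shape)   # (5, 2): more rows than columns -> overdetermined
```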
Minimum residual solution
there is no solution to such a system $\rightarrow$ what is the best we can do? An exact solution $\Rightarrow$ $r = b - A x = 0$,
so let’s find x that minimizes residual r — but in what norm?
can do more statistics (inference) if we choose the 2-norm
$\rightarrow$ minimize $\| r \|_2$
[other choices of norm give other forms of fitting]
Least squares problem
Definition
the least squares solution to $A x = b$, where $m > n = \operatorname{rank}(A)$, is the solution $x$ that minimizes $\| r \|_2$
In our case, $r_i = y_i - a - b x_i$, $i = 1, \dots, m$,
so to minimize $\| r \|_2$ means to minimize
$\| r \|_2^2 = r^T r = \sum_i (y_i - a - b x_i)^2 \equiv S$, the sum of squared residuals — hence the name: method of least squares
The necessary conditions for a stationary point (min/max) are:
$\dfrac{\partial S}{\partial a} = \dfrac{\partial S}{\partial b} = 0$
Finding minimum of S
$\dfrac{\partial S}{\partial a} = 0 \;\Rightarrow\; -2\sum_i (y_i - a - b x_i) = 0 \;\Rightarrow\; \sum_i y_i = m a + b \sum_i x_i$
$\dfrac{\partial S}{\partial b} = 0 \;\Rightarrow\; -2\sum_i x_i (y_i - a - b x_i) = 0 \;\Rightarrow\; \sum_i x_i y_i = a \sum_i x_i + b \sum_i x_i^2$
which gives a $2 \times 2$ matrix equation for $a$, $b$:
\[
\begin{bmatrix} m & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}
\]
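A short sketch (with invented data) of forming and solving this $2 \times 2$ system directly from the sums:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # invented data
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
m = len(x)

# 2x2 normal-equations matrix and right-hand side built from the sums
M = np.array([[m,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

a, b = np.linalg.solve(M, rhs)   # intercept a and slope b
```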
The normal equations
In terms of the overdetermined system, Ax = b, this square system is
the normal equations
$A^T A x = A^T b$
$\rightarrow$ the simplest way to solve any overdetermined system is to solve the normal equations
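For instance, in numpy the normal equations can be formed and solved in a couple of lines (a sketch with invented data, not the lab code):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])    # invented data
b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
A = np.column_stack([np.ones_like(x), x])  # overdetermined system A x = b

coeffs = np.linalg.solve(A.T @ A, A.T @ b) # solve A^T A x = A^T b -> [a, b]
```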
Another derivation
In $A x = b$, $b$ is a vector in $\mathbb{R}^m$; the columns of $A = [\, a_1 \ a_2 \,]$ are each vectors in $\mathbb{R}^m$.
Any linear combination of those columns can be written
$x_1 a_1 + x_2 a_2 = A x$: these form a 2D subspace of $\mathbb{R}^m$.
We want to find a vector in this subspace (i.e. all vectors of the form Ax) as close to b as possible. How?
To minimize $\| r \|_2$, we make $r \perp$ the subspace, i.e. $r \perp$ the columns of $A$.
$\Rightarrow a_1^T r = 0$, $a_2^T r = 0$, or $A^T r = 0$
$A^T r = A^T (b - A x) = A^T b - A^T A x = 0$
i.e. the normal equations
$\Rightarrow$ $A x$ is the projection of $b$ onto the subspace formed by the columns of $A$
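A quick numerical check of this orthogonality, $A^T r = 0$, again with invented data:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # invented data
b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
A = np.column_stack([np.ones_like(x), x])

xhat = np.linalg.solve(A.T @ A, A.T @ b)   # least squares solution
r = b - A @ xhat                           # residual
print(A.T @ r)                             # ~ [0, 0] up to rounding error
```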
Are the normal equations invertible?
Theorem
If $A$ has rank 2, then $A^T A$ is square and nonsingular
Proof: $A^T A$ is $2 \times 2$. If $A^T A x = 0$ then $x^T A^T A x = \| A x \|_2^2 = 0$, so $A x = 0$; since $A$ has full column rank ($\operatorname{rank}(A) = 2$), this forces $x = 0$. Hence $A^T A$ is nonsingular.
The pseudoinverse
So if A has rank 2, we can write
$x = (A^T A)^{-1} A^T b$
which is also written
$x = A^\dagger b$
where $A^\dagger = (A^T A)^{-1} A^T$ is the pseudoinverse of $A$. Of course, don't compute the pseudoinverse;
just solve the normal equations
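A sketch comparing the explicit pseudoinverse route with solving the normal equations; numpy's pinv is used only for the comparison, not as a recommended method:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # invented data
b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
A = np.column_stack([np.ones_like(x), x])

x_pinv = np.linalg.pinv(A) @ b              # via A^dagger (don't do this in practice)
x_ne   = np.linalg.solve(A.T @ A, A.T @ b)  # via the normal equations
print(np.allclose(x_pinv, x_ne))            # True for a well-conditioned A
```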
An example
Example
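The lecture's worked example is not reproduced here; a comparable sketch, fitting a line to synthetic noisy samples of $y = 1 + 2x$, might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(x.size)   # observational error

A = np.column_stack([np.ones_like(x), x])
a, b = np.linalg.solve(A.T @ A, A.T @ y)          # least squares fit

print(a, b)                                       # close to 1 and 2
print(np.linalg.norm(y - A @ np.array([a, b])))   # size of the residual
```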
Linear models
This can all be generalized to any linear model
i.e. fit to $y = x_1 \phi_1(t) + x_2 \phi_2(t) + \cdots + x_n \phi_n(t)$ from data points $\{X_i, Y_i\}$, $i = 1, \dots, m$, with $m > n$
form the overdetermined system Ax = b
\[
A = \begin{bmatrix}
\phi_1(X_1) & \phi_2(X_1) & \cdots & \phi_n(X_1) \\
\phi_1(X_2) & \phi_2(X_2) & \cdots & \phi_n(X_2) \\
\vdots & & & \vdots \\
\phi_1(X_m) & \phi_2(X_m) & \cdots & \phi_n(X_m)
\end{bmatrix}
\qquad
b = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{bmatrix}
\]
and form the normal equations $A^T A x = A^T b$ as before
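A sketch of building such a design matrix from a list of basis functions (the particular basis and data below are illustrative choices, not the lecture's):

```python
import numpy as np

basis = [lambda t: np.ones_like(t), np.sin, np.cos]   # phi_1, phi_2, phi_3

X = np.linspace(0.0, 2 * np.pi, 30)
Y = 0.5 + 1.5 * np.sin(X) - 0.7 * np.cos(X)           # exact data for the demo

A = np.column_stack([phi(X) for phi in basis])        # A_ij = phi_j(X_i)
coeffs = np.linalg.solve(A.T @ A, A.T @ Y)            # ~ [0.5, 1.5, -0.7]
```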
Since $A_{ij} = \phi_j(X_i)$, we have $(A^T A)_{ij} = \sum_k \phi_i(X_k)\, \phi_j(X_k)$.
Then, if $\operatorname{rank}(A) = \min(m, n) = n$ ($A$ has full rank), the normal equations have a unique solution.
Example: polynomial curve fitting
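For polynomial fitting the basis functions are the monomials $1, t, t^2, \dots$, so $A$ is a Vandermonde matrix; a sketch with invented quadratic data:

```python
import numpy as np

X = np.linspace(-1.0, 1.0, 15)
Y = 2.0 - X + 3.0 * X**2                      # quadratic data for the demo

deg = 2
A = np.vander(X, deg + 1, increasing=True)    # columns 1, X, X^2
coeffs = np.linalg.solve(A.T @ A, A.T @ Y)    # ~ [2, -1, 3]
```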
Properties of normal equations
But $A^T A$ is symmetric and positive definite. Proof: $(A^T A)^T = A^T A$, and for full-rank $A$, $x^T A^T A x = \| A x \|_2^2 > 0$ whenever $x \neq 0$.
$\Rightarrow$ solve using Cholesky factorization $\rightarrow$ takes $\approx n^3/6$ ops.
We also must form $A^T A$, an $n \times n$ (symmetric) matrix with $\approx n^2/2$ different entries, each one $\sum_k \phi_i(X_k)\, \phi_j(X_k)$, i.e. $m$ multiplies
$\rightarrow \approx \tfrac{1}{2} n^2 \left( m + \tfrac{n}{3} \right)$ operations
Problems with the normal equations
BUT solving the normal equations by Cholesky is NOT the recommended way to find the least squares solution – WHY NOT?
1. if $A$ is 'close to singular', we can get $A^T A$ (numerically) singular
2. forming the normal equations CAN worsen the conditioning (sensitivity) of the least squares problem (see the sketch below)
3. if $A$ is rank-deficient then $A^T A$ is singular $\Rightarrow$ can't solve the normal equations
(the Cholesky factors are singular, so we can't solve by back-substitution)
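Point 2 can be checked numerically: $\operatorname{cond}(A^T A) \approx \operatorname{cond}(A)^2$, so a mildly ill-conditioned $A$ gives a much worse $A^T A$. A sketch with an invented, nearly rank-deficient matrix:

```python
import numpy as np

eps = 1e-7
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps],
              [1.0, 1.0 - eps]])   # columns are nearly parallel

print(np.linalg.cond(A))           # of order 1e7
print(np.linalg.cond(A.T @ A))     # of order 1e14: roughly cond(A)**2
```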
The matrix 2-norm
Example:
The 2-norm is the natural norm for LSQ problems (minimizing $\| r \|_2$) $\Rightarrow$ we can no longer avoid the matrix 2-norm:
for a square matrix $A$ (see 'MatrixNorms' for proof), $\| A \|_2 = \sqrt{\lambda_{\max}(A^T A)}$
where $\lambda_{\max}(A^T A)$ is the largest eigenvalue of $A^T A$
(all eigenvalues are positive since $A^T A$ is positive definite).
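A sketch verifying this relation numerically for a small (invented) square matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

lam_max = np.linalg.eigvalsh(A.T @ A).max()    # largest eigenvalue of A^T A
print(np.sqrt(lam_max), np.linalg.norm(A, 2))  # the two values agree
```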
Singular value decomposition (SVD)
It is easier to characterize the condition number in the 2-norm in terms of the singular values of A. To do that we need the
Definition
An $m \times n$ real matrix $A$ has the singular value decomposition $A = U \Sigma V^T$
where
1. $U$ is an $m \times m$ orthogonal matrix
2. $\Sigma$ is a diagonal $m \times n$ real matrix
3. $V$ is an $n \times n$ orthogonal matrix
The non-negative diagonal entries $\{ \sigma_k \ge 0 \}$ of $\Sigma$ are called the singular values of $A$.
In our case, where m > n, there are n positive singular values
$\sigma_1 \ge \sigma_2 \ge \cdots$
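A sketch of computing the SVD with numpy for a tall ($m > n$) matrix like the design matrices above, and checking the factorization (the matrix is invented):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

U, s, Vt = np.linalg.svd(A)            # full SVD: U is 3x3, Vt is 2x2
print(s)                               # n = 2 positive singular values, decreasing

Sigma = np.zeros_like(A)               # 3x2 diagonal matrix of singular values
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))  # True: A = U Sigma V^T
```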