Overdetermined Linear Systems
1/63
Contents
1 Introduction
Opening Example: Polynomial Approximation
The Normal Equations
Appendix: Derivation of Normal Equation
2 QR Factorization
Preliminary: Orthogonality
QR Factorization
Appendix: Gram-Schmidt Orthogonalization
3 QR Algorithm
Revisiting Least Squares
Householder Transformation and QR Algorithm
Supplementary 1: Projection and Reflection
Supplementary 2: Conditioning and Stability
2/63
Introduction
3/63
Opening Example: Polynomial Approximation
4/63
Introduction
Problem: Fitting Functions to Data
Given data points $\{(x_i, y_i) \mid i \in \mathbb{N}[1,m]\}$, pick a form for the "fitting" function $f(x)$ and minimize its total error in representing the data.
• With real-world data, interpolation is often not the best method.
• Instead of finding functions lying exactly on given data points, we look for ones which are "close" to them.
• In the most general terms, the fitting function takes the form
  $$ f(x) = c_1 f_1(x) + \cdots + c_n f_n(x), $$
  where $f_1, \dots, f_n$ are known functions while $c_1, \dots, c_n$ are to be determined.
5/63
Linear Least Squares Approximation
In this discussion:
• use a polynomial fitting function $p(x) = c_1 + c_2 x + \cdots + c_n x^{n-1}$ with $n < m$;
• minimize the 2-norm of the error $r_i = y_i - p(x_i)$:
  $$ \|r\|_2 = \sqrt{\sum_{i=1}^{m} r_i^2} = \sqrt{\sum_{i=1}^{m} \bigl(y_i - p(x_i)\bigr)^2}. $$
Since the fitting function is linear in the unknown coefficients and the 2-norm is minimized, this method of approximation is called the linear least squares (LLS) approximation.
6/63
Example: Temperature Anomaly
Below are 5-year averages of the worldwide temperature anomaly as compared to the 1951-1980 average (source: NASA).
Year    Anomaly (°C)
1955    -0.0480
1960    -0.0180
1965    -0.0360
1970    -0.0120
1975    -0.0040
1980     0.1180
1985     0.2100
1990     0.3320
1995     0.3340
2000     0.4560
7/63
Example: Import and Plot Data
t = (1955:5:2000)';
y = [-0.0480; -0.0180;
     -0.0360; -0.0120;
     -0.0040;  0.1180;
      0.2100;  0.3320;
      0.3340;  0.4560];
plot(t, y, '.')
[Figure: scatter plot of anomaly (°C) vs. year, 1955–2000.]
8/63
Example: Interpolation
t = (t-1950)/10;
n = length(t);
V = t.^(0:n-1);
c = V\y;
p = @(x) polyval(flip(c), (x-1950)/10);
hold on
fplot(p, [1955 2000])
[Figure: data points with the interpolating polynomial overlaid; anomaly (°C) vs. year.]
9/63
Fitting by a Straight Line
Suppose that we are fitting data to a linear polynomial: $p(x) = c_1 + c_2 x$.
• If it were to pass through all data points:
  $$ \begin{cases}
  y_1 = p(x_1) = c_1 + c_2 x_1 \\
  y_2 = p(x_2) = c_1 + c_2 x_2 \\
  \qquad\vdots \\
  y_m = p(x_m) = c_1 + c_2 x_m
  \end{cases}
  \;\xrightarrow{\text{matrix equation}}\;
  \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}}_{y}
  =
  \underbrace{\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}}_{V}
  \underbrace{\begin{bmatrix} c_1 \\ c_2 \end{bmatrix}}_{c} $$
• The above system generally has no solution (more equations than unknowns); instead, find $c$ which makes the residual $r = y - Vc$ "as small as possible" in the sense of the vector 2-norm.
• Notation: $y$ "=" $Vc$.
10/63
MATLAB Implementation
Revisiting the temperature anomaly example again:

year = (1955:5:2000)';
t = year - 1955;
V = t.^(0:1);
c = V\y;
p = @(x) polyval(flip(c), x-1955);
plot(year, y, '.')
hold on
fplot(p, [1955, 2000])

[Figure: data points with the least squares line overlaid; anomaly (°C) vs. year.]
11/63
Fitting by a General Polynomial
In general, when fitting data to a polynomial
$$ p(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_n x^{n-1}, $$
we need to solve
$$ \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}}_{y}
\;\text{“=”}\;
\underbrace{\begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^{n-1} \end{bmatrix}}_{V}
\underbrace{\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}}_{c} $$
• The solution $c$ of $y$ "=" $Vc$ turns out to be the solution of the normal equation
  $$ V^T V c = V^T y. $$
12/63
MATLAB Implementation
Revisiting the temperature anomaly example again:

V = t.^(0:3);
c = V\y;
q = @(x) polyval(flip(c), x-1955);
hold on
fplot(q, [1955 2000])

[Figure: data points with the least squares cubic overlaid; anomaly (°C) vs. year.]
13/63
Backslash Again
The Versatile Backslash
In MATLAB, the generic linear equation $Ax = b$ is solved by x = A\b.
• When $A$ is a square matrix, Gaussian elimination is used.
• When $A$ is NOT square, the least squares solution, characterized by the normal equation $A^T A x = A^T b$, is computed instead.
• As long as $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ has rank $n$, the square matrix $A^T A$ is nonsingular, so the least squares solution is unique.
• Though $A^T A$ is a square matrix, MATLAB does not use Gaussian elimination to solve the normal equation.
• Rather, a faster and more accurate algorithm is used.
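As a quick numerical check (an illustration added here, reusing the temperature data from the earlier slides), backslash and an explicit normal-equation solve produce the same coefficients:

% Illustration: rectangular backslash vs. solving the normal equation directly.
year = (1955:5:2000)';
y = [-0.0480; -0.0180; -0.0360; -0.0120; -0.0040; ...
      0.1180;  0.2100;  0.3320;  0.3340;  0.4560];
V = (year - 1955).^(0:1);        % straight-line fit

c_backslash = V \ y;             % least squares via backslash
c_normal    = (V'*V) \ (V'*y);   % solve A'Ax = A'b directly
norm(c_backslash - c_normal)     % ≈ 0 up to roundoff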
14/63
The Normal Equations
15/63
LLS and Normal Equation
Big Question: How is the least squares solution $x$ to $Ax$ "=" $b$ equivalent to the solution of the normal equation $A^T A x = A^T b$?
Theorem (Normal Equation)
Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$. If $x \in \mathbb{R}^n$ satisfies $A^T A x = A^T b$, then $x$ solves the LLS problem, i.e., $x$ minimizes $\|b - Ax\|_2$.
16/63
Proof of the Theorem
Idea of Proof¹. It is enough to show that $\bigl\|b - A(x+y)\bigr\|_2 \ge \|b - Ax\|_2$ for any $y \in \mathbb{R}^n$.
• Useful algebra:
  $$ (u+v)^T(u+v) = u^T u + u^T v + v^T u + v^T v = u^T u + 2\,v^T u + v^T v. $$
‚ Exercise: Prove it.
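One way to complete the argument from the hint above (a sketch added here, not the slide's own worked proof): apply the identity with $u = b - Ax$ and $v = -Ay$.

\begin{align*}
\|b - A(x+y)\|_2^2
  &= \bigl((b - Ax) - Ay\bigr)^{T}\bigl((b - Ax) - Ay\bigr) \\
  &= \|b - Ax\|_2^2 - 2\,(Ay)^{T}(b - Ax) + \|Ay\|_2^2 \\
  &= \|b - Ax\|_2^2 + \|Ay\|_2^2
     \qquad \text{since } (Ay)^{T}(b - Ax) = y^{T}\bigl(A^{T}b - A^{T}Ax\bigr) = 0 \\
  &\ge \|b - Ax\|_2^2 .
\end{align*}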
¹Alternatively, one can derive the normal equation using calculus. See Appendix.
17/63
Appendix: Derivation of Normal Equation
18/63
Derivation of Normal Equation
Consider $Ax$ "=" $b$, where $A \in \mathbb{R}^{m \times n}$ with $m \ge n$.
• Requirement: minimize the 2-norm of the residual $r = b - Ax$:
  $$ g(x_1, x_2, \dots, x_n) := \|r\|_2^2 = \sum_{i=1}^{m} \Bigl(b_i - \sum_{j=1}^{n} a_{ij} x_j\Bigr)^2. $$
• Strategy: using calculus, find the minimum by setting
  $$ 0 = \nabla g(x_1, x_2, \dots, x_n), $$
  which yields $n$ equations in the $n$ unknowns $x_1, x_2, \dots, x_n$.
19/63
Derivation of Normal Equation (cont’)
Noting that $\partial x_j / \partial x_k = \delta_{j,k}$, the $n$ equations $\partial g / \partial x_k = 0$ are written out as
$$ 0 = 2 \sum_{i=1}^{m} \Bigl(b_i - \sum_{j=1}^{n} a_{ij} x_j\Bigr)(-a_{ik}), \qquad k \in \mathbb{N}[1,n], $$
which can be rearranged into
$$ \sum_{i=1}^{m} a_{ik} b_i = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} a_{ik} x_j, \qquad k \in \mathbb{N}[1,n]. $$
One can see that the two sides correspond to the $k$th elements of $A^T b$ and $A^T A x$ respectively:
$$ A^T A x = A^T b, $$
showing the desired equivalence.
20/63
QR Factorization
21/63
Preliminary: Orthogonality
22/63
Normal Equation Revisited
Alternate perspective on the "normal equation":
$$ A^T(b - Ax) = 0 \;\iff\; z^T(b - Ax) = 0 \quad \text{for all } z \in \mathrm{range}(A), $$
i.e., $x$ solves the normal equation if and only if the residual is orthogonal to the range of $A$.
[Figure: $b$, its projection $y = Ax$ onto range($A$), and the residual $r = b - Ax$ orthogonal to range($A$).]
23/63
Orthogonal Vectors
Recall that the angle $\theta$ between two vectors $u, v \in \mathbb{R}^n$ satisfies
$$ \cos(\theta) = \frac{u^T v}{\|u\|_2\, \|v\|_2}. $$
Definition 1
• Two vectors $u, v \in \mathbb{R}^n$ are orthogonal if $u^T v = 0$.
• Vectors $q_1, q_2, \dots, q_k \in \mathbb{R}^n$ are orthogonal if $q_i^T q_j = 0$ for all $i \ne j$.
• Vectors $q_1, q_2, \dots, q_k \in \mathbb{R}^n$ are orthonormal if $q_i^T q_j = \delta_{i,j}$.
Notation (Kronecker delta function):
$$ \delta_{i,j} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases} $$
24/63
Matrices with Orthogonal Columns
Let $Q = \bigl[\, q_1 \,\big|\, q_2 \,\big|\, \cdots \,\big|\, q_k \,\bigr] \in \mathbb{R}^{n \times k}$. Note that
$$ Q^T Q =
\begin{bmatrix} q_1^T \\ q_2^T \\ \vdots \\ q_k^T \end{bmatrix}
\begin{bmatrix} q_1 & q_2 & \cdots & q_k \end{bmatrix}
=
\begin{bmatrix}
q_1^T q_1 & q_1^T q_2 & \cdots & q_1^T q_k \\
q_2^T q_1 & q_2^T q_2 & \cdots & q_2^T q_k \\
\vdots & \vdots & \ddots & \vdots \\
q_k^T q_1 & q_k^T q_2 & \cdots & q_k^T q_k
\end{bmatrix}. $$
Therefore,
• $q_1, \dots, q_k$ are orthogonal $\;\Rightarrow\;$ $Q^T Q$ is a $k \times k$ diagonal matrix.
• $q_1, \dots, q_k$ are orthonormal $\;\Rightarrow\;$ $Q^T Q$ is the $k \times k$ identity matrix.
25/63
Matrices with Orthonormal Columns
Theorem 2
Let $Q = \bigl[\, q_1 \,\big|\, q_2 \,\big|\, \cdots \,\big|\, q_k \,\bigr] \in \mathbb{R}^{n \times k}$ and suppose that $q_1, \dots, q_k$ are orthonormal. Then
1. $Q^T Q = I \in \mathbb{R}^{k \times k}$;
2. $\|Qx\|_2 = \|x\|_2$ for all $x \in \mathbb{R}^k$;
3. $\|Q\|_2 = 1$.
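These properties are easy to verify numerically (an added illustration, using MATLAB's qr to produce a matrix with orthonormal columns):

% Illustration: check the three properties of Theorem 2.
n = 50;  k = 8;
[Q, ~] = qr(randn(n, k), 0);    % thin QR of a random matrix gives orthonormal columns
x = randn(k, 1);

norm(Q'*Q - eye(k))             % ≈ 0   (property 1)
abs(norm(Q*x) - norm(x))        % ≈ 0   (property 2)
norm(Q)                         % ≈ 1   (property 3)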
26/63
Orthogonal Matrices
Definition 3
We say that $Q \in \mathbb{R}^{n \times n}$ is an orthogonal matrix if $Q^T Q = I \in \mathbb{R}^{n \times n}$.
• A square matrix with orthogonal columns is not, in general, an orthogonal matrix!
27/63
Properties of Orthogonal Matrices
Theorem 4
Let $Q \in \mathbb{R}^{n \times n}$ be orthogonal. Then
1. $Q^{-1} = Q^T$;
2. $Q^T$ is also an orthogonal matrix;
3. $\kappa_2(Q) = 1$;
4. for any $A \in \mathbb{R}^{n \times n}$, $\|AQ\|_2 = \|A\|_2$;
5. if $P \in \mathbb{R}^{n \times n}$ is another orthogonal matrix, then $PQ$ is also orthogonal.
28/63
Why Do We Like Orthogonal Vectors?
• If $u$ and $v$ are orthogonal, then $\|u \pm v\|_2 = \sqrt{\|u\|_2^2 + \|v\|_2^2}$ (Pythagorean theorem).
• Without orthogonality, it is possible that $\|u - v\|_2$ is much smaller than $\|u\|_2$ and $\|v\|_2$.
• The addition and subtraction of orthogonal vectors are guaranteed to be well-conditioned.
[Figure: two sketches of $u$, $v$, and $u - v$: nearly parallel vectors yield a small difference, orthogonal vectors do not.]
29/63
QR Factorization
30/63
The QR Factorization
The following matrix factorization plays a role in solving linear least squares problems similar to that of LU factorization in solving linear systems.
Theorem 5
Let $A \in \mathbb{R}^{m \times n}$ where $m \ge n$. Then $A = QR$, where $Q \in \mathbb{R}^{m \times m}$ is orthogonal and $R \in \mathbb{R}^{m \times n}$ is upper triangular.
$$ \underbrace{\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}}_{A}
= \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_m \end{bmatrix}}_{Q}
\underbrace{\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & r_{nn} \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}}_{R} $$
31/63
Thick vs. Thin QR Factorization
• Here is the QR factorization again:
  $$ A = \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_m \end{bmatrix}}_{Q}
  \underbrace{\begin{bmatrix}
  r_{11} & r_{12} & \cdots & r_{1n} \\
  0 & r_{22} & \cdots & r_{2n} \\
  \vdots & & \ddots & \vdots \\
  0 & 0 & \cdots & r_{nn} \\
  0 & 0 & \cdots & 0 \\
  \vdots & \vdots & & \vdots \\
  0 & 0 & \cdots & 0
  \end{bmatrix}}_{R} \qquad \text{(thick)} $$
• When $m$ is much larger than $n$, it is much more efficient to use the thin or compressed QR factorization:
  $$ A = \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}}_{\hat{Q}}
  \underbrace{\begin{bmatrix}
  r_{11} & r_{12} & \cdots & r_{1n} \\
  0 & r_{22} & \cdots & r_{2n} \\
  \vdots & & \ddots & \vdots \\
  0 & 0 & \cdots & r_{nn}
  \end{bmatrix}}_{\hat{R}} \qquad \text{(thin)} $$
32/63
QR Factorization in MATLAB
Either type of QR factorization is computed by the qr command.
• Thick/Full QR factorization
[Q, R] = qr(A)
• Thin/Compressed QR factorization
[Q, R] = qr(A, 0)
Test the orthogonality of Q by calculating the norm of $Q^T Q - I$, where $I$ is the identity matrix with suitable dimensions.
norm(Q'*Q - eye(m)) % full QR
norm(Q'*Q - eye(n)) % thin QR
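For example (an added illustration), on a random tall matrix:

% Illustration: full vs. thin QR of a tall matrix.
A = randn(100, 4);

[Q, R] = qr(A);          % full: Q is 100-by-100, R is 100-by-4
[Qh, Rh] = qr(A, 0);     % thin: Qh is 100-by-4, Rh is 4-by-4

norm(A - Q*R)            % both reproduce A up to roundoff
norm(A - Qh*Rh)
norm(Qh'*Qh - eye(4))    % columns of Qh are orthonormal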
33/63
Least Squares and QR Factorization
Substitute the thin factorization $A = \hat{Q}\hat{R}$ into the normal equation $A^T A x = A^T b$ and simplify.
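A sketch of that simplification, added here (it assumes $A$ has rank $n$, so that $\hat{R}$ is invertible):
$$ A^T A x = A^T b \;\Longrightarrow\; \hat{R}^T \hat{Q}^T \hat{Q} \hat{R}\, x = \hat{R}^T \hat{Q}^T b \;\Longrightarrow\; \hat{R}^T \hat{R}\, x = \hat{R}^T \hat{Q}^T b \;\Longrightarrow\; \hat{R}\, x = \hat{Q}^T b, $$
using $\hat{Q}^T \hat{Q} = I$ in the second step and the invertibility of $\hat{R}^T$ in the last.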
34/63
Least Squares and QR Factorization (cont’)
Summary: Algorithm for LLS Approximation
If $A$ has rank $n$, the normal equation $A^T A x = A^T b$ is consistent and is equivalent to $\hat{R} x = \hat{Q}^T b$.
1. Factor $A = \hat{Q}\hat{R}$.
2. Let $z = \hat{Q}^T b$.
3. Solve $\hat{R} x = z$ for $x$ using backward substitution.
35/63
Least Squares and QR Factorization (cont’)
function x = lsqrfact(A,b)
% LSQRFACT   x = lsqrfact(A,b)
% Solve linear least squares by QR factorization
% Input:
%   A    coefficient matrix (m-by-n, m>n)
%   b    right-hand side (m-by-1)
% Output:
%   x    minimizer of || b - Ax || (2-norm)
[Q,R] = qr(A,0);    % thin QR fact.
z = Q'*b;
x = backsub(R,z);   % backward substitution with the upper triangular R
end
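A usage sketch (added here; it assumes the backsub routine used by lsqrfact is available on the path):

% Hypothetical usage of lsqrfact on the temperature-anomaly data (cubic fit).
year = (1955:5:2000)';
y = [-0.0480; -0.0180; -0.0360; -0.0120; -0.0040; ...
      0.1180;  0.2100;  0.3320;  0.3340;  0.4560];
V = (year - 1955).^(0:3);

c = lsqrfact(V, y);     % least squares coefficients via thin QR
norm(c - V\y)           % ≈ 0: agrees with backslash up to roundoff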
36/63
Appendix: Gram-Schmidt Orthogonalization
37/63
The Gram–Schmidt Procedure
Problem: Orthogonalization
Given $a_1, \dots, a_n \in \mathbb{R}^m$, construct orthonormal vectors $q_1, \dots, q_n \in \mathbb{R}^m$ such that
$$ \mathrm{span}\{a_1, \dots, a_k\} = \mathrm{span}\{q_1, \dots, q_k\} \quad \text{for any } k \in \mathbb{N}[1,n]. $$
• Strategy. At the $j$th step, find a unit vector $q_j \in \mathrm{span}\{a_1, \dots, a_j\}$ that is orthogonal to $q_1, \dots, q_{j-1}$.
• Key Observation. The vector $v_j$ defined by
  $$ v_j = a_j - \bigl(q_1^T a_j\bigr) q_1 - \bigl(q_2^T a_j\bigr) q_2 - \cdots - \bigl(q_{j-1}^T a_j\bigr) q_{j-1} $$
  satisfies the required properties.
38/63
GS Algorithm
The Gram–Schmidt iteration is outlined below:
$$ q_1 = \frac{a_1}{r_{11}}, \qquad
   q_2 = \frac{a_2 - r_{12}\, q_1}{r_{22}}, \qquad
   q_3 = \frac{a_3 - r_{13}\, q_1 - r_{23}\, q_2}{r_{33}}, \qquad \dots, \qquad
   q_n = \frac{a_n - \sum_{i=1}^{n-1} r_{in}\, q_i}{r_{nn}}, $$
where
$$ r_{ij} = \begin{cases} q_i^T a_j, & \text{if } i \ne j \\[4pt] \pm \Bigl\| a_j - \sum_{k=1}^{j-1} r_{kj}\, q_k \Bigr\|_2, & \text{if } i = j. \end{cases} $$
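A minimal MATLAB sketch of this iteration (an added illustration, not the course's reference code; it takes the positive sign for $r_{jj}$ and assumes the columns of A are linearly independent):

function [Q, R] = gramschmidt(A)
% Classical Gram-Schmidt: thin QR factorization of A.
% Assumes the columns of A are linearly independent.
[m, n] = size(A);
Q = zeros(m, n);
R = zeros(n, n);
for j = 1:n
    v = A(:, j);
    for i = 1:j-1
        R(i, j) = Q(:, i)' * A(:, j);   % r_ij = q_i' a_j
        v = v - R(i, j) * Q(:, i);      % remove the component along q_i
    end
    R(j, j) = norm(v);                  % r_jj (positive sign chosen)
    Q(:, j) = v / R(j, j);              % normalize to get q_j
end
end

For a well-conditioned A, both norm(A - Q*R) and norm(Q'*Q - eye(n)) should then come out near zero.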
39/63
GS Procedure and Thin QR Factorization
• The GS algorithm, written as a matrix equation, yields a thin QR factorization:
  $$ \underbrace{\begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix}}_{A}
  = \underbrace{\begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix}}_{\hat{Q}}
  \underbrace{\begin{bmatrix}
  r_{11} & r_{12} & \cdots & r_{1n} \\
  0 & r_{22} & \cdots & r_{2n} \\
  \vdots & & \ddots & \vdots \\
  0 & 0 & \cdots & r_{nn}
  \end{bmatrix}}_{\hat{R}}, \qquad \text{i.e., } A = \hat{Q}\hat{R}. $$
• While it is an important tool for theoretical work, the GS algorithm is numerically unstable.
40/63
QR Algorithm
41/63
Revisiting Least Squares
42/63
Moore-Penrose Pseudoinverse
Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and suppose that the columns of $A$ are linearly independent.
• The least squares problem $Ax$ "=" $b$ is equivalent to the normal equation $A^T A x = A^T b$, which is a square matrix equation.
• The solution can be written as
  $$ x = \bigl(A^T A\bigr)^{-1} A^T b. $$
• The matrix
  $$ A^{+} = \bigl(A^T A\bigr)^{-1} A^T \in \mathbb{R}^{n \times m} $$
  is called the (Moore-Penrose) pseudoinverse.
• MATLAB's backslash is mathematically equivalent to left-multiplication by the inverse or pseudoinverse of a matrix.
• MATLAB's pinv calculates the pseudoinverse, but, just like inv, it is rarely used in practice.
43/63
Moore-Penrose Pseudoinverse (cont’)
• $A^{+}$ can be calculated using the thin QR factorization² $A = \hat{Q}\hat{R}$:
  $$ A^{+} = \hat{R}^{-1} \hat{Q}^{T}. $$
²It can be done using the thick QR factorization as seen on p. 1624 of the text.
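A small numerical comparison of the three formulas above (an added illustration, not from the slides):

% Illustration: three equivalent ways to form the pseudoinverse A^+.
A = randn(30, 5);             % full column rank (with probability 1)
[Qh, Rh] = qr(A, 0);          % thin QR

P1 = pinv(A);                 % built-in pseudoinverse
P2 = (A'*A) \ A';             % (A'A)^(-1) A'
P3 = Rh \ Qh';                % Rh^(-1) Qh'

norm(P1 - P2), norm(P1 - P3)  % both differences ≈ 0 up to roundoff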
44/63
Least Squares and QR Factorization
Substitute the thin factorization $A = \hat{Q}\hat{R}$ into the normal equation $A^T A x = A^T b$ and simplify.
45/63
Least Squares and QR Factorization (cont’)
Summary: Algorithm for LLS Approximation
If $A$ has rank $n$, the normal equation $A^T A x = A^T b$ is consistent and is equivalent to $\hat{R} x = \hat{Q}^T b$.
1. Factor $A = \hat{Q}\hat{R}$.
2. Let $z = \hat{Q}^T b$.
3. Solve $\hat{R} x = z$ for $x$ using backward substitution.
46/63
Least Squares and QR Factorization (cont’)
function x = lsqrfact(A,b)
% LSQRFACT   x = lsqrfact(A,b)
% Solve linear least squares by QR factorization
% Input:
%   A    coefficient matrix (m-by-n, m>n)
%   b    right-hand side (m-by-1)
% Output:
%   x    minimizer of || b - Ax || (2-norm)
[Q,R] = qr(A,0);    % thin QR fact.
z = Q'*b;
x = backsub(R,z);   % backward substitution with the upper triangular R
end
47/63
Householder Transformation and QR Algorithm
48/63
Motivation
Problem
Given $z \in \mathbb{R}^m$, find an orthogonal matrix $H \in \mathbb{R}^{m \times m}$ such that $Hz$ is nonzero only in the first element.
• Since orthogonal matrices preserve the 2-norm, $H$ must satisfy
  $$ Hz = \begin{bmatrix} \pm\|z\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \pm\|z\|_2\, e_1. $$
• The Householder transformation matrix $H$ defined by
  $$ H = I - 2\,\frac{v v^T}{v^T v}, \quad \text{where } v = \pm\|z\|_2\, e_1 - z, $$
  solves the problem. See Theorem 6 on the next slide.
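A concrete check of this construction (an added illustration; it uses the $+\|z\|_2 e_1$ choice of sign):

% Illustration: build a Householder reflector for a specific z and check it.
z = [3; 4; 0; 12];                       % ||z||_2 = 13
v = norm(z)*eye(length(z),1) - z;        % v = ||z|| e1 - z
H = eye(length(z)) - 2*(v*v')/(v'*v);    % H = I - 2 v v' / (v'v)

H*z                                      % ≈ [13; 0; 0; 0] = ||z||_2 e1
norm(H'*H - eye(length(z)))              % ≈ 0: H is orthogonal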
49/63
Properties of Householder Transformation
Theorem 6
Let $v = \|z\|_2\, e_1 - z$ and let $H$ be the Householder transformation defined by
$$ H = I - 2\,\frac{v v^T}{v^T v}. $$
Then
1. $H$ is symmetric;
2. $H$ is orthogonal;
3. $Hz = \|z\|_2\, e_1$.
• $H$ is invariant under scaling of $v$.
• If $\|v\|_2 = 1$, then $H = I - 2\,v v^T$.
50/63
Geometry Behind Householder Transformation (cont’)
The Householder transformation matrix $H$ can be thought of as a reflector³.
[Figure: $z$ and $Hz$ as mirror images across the hyperplane ("mirror") orthogonal to $v$.]
³See Supplementary 1 for a review of projection and reflection operators.
51/63
Factorization Algorithm
• The Gram-Schmidt orthogonalization (thin QR factorization) is unstable in floating-point calculations.
• Stable alternative: find orthogonal matrices $H_1, H_2, \dots, H_n$ so that
  $$ \underbrace{H_n H_{n-1} \cdots H_2 H_1}_{=:Q^T} A = R, $$
  introducing zeros one column at a time below the diagonal terms.
• As a product of orthogonal matrices, $Q^T$ is also orthogonal and so $(Q^T)^{-1} = Q$. Therefore,
  $$ A = QR. $$
52/63
MATLAB Demonstration Code MYQR
function [Q, R] = myqr(A)
[m, n] = size(A);
A0 = A;                                      % keep a copy of the original A
Q = eye(m);
for j = 1:min(m,n)
    Aj = A(j:m, j:n);                        % trailing submatrix to be processed
    z = Aj(:, 1);                            % its first column
    v = z + sign0(z(1))*norm(z)*eye(length(z), 1);
    Hj = eye(length(v)) - 2/(v'*v) * v*v';   % Householder reflector for z
    Aj = Hj*Aj;                              % zero out column j below the diagonal
    H = eye(m);
    H(j:m, j:m) = Hj;                        % embed Hj into an m-by-m reflector
    Q = Q*H;                                 % accumulate Q = H1*H2*...*Hn
    A(j:m, j:n) = Aj;
end
R = A;
end
53/63
MATLAB Demonstration Code MYQR (cont’)
(continued from the previous page)
% local function
function y = sign0(x)
y = ones(size(x));
y(x < 0) = -1;
end

• The MATLAB command qr works similarly to, but more efficiently than, this function.
• The function finds the factorization in $\sim (2mn^2 - n^3/3)$ flops asymptotically.
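A quick way to exercise myqr (an added illustration; the built-in qr may pick different signs, so compare the reconstructed product rather than Q and R entry-by-entry):

% Illustration: sanity-check myqr on a small tall matrix.
A = randn(8, 5);
[Q, R] = myqr(A);

norm(Q*R - A)               % ≈ 0: the factorization reproduces A
norm(Q'*Q - eye(8))         % ≈ 0: Q is orthogonal
norm(tril(R, -1))           % ≈ 0: R is upper triangular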
54/63
Supplementary 1: Projection and Reflection
55/63
Projection and Re ection Operators
Let $u, v \in \mathbb{R}^m$ be nonzero vectors.
• Projection of $u$ onto $\langle v \rangle = \mathrm{span}(v)$:
  $$ \frac{v^T u}{v^T v}\, v = \underbrace{\left(\frac{v v^T}{v^T v}\right)}_{=:P} u =: P u. $$
• Projection of $u$ onto $\langle v \rangle^{\perp}$, the orthogonal complement of $\langle v \rangle$:
  $$ u - \frac{v^T u}{v^T v}\, v = \left(I - \frac{v v^T}{v^T v}\right) u =: (I - P)\, u. $$
• Reflection of $u$ across $\langle v \rangle^{\perp}$:
  $$ u - 2\,\frac{v^T u}{v^T v}\, v = \left(I - 2\,\frac{v v^T}{v^T v}\right) u =: (I - 2P)\, u. $$
56/63
Projection and Re ection Operators (cont’)
Summary: for a given nonzero vector $v \in \mathbb{R}^m$, let
$$ P = \frac{v v^T}{v^T v} \in \mathbb{R}^{m \times m}. $$
Then the following matrices carry out geometric transformations:
• Projection onto $\langle v \rangle$: $P$
• Projection onto $\langle v \rangle^{\perp}$: $I - P$
• Reflection across $\langle v \rangle^{\perp}$: $I - 2P$
Note. If $v$ were a unit vector, the definition of $P$ would simplify to $P = v v^T$.
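A small numerical illustration of these three operators (added here, not from the slides):

% Illustration: projection and reflection operators built from a nonzero v.
m = 5;
v = randn(m, 1);
P = (v*v') / (v'*v);            % projection onto span(v)

norm(P*P - P)                   % ≈ 0: P is a projection (idempotent)
norm((eye(m) - P)*v)            % ≈ 0: I - P annihilates v
H = eye(m) - 2*P;               % reflection across span(v)^perp
norm(H'*H - eye(m))             % ≈ 0: the reflector is orthogonal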
57/63
Supplementary 2: Conditioning and Stability
58/63
Analytical Properties of Pseudoinverse
The matrix $A^T A$ appearing in the definition of $A^{+}$ satisfies the following properties.
Theorem 7
For any $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, the following are true:
1. $A^T A$ is symmetric.
2. $A^T A$ is singular if and only if $\mathrm{rank}(A) < n$.
3. If $A^T A$ is nonsingular, then it is positive definite.
A symmetric positive definite (SPD) matrix $S$ such as $A^T A$ permits the so-called Cholesky factorization
$$ S = R^T R, $$
where $R$ is an upper triangular matrix.
59/63
Least Squares Using Normal Equation
One can solve the LLS problem $Ax$ "=" $b$ by solving the normal equation $A^T A x = A^T b$ directly as below.
1. Compute $N = A^T A$.
2. Compute $z = A^T b$.
3. Solve the square linear system $N x = z$ for $x$.
Step 3 is done using chol, which implements the Cholesky factorization.
MATLAB Implementation.

N = A'*A;
z = A'*b;
R = chol(N);           % N = R'*R with R upper triangular
w = forelim(R', z);    % solve R'w = z by forward elimination
x = backsub(R, w);     % solve R x = w by backward substitution
60/63
Conditioning of Normal Equations
• Recall that the condition number of solving a square linear system $Ax = b$ is $\kappa(A) = \|A\|\,\|A^{-1}\|$.
• Provided that the residual norm at the least squares solution is relatively small, the conditioning of the LLS problem is similar:
  $$ \kappa(A) = \|A\|\,\|A^{+}\|. $$
• If $A$ is rank-deficient (columns are linearly dependent), then $\kappa(A) = \infty$.
• If an LLS problem is solved by way of the normal equation, it can be shown that the condition number is
  $$ \kappa(A^T A) = \kappa(A)^2. $$
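This squaring effect is easy to see numerically (an added illustration with a Vandermonde-type matrix):

% Illustration: the normal equations square the condition number.
t = linspace(0, 1, 40)';
A = t.^(0:7);                     % noticeably ill-conditioned

kappa  = cond(A);
kappa2 = cond(A'*A);
[kappa, kappa2, kappa2/kappa^2]   % last entry ≈ 1, i.e. kappa2 ≈ kappa^2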
61/63
Which Re ector Is Better?
• Recall:
  $$ H = I - 2\,\frac{v v^T}{v^T v}, \quad \text{where } v = \pm\|z\|_2\, e_1 - z. $$
• In myqr.m, the statement

  v = z + sign0(z(1))*norm(z)*eye(length(z), 1);

  defines $v$ slightly differently⁴, namely,
  $$ v = z \pm \|z\|_2\, e_1. $$
⁴This does not cause any difference since $H$ is invariant under scaling of $v$; see p. 50.
62/63
Which Re ector Is Better? (cont’)
The sign of $\pm\|z\|_2$ is chosen so as to avoid possible catastrophic cancellation in forming $v$:
$$ v = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_m \end{bmatrix} + \begin{bmatrix} \pm\|z\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} z_1 \pm \|z\|_2 \\ z_2 \\ \vdots \\ z_m \end{bmatrix}. $$
Subtractive cancellation may arise when $z_1 \approx \pm\|z\|_2$; hence:
• if $z_1 > 0$, use $z_1 + \|z\|_2$;
• if $z_1 < 0$, use $z_1 - \|z\|_2$;
• if $z_1 = 0$, either works.
For numerical stability, it is desirable to reflect $z$ to whichever of the vectors $\pm\|z\|_2\, e_1$ is not too close to $z$ itself. (Trefethen & Bau)
63/63