Overdetermined Linear Systems
1/63
Contents
1 Introduction
Opening Example: Polynomial Approximation
The Normal Equations
Appendix: Derivation of Normal Equation
2 QR Factorization
Preliminary: Orthogonality
QR Factorization
Appendix: Gram-Schmidt Orthogonalization
3 QR Algorithm
Revisiting Least Squares
Householder Transformation and QR Algorithm
Supplementary 1: Projection and Reflection
Supplementary 2: Conditioning and Stability
2/63
Introduction
3/63
Opening Example: Polynomial Approximation
4/63
Introduction
Problem: Fitting Functions to Data
Given data points $\{(x_i, y_i) \mid i \in \mathbb{N}[1,m]\}$, pick a form for the "fitting" function $f(x)$ and minimize its total error in representing the data.
• With real-world data, interpolation is often not the best method.
• Instead of finding functions lying exactly on given data points, we look for ones which are "close" to them.
• In the most general terms, the fitting function takes the form
  $$ f(x) = c_1 f_1(x) + \cdots + c_n f_n(x), $$
  where $f_1, \dots, f_n$ are known functions while $c_1, \dots, c_n$ are to be determined.
5/63
Linear Least Squares Approximation
In this discussion:
• use a polynomial fitting function $p(x) = c_1 + c_2 x + \cdots + c_n x^{n-1}$ with $n < m$;
• minimize the 2-norm of the error $r_i = y_i - p(x_i)$:
  $$ \|r\|_2 = \sqrt{\sum_{i=1}^{m} r_i^2} = \sqrt{\sum_{i=1}^{m} \bigl(y_i - p(x_i)\bigr)^2}. $$
Since the fitting function is linear in the unknown coefficients and the 2-norm is minimized, this method of approximation is called the linear least squares (LLS) approximation.
6/63
Example: Temperature Anomaly
Below are 5-year averages of the worldwide temperature anomaly as compared to the 1951-1980 average (source: NASA).
Year    Anomaly (°C)
1955    -0.0480
1960    -0.0180
1965    -0.0360
1970    -0.0120
1975    -0.0040
1980     0.1180
1985     0.2100
1990     0.3320
1995     0.3340
2000     0.4560
7/63
Example: Import and Plot Data
t = (1955:5:2000)';
y = [-0.0480; -0.0180;
     -0.0360; -0.0120;
     -0.0040;  0.1180;
      0.2100;  0.3320;
      0.3340;  0.4560];
plot(t, y, '.')
[Figure: scatter plot of anomaly (°C) vs. year, 1955–2000.]
8/63
Example: Interpolation
t = (t-1950)/10;
n = length(t);
V = t.^(0:n-1);
c = V\y;
p = @(x) polyval(flip(c), (x-1950)/10);
hold on
fplot(p, [1955 2000])
[Figure: data points with the interpolating polynomial overlaid; anomaly (°C) vs. year.]
9/63
Fitting by a Straight Line
Suppose that we are fitting data to a linear polynomial: $p(x) = c_1 + c_2 x$.
• If it were to pass through all data points:
  $$ \begin{cases}
  y_1 = p(x_1) = c_1 + c_2 x_1 \\
  y_2 = p(x_2) = c_1 + c_2 x_2 \\
  \qquad\vdots \\
  y_m = p(x_m) = c_1 + c_2 x_m
  \end{cases}
  \;\xrightarrow{\text{matrix equation}}\;
  \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}}_{y}
  =
  \underbrace{\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}}_{V}
  \underbrace{\begin{bmatrix} c_1 \\ c_2 \end{bmatrix}}_{c} $$
• The above system generally has no solution (more equations than unknowns); instead, find $c$ which makes the residual $r = y - Vc$ "as small as possible" in the sense of the vector 2-norm.
• Notation: $y$ "=" $Vc$.
10/63
MATLAB Implementation
Revisiting the temperature anomaly example again:

year = (1955:5:2000)';
t = year - 1955;
V = t.^(0:1);
c = V\y;
p = @(x) polyval(flip(c), x-1955);
plot(year, y, '.')
hold on
fplot(p, [1955, 2000])

[Figure: data points with the least squares line overlaid; anomaly (°C) vs. year.]
11/63
Fitting by a General Polynomial
In general, when fitting data to a polynomial
$$ p(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_n x^{n-1}, $$
we need to solve
$$ \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}}_{y}
\;\text{“=”}\;
\underbrace{\begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^{n-1} \end{bmatrix}}_{V}
\underbrace{\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}}_{c} $$
• The solution $c$ of $y$ "=" $Vc$ turns out to be the solution of the normal equation
  $$ V^T V c = V^T y. $$
12/63
MATLAB Implementation
Revisiting the temperature anomaly example again:

V = t.^(0:3);
c = V\y;
q = @(x) polyval(flip(c), x-1955);
hold on
fplot(q, [1955 2000])

[Figure: data points with the least squares cubic overlaid; anomaly (°C) vs. year.]
13/63
Backslash Again
The Versatile Backslash
In MATLAB, the generic linear equation $Ax = b$ is solved by x = A\b.
• When $A$ is a square matrix, Gaussian elimination is used.
• When $A$ is NOT square, the least squares solution, characterized by the normal equation $A^T A x = A^T b$, is computed instead.
• As long as $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ has rank $n$, the square matrix $A^T A$ is nonsingular, so the least squares solution is unique.
• Though $A^T A$ is a square matrix, MATLAB does not use Gaussian elimination to solve the normal equation.
• Rather, a faster and more accurate algorithm is used.
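As a quick numerical check (an illustration added here, reusing the temperature data from the earlier slides), backslash and an explicit normal-equation solve produce the same coefficients:

% Illustration: rectangular backslash vs. solving the normal equation directly.
year = (1955:5:2000)';
y = [-0.0480; -0.0180; -0.0360; -0.0120; -0.0040; ...
      0.1180;  0.2100;  0.3320;  0.3340;  0.4560];
V = (year - 1955).^(0:1);        % straight-line fit

c_backslash = V \ y;             % least squares via backslash
c_normal    = (V'*V) \ (V'*y);   % solve A'Ax = A'b directly
norm(c_backslash - c_normal)     % ≈ 0 up to roundoff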
14/63
The Normal Equations
15/63
LLS and Normal Equation
Big Question: How is the least squares solution $x$ to $Ax$ "=" $b$ equivalent to the solution of the normal equation $A^T A x = A^T b$?
Theorem (Normal Equation)
Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$. If $x \in \mathbb{R}^n$ satisfies $A^T A x = A^T b$, then $x$ solves the LLS problem, i.e., $x$ minimizes $\|b - Ax\|_2$.
16/63
Proof of the Theorem
Idea of Proof¹. It is enough to show that $\bigl\|b - A(x+y)\bigr\|_2 \ge \|b - Ax\|_2$ for any $y \in \mathbb{R}^n$.
• Useful algebra:
  $$ (u+v)^T(u+v) = u^T u + u^T v + v^T u + v^T v = u^T u + 2\,v^T u + v^T v. $$
‚ Exercise: Prove it.
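One way to complete the argument from the hint above (a sketch added here, not the slide's own worked proof): apply the identity with $u = b - Ax$ and $v = -Ay$.

\begin{align*}
\|b - A(x+y)\|_2^2
  &= \bigl((b - Ax) - Ay\bigr)^{T}\bigl((b - Ax) - Ay\bigr) \\
  &= \|b - Ax\|_2^2 - 2\,(Ay)^{T}(b - Ax) + \|Ay\|_2^2 \\
  &= \|b - Ax\|_2^2 + \|Ay\|_2^2
     \qquad \text{since } (Ay)^{T}(b - Ax) = y^{T}\bigl(A^{T}b - A^{T}Ax\bigr) = 0 \\
  &\ge \|b - Ax\|_2^2 .
\end{align*}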
¹Alternatively, one can derive the normal equation using calculus. See Appendix.
17/63
Appendix: Derivation of Normal Equation
18/63
Derivation of Normal Equation
Consider $Ax$ "=" $b$, where $A \in \mathbb{R}^{m \times n}$ with $m \ge n$.
• Requirement: minimize the 2-norm of the residual $r = b - Ax$:
  $$ g(x_1, x_2, \dots, x_n) := \|r\|_2^2 = \sum_{i=1}^{m} \Bigl(b_i - \sum_{j=1}^{n} a_{ij} x_j\Bigr)^2. $$
• Strategy: using calculus, find the minimum by setting
  $$ 0 = \nabla g(x_1, x_2, \dots, x_n), $$
  which yields $n$ equations in the $n$ unknowns $x_1, x_2, \dots, x_n$.
19/63
Derivation of Normal Equation (cont’)
Noting that $\partial x_j / \partial x_k = \delta_{j,k}$, the $n$ equations $\partial g / \partial x_k = 0$ are written out as
$$ 0 = 2 \sum_{i=1}^{m} \Bigl(b_i - \sum_{j=1}^{n} a_{ij} x_j\Bigr)(-a_{ik}), \qquad k \in \mathbb{N}[1,n], $$
which can be rearranged into
$$ \sum_{i=1}^{m} a_{ik} b_i = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} a_{ik} x_j, \qquad k \in \mathbb{N}[1,n]. $$
One can see that the two sides correspond to the $k$th elements of $A^T b$ and $A^T A x$ respectively:
$$ A^T A x = A^T b, $$
showing the desired equivalence.
20/63
QR Factorization
21/63
Preliminary: Orthogonality
22/63
Normal Equation Revisited
Alternate perspective on the "normal equation":
$$ A^T(b - Ax) = 0 \;\iff\; z^T(b - Ax) = 0 \quad \text{for all } z \in \mathrm{range}(A), $$
i.e., $x$ solves the normal equation if and only if the residual is orthogonal to the range of $A$.
[Figure: $b$, its projection $y = Ax$ onto range($A$), and the residual $r = b - Ax$ orthogonal to range($A$).]
23/63
Orthogonal Vectors
Recall that the angle $\theta$ between two vectors $u, v \in \mathbb{R}^n$ satisfies
$$ \cos(\theta) = \frac{u^T v}{\|u\|_2\, \|v\|_2}. $$
Definition 1
• Two vectors $u, v \in \mathbb{R}^n$ are orthogonal if $u^T v = 0$.
• Vectors $q_1, q_2, \dots, q_k \in \mathbb{R}^n$ are orthogonal if $q_i^T q_j = 0$ for all $i \ne j$.
• Vectors $q_1, q_2, \dots, q_k \in \mathbb{R}^n$ are orthonormal if $q_i^T q_j = \delta_{i,j}$.
Notation (Kronecker delta function):
$$ \delta_{i,j} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases} $$
24/63
Matrices with Orthogonal Columns
Let $Q = \bigl[\, q_1 \,\big|\, q_2 \,\big|\, \cdots \,\big|\, q_k \,\bigr] \in \mathbb{R}^{n \times k}$. Note that
$$ Q^T Q =
\begin{bmatrix} q_1^T \\ q_2^T \\ \vdots \\ q_k^T \end{bmatrix}
\begin{bmatrix} q_1 & q_2 & \cdots & q_k \end{bmatrix}
=
\begin{bmatrix}
q_1^T q_1 & q_1^T q_2 & \cdots & q_1^T q_k \\
q_2^T q_1 & q_2^T q_2 & \cdots & q_2^T q_k \\
\vdots & \vdots & \ddots & \vdots \\
q_k^T q_1 & q_k^T q_2 & \cdots & q_k^T q_k
\end{bmatrix}. $$
Therefore,
• $q_1, \dots, q_k$ are orthogonal $\;\Rightarrow\;$ $Q^T Q$ is a $k \times k$ diagonal matrix.
• $q_1, \dots, q_k$ are orthonormal $\;\Rightarrow\;$ $Q^T Q$ is the $k \times k$ identity matrix.
25/63
Matrices with Orthonormal Columns
Theorem 2
Let $Q = \bigl[\, q_1 \,\big|\, q_2 \,\big|\, \cdots \,\big|\, q_k \,\bigr] \in \mathbb{R}^{n \times k}$ and suppose that $q_1, \dots, q_k$ are orthonormal. Then
1. $Q^T Q = I \in \mathbb{R}^{k \times k}$;
2. $\|Qx\|_2 = \|x\|_2$ for all $x \in \mathbb{R}^k$;
3. $\|Q\|_2 = 1$.
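These properties are easy to verify numerically (an added illustration, using MATLAB's qr to produce a matrix with orthonormal columns):

% Illustration: check the three properties of Theorem 2.
n = 50;  k = 8;
[Q, ~] = qr(randn(n, k), 0);    % thin QR of a random matrix gives orthonormal columns
x = randn(k, 1);

norm(Q'*Q - eye(k))             % ≈ 0   (property 1)
abs(norm(Q*x) - norm(x))        % ≈ 0   (property 2)
norm(Q)                         % ≈ 1   (property 3)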
26/63
Orthogonal Matrices
Definition 3
We say that $Q \in \mathbb{R}^{n \times n}$ is an orthogonal matrix if $Q^T Q = I \in \mathbb{R}^{n \times n}$.
• A square matrix with orthogonal columns is not, in general, an orthogonal matrix!
27/63
Properties of Orthogonal Matrices
Theorem 4
Let $Q \in \mathbb{R}^{n \times n}$ be orthogonal. Then
1. $Q^{-1} = Q^T$;
2. $Q^T$ is also an orthogonal matrix;
3. $\kappa_2(Q) = 1$;
4. for any $A \in \mathbb{R}^{n \times n}$, $\|AQ\|_2 = \|A\|_2$;
5. if $P \in \mathbb{R}^{n \times n}$ is another orthogonal matrix, then $PQ$ is also orthogonal.
28/63
Why Do We Like Orthogonal Vectors?
• If $u$ and $v$ are orthogonal, then $\|u \pm v\|_2 = \sqrt{\|u\|_2^2 + \|v\|_2^2}$ (Pythagorean theorem).
• Without orthogonality, it is possible that $\|u - v\|_2$ is much smaller than $\|u\|_2$ and $\|v\|_2$.
• The addition and subtraction of orthogonal vectors are guaranteed to be well-conditioned.
[Figure: two sketches of $u$, $v$, and $u - v$: nearly parallel vectors yield a small difference, orthogonal vectors do not.]
29/63
QR Factorization
30/63
The QR Factorization
The following matrix factorization plays a role in solving linear least squares problems similar to that of LU factorization in solving linear systems.
Theorem 5
Let $A \in \mathbb{R}^{m \times n}$ where $m \ge n$. Then $A = QR$, where $Q \in \mathbb{R}^{m \times m}$ is orthogonal and $R \in \mathbb{R}^{m \times n}$ is upper triangular.
$$ \underbrace{\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}}_{A}
= \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_m \end{bmatrix}}_{Q}
\underbrace{\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & r_{nn} \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}}_{R} $$
31/63
Thick vs. Thin QR Factorization
• Here is the QR factorization again:
  $$ A = \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_m \end{bmatrix}}_{Q}
  \underbrace{\begin{bmatrix}
  r_{11} & r_{12} & \cdots & r_{1n} \\
  0 & r_{22} & \cdots & r_{2n} \\
  \vdots & & \ddots & \vdots \\
  0 & 0 & \cdots & r_{nn} \\
  0 & 0 & \cdots & 0 \\
  \vdots & \vdots & & \vdots \\
  0 & 0 & \cdots & 0
  \end{bmatrix}}_{R} \qquad \text{(thick)} $$
• When $m$ is much larger than $n$, it is much more efficient to use the thin or compressed QR factorization:
  $$ A = \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}}_{\hat{Q}}
  \underbrace{\begin{bmatrix}
  r_{11} & r_{12} & \cdots & r_{1n} \\
  0 & r_{22} & \cdots & r_{2n} \\
  \vdots & & \ddots & \vdots \\
  0 & 0 & \cdots & r_{nn}
  \end{bmatrix}}_{\hat{R}} \qquad \text{(thin)} $$
32/63
QR Factorization in MATLAB
Either type of QR factorization is computed by the qr command.
• Thick/Full QR factorization
[Q, R] = qr(A)
• Thin/Compressed QR factorization
[Q, R] = qr(A, 0)
Test the orthogonality of Q by calculating the norm of $Q^T Q - I$, where $I$ is the identity matrix with suitable dimensions.
norm(Q'*Q - eye(m)) % full QR
norm(Q'*Q - eye(n)) % thin QR
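For example (an added illustration), on a random tall matrix:

% Illustration: full vs. thin QR of a tall matrix.
A = randn(100, 4);

[Q, R] = qr(A);          % full: Q is 100-by-100, R is 100-by-4
[Qh, Rh] = qr(A, 0);     % thin: Qh is 100-by-4, Rh is 4-by-4

norm(A - Q*R)            % both reproduce A up to roundoff
norm(A - Qh*Rh)
norm(Qh'*Qh - eye(4))    % columns of Qh are orthonormal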
33/63
Least Squares and QR Factorization
Substitute the thin factorization $A = \hat{Q}\hat{R}$ into the normal equation $A^T A x = A^T b$ and simplify.
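A sketch of that simplification, added here (it assumes $A$ has rank $n$, so that $\hat{R}$ is invertible):
$$ A^T A x = A^T b \;\Longrightarrow\; \hat{R}^T \hat{Q}^T \hat{Q} \hat{R}\, x = \hat{R}^T \hat{Q}^T b \;\Longrightarrow\; \hat{R}^T \hat{R}\, x = \hat{R}^T \hat{Q}^T b \;\Longrightarrow\; \hat{R}\, x = \hat{Q}^T b, $$
using $\hat{Q}^T \hat{Q} = I$ in the second step and the invertibility of $\hat{R}^T$ in the last.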
34/63
Least Squares and QR Factorization (cont’)
Summary: Algorithm for LLS Approximation
If $A$ has rank $n$, the normal equation $A^T A x = A^T b$ is consistent and is equivalent to $\hat{R} x = \hat{Q}^T b$.
1. Factor $A = \hat{Q}\hat{R}$.
2. Let $z = \hat{Q}^T b$.
3. Solve $\hat{R} x = z$ for $x$ using backward substitution.
35/63
Least Squares and QR Factorization (cont’)
function x = lsqrfact(A,b)
% LSQRFACT   x = lsqrfact(A,b)
% Solve linear least squares by QR factorization
% Input:
%   A    coefficient matrix (m-by-n, m>n)
%   b    right-hand side (m-by-1)
% Output:
%   x    minimizer of || b - Ax || (2-norm)
[Q,R] = qr(A,0);    % thin QR fact.
z = Q'*b;
x = backsub(R,z);   % backward substitution with the upper triangular R
end
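A usage sketch (added here; it assumes the backsub routine used by lsqrfact is available on the path):

% Hypothetical usage of lsqrfact on the temperature-anomaly data (cubic fit).
year = (1955:5:2000)';
y = [-0.0480; -0.0180; -0.0360; -0.0120; -0.0040; ...
      0.1180;  0.2100;  0.3320;  0.3340;  0.4560];
V = (year - 1955).^(0:3);

c = lsqrfact(V, y);     % least squares coefficients via thin QR
norm(c - V\y)           % ≈ 0: agrees with backslash up to roundoff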
36/63
Appendix: Gram-Schmidt Orthogonalization
37/63
The Gram–Schmidt Procedure
Problem: Orthogonalization
Given $a_1, \dots, a_n \in \mathbb{R}^m$, construct orthonormal vectors $q_1, \dots, q_n \in \mathbb{R}^m$ such that
$$ \mathrm{span}\{a_1, \dots, a_k\} = \mathrm{span}\{q_1, \dots, q_k\} \quad \text{for any } k \in \mathbb{N}[1,n]. $$
• Strategy. At the $j$th step, find a unit vector $q_j \in \mathrm{span}\{a_1, \dots, a_j\}$ that is orthogonal to $q_1, \dots, q_{j-1}$.
• Key Observation. The vector $v_j$ defined by
  $$ v_j = a_j - \bigl(q_1^T a_j\bigr) q_1 - \bigl(q_2^T a_j\bigr) q_2 - \cdots - \bigl(q_{j-1}^T a_j\bigr) q_{j-1} $$
  satisfies the required properties.
38/63
GS Algorithm
The Gram–Schmidt iteration is outlined below:
$$ q_1 = \frac{a_1}{r_{11}}, \qquad
   q_2 = \frac{a_2 - r_{12}\, q_1}{r_{22}}, \qquad
   q_3 = \frac{a_3 - r_{13}\, q_1 - r_{23}\, q_2}{r_{33}}, \qquad \dots, \qquad
   q_n = \frac{a_n - \sum_{i=1}^{n-1} r_{in}\, q_i}{r_{nn}}, $$
where
$$ r_{ij} = \begin{cases} q_i^T a_j, & \text{if } i \ne j \\[4pt] \pm \Bigl\| a_j - \sum_{k=1}^{j-1} r_{kj}\, q_k \Bigr\|_2, & \text{if } i = j. \end{cases} $$
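A minimal MATLAB sketch of this iteration (an added illustration, not the course's reference code; it takes the positive sign for $r_{jj}$ and assumes the columns of A are linearly independent):

function [Q, R] = gramschmidt(A)
% Classical Gram-Schmidt: thin QR factorization of A.
% Assumes the columns of A are linearly independent.
[m, n] = size(A);
Q = zeros(m, n);
R = zeros(n, n);
for j = 1:n
    v = A(:, j);
    for i = 1:j-1
        R(i, j) = Q(:, i)' * A(:, j);   % r_ij = q_i' a_j
        v = v - R(i, j) * Q(:, i);      % remove the component along q_i
    end
    R(j, j) = norm(v);                  % r_jj (positive sign chosen)
    Q(:, j) = v / R(j, j);              % normalize to get q_j
end
end

For a well-conditioned A, both norm(A - Q*R) and norm(Q'*Q - eye(n)) should then come out near zero.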
39/63
GS Procedure and Thin QR Factorization
• The GS algorithm, written as a matrix equation, yields a thin QR factorization:
  $$ \underbrace{\begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix}}_{A}
  = \underbrace{\begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix}}_{\hat{Q}}
  \underbrace{\begin{bmatrix}
  r_{11} & r_{12} & \cdots & r_{1n} \\
  0 & r_{22} & \cdots & r_{2n} \\
  \vdots & & \ddots & \vdots \\
  0 & 0 & \cdots & r_{nn}
  \end{bmatrix}}_{\hat{R}}, \qquad \text{i.e., } A = \hat{Q}\hat{R}. $$
• While it is an important tool for theoretical work, the GS algorithm is numerically unstable.
40/63
QR Algorithm
41/63
Revisiting Least Squares
42/63
Moore-Penrose Pseudoinverse
Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and suppose that the columns of $A$ are linearly independent.
• The least squares problem $Ax$ "=" $b$ is equivalent to the normal equation $A^T A x = A^T b$, which is a square matrix equation.
• The solution can be written as
  $$ x = \bigl(A^T A\bigr)^{-1} A^T b. $$
• The matrix
  $$ A^{+} = \bigl(A^T A\bigr)^{-1} A^T \in \mathbb{R}^{n \times m} $$
  is called the (Moore-Penrose) pseudoinverse.
• MATLAB's backslash is mathematically equivalent to left-multiplication by the inverse or pseudoinverse of a matrix.
• MATLAB's pinv calculates the pseudoinverse, but, just like inv, it is rarely used in practice.
43/63
Moore-Penrose Pseudoinverse (cont’)
• $A^{+}$ can be calculated using the thin QR factorization² $A = \hat{Q}\hat{R}$:
  $$ A^{+} = \hat{R}^{-1} \hat{Q}^{T}. $$
²It can be done using the thick QR factorization as seen on p. 1624 of the text.
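A small numerical comparison of the three formulas above (an added illustration, not from the slides):

% Illustration: three equivalent ways to form the pseudoinverse A^+.
A = randn(30, 5);             % full column rank (with probability 1)
[Qh, Rh] = qr(A, 0);          % thin QR

P1 = pinv(A);                 % built-in pseudoinverse
P2 = (A'*A) \ A';             % (A'A)^(-1) A'
P3 = Rh \ Qh';                % Rh^(-1) Qh'

norm(P1 - P2), norm(P1 - P3)  % both differences ≈ 0 up to roundoff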
44/63
Least Squares and QR Factorization
Substitute the thin factorization $A = \hat{Q}\hat{R}$ into the normal equation $A^T A x = A^T b$ and simplify.
45/63
Least Squares and QR Factorization (cont’)
Summary: Algorithm for LLS Approximation
If $A$ has rank $n$, the normal equation $A^T A x = A^T b$ is consistent and is equivalent to $\hat{R} x = \hat{Q}^T b$.
1. Factor $A = \hat{Q}\hat{R}$.
2. Let $z = \hat{Q}^T b$.
3. Solve $\hat{R} x = z$ for $x$ using backward substitution.
46/63
Least Squares and QR Factorization (cont’)
function x = lsqrfact(A,b)
% LSQRFACT   x = lsqrfact(A,b)
% Solve linear least squares by QR factorization
% Input:
%   A    coefficient matrix (m-by-n, m>n)
%   b    right-hand side (m-by-1)
% Output:
%   x    minimizer of || b - Ax || (2-norm)
[Q,R] = qr(A,0);    % thin QR fact.
z = Q'*b;
x = backsub(R,z);   % backward substitution with the upper triangular R
end
47/63
Householder Transformation and QR Algorithm
48/63
Motivation
Problem
Given $z \in \mathbb{R}^m$, find an orthogonal matrix $H \in \mathbb{R}^{m \times m}$ such that $Hz$ is nonzero only in the first element.
• Since orthogonal matrices preserve the 2-norm, $H$ must satisfy
  $$ Hz = \begin{bmatrix} \pm\|z\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \pm\|z\|_2\, e_1. $$
• The Householder transformation matrix $H$ defined by
  $$ H = I - 2\,\frac{v v^T}{v^T v}, \quad \text{where } v = \pm\|z\|_2\, e_1 - z, $$
  solves the problem. See Theorem 6 on the next slide.
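A concrete check of this construction (an added illustration; it uses the $+\|z\|_2 e_1$ choice of sign):

% Illustration: build a Householder reflector for a specific z and check it.
z = [3; 4; 0; 12];                       % ||z||_2 = 13
v = norm(z)*eye(length(z),1) - z;        % v = ||z|| e1 - z
H = eye(length(z)) - 2*(v*v')/(v'*v);    % H = I - 2 v v' / (v'v)

H*z                                      % ≈ [13; 0; 0; 0] = ||z||_2 e1
norm(H'*H - eye(length(z)))              % ≈ 0: H is orthogonal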
49/63
Properties of Householder Transformation
Theorem 6
Let $v = \|z\|_2\, e_1 - z$ and let $H$ be the Householder transformation defined by
$$ H = I - 2\,\frac{v v^T}{v^T v}. $$
Then
1. $H$ is symmetric;
2. $H$ is orthogonal;
3. $Hz = \|z\|_2\, e_1$.
• $H$ is invariant under scaling of $v$.
• If $\|v\|_2 = 1$, then $H = I - 2\,v v^T$.
50/63
Geometry Behind Householder Transformation (cont’)
The Householder transformation matrix $H$ can be thought of as a reflector³.
[Figure: $z$ and $Hz$ as mirror images across the hyperplane ("mirror") orthogonal to $v$.]
³See Supplementary 1 for a review of projection and reflection operators.
51/63
Factorization Algorithm
• The Gram-Schmidt orthogonalization (thin QR factorization) is unstable in floating-point calculations.
• Stable alternative: find orthogonal matrices $H_1, H_2, \dots, H_n$ so that
  $$ \underbrace{H_n H_{n-1} \cdots H_2 H_1}_{=:Q^T} A = R, $$
  introducing zeros one column at a time below the diagonal terms.
• As a product of orthogonal matrices, $Q^T$ is also orthogonal and so $(Q^T)^{-1} = Q$. Therefore,
  $$ A = QR. $$
52/63
MATLAB Demonstration Code MYQR
function [Q, R] = myqr(A)
[m, n] = size(A);
A0 = A;                                      % keep a copy of the original A
Q = eye(m);
for j = 1:min(m,n)
    Aj = A(j:m, j:n);                        % trailing submatrix to be processed
    z = Aj(:, 1);                            % its first column
    v = z + sign0(z(1))*norm(z)*eye(length(z), 1);
    Hj = eye(length(v)) - 2/(v'*v) * v*v';   % Householder reflector for z
    Aj = Hj*Aj;                              % zero out column j below the diagonal
    H = eye(m);
    H(j:m, j:m) = Hj;                        % embed Hj into an m-by-m reflector
    Q = Q*H;                                 % accumulate Q = H1*H2*...*Hn
    A(j:m, j:n) = Aj;
end
R = A;
end
53/63
MATLAB Demonstration Code MYQR (cont’)
(continued from the previous page)
% local function
function y = sign0(x)
y = ones(size(x));
y(x < 0) = -1;
end

• The MATLAB command qr works similarly to, but more efficiently than, this function.
• The function finds the factorization in $\sim (2mn^2 - n^3/3)$ flops asymptotically.
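A quick way to exercise myqr (an added illustration; the built-in qr may pick different signs, so compare the reconstructed product rather than Q and R entry-by-entry):

% Illustration: sanity-check myqr on a small tall matrix.
A = randn(8, 5);
[Q, R] = myqr(A);

norm(Q*R - A)               % ≈ 0: the factorization reproduces A
norm(Q'*Q - eye(8))         % ≈ 0: Q is orthogonal
norm(tril(R, -1))           % ≈ 0: R is upper triangular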
54/63
Supplementary 1: Projection and Reflection
55/63
Projection and Re ection Operators
Let $u, v \in \mathbb{R}^m$ be nonzero vectors.
• Projection of $u$ onto $\langle v \rangle = \mathrm{span}(v)$:
  $$ \frac{v^T u}{v^T v}\, v = \underbrace{\left(\frac{v v^T}{v^T v}\right)}_{=:P} u =: P u. $$
• Projection of $u$ onto $\langle v \rangle^{\perp}$, the orthogonal complement of $\langle v \rangle$:
  $$ u - \frac{v^T u}{v^T v}\, v = \left(I - \frac{v v^T}{v^T v}\right) u =: (I - P)\, u. $$
• Reflection of $u$ across $\langle v \rangle^{\perp}$:
  $$ u - 2\,\frac{v^T u}{v^T v}\, v = \left(I - 2\,\frac{v v^T}{v^T v}\right) u =: (I - 2P)\, u. $$
56/63
Projection and Re ection Operators (cont’)
Summary: for a given nonzero vector $v \in \mathbb{R}^m$, let
$$ P = \frac{v v^T}{v^T v} \in \mathbb{R}^{m \times m}. $$
Then the following matrices carry out geometric transformations:
• Projection onto $\langle v \rangle$: $P$
• Projection onto $\langle v \rangle^{\perp}$: $I - P$
• Reflection across $\langle v \rangle^{\perp}$: $I - 2P$
Note. If $v$ were a unit vector, the definition of $P$ would simplify to $P = v v^T$.
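A small numerical illustration of these three operators (added here, not from the slides):

% Illustration: projection and reflection operators built from a nonzero v.
m = 5;
v = randn(m, 1);
P = (v*v') / (v'*v);            % projection onto span(v)

norm(P*P - P)                   % ≈ 0: P is a projection (idempotent)
norm((eye(m) - P)*v)            % ≈ 0: I - P annihilates v
H = eye(m) - 2*P;               % reflection across span(v)^perp
norm(H'*H - eye(m))             % ≈ 0: the reflector is orthogonal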
57/63
Supplementary 2: Conditioning and Stability
58/63
Analytical Properties of Pseudoinverse
The matrix $A^T A$ appearing in the definition of $A^{+}$ satisfies the following properties.
Theorem 7
For any $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, the following are true:
1. $A^T A$ is symmetric.
2. $A^T A$ is singular if and only if $\mathrm{rank}(A) < n$.
3. If $A^T A$ is nonsingular, then it is positive definite.
A symmetric positive definite (SPD) matrix $S$ such as $A^T A$ permits the so-called Cholesky factorization
$$ S = R^T R, $$
where $R$ is an upper triangular matrix.
59/63
Least Squares Using Normal Equation
One can solve the LLS problem $Ax$ "=" $b$ by solving the normal equation $A^T A x = A^T b$ directly as below.
1. Compute $N = A^T A$.
2. Compute $z = A^T b$.
3. Solve the square linear system $N x = z$ for $x$.
Step 3 is done using chol, which implements the Cholesky factorization.
MATLAB Implementation.

N = A'*A;
z = A'*b;
R = chol(N);           % N = R'*R with R upper triangular
w = forelim(R', z);    % solve R'w = z by forward elimination
x = backsub(R, w);     % solve R x = w by backward substitution
60/63
Conditioning of Normal Equations
• Recall that the condition number of solving a square linear system $Ax = b$ is $\kappa(A) = \|A\|\,\|A^{-1}\|$.
• Provided that the residual norm at the least squares solution is relatively small, the conditioning of the LLS problem is similar:
  $$ \kappa(A) = \|A\|\,\|A^{+}\|. $$
• If $A$ is rank-deficient (columns are linearly dependent), then $\kappa(A) = \infty$.
• If an LLS problem is solved by way of the normal equation, it can be shown that the condition number is
  $$ \kappa(A^T A) = \kappa(A)^2. $$
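This squaring effect is easy to see numerically (an added illustration with a Vandermonde-type matrix):

% Illustration: the normal equations square the condition number.
t = linspace(0, 1, 40)';
A = t.^(0:7);                     % noticeably ill-conditioned

kappa  = cond(A);
kappa2 = cond(A'*A);
[kappa, kappa2, kappa2/kappa^2]   % last entry ≈ 1, i.e. kappa2 ≈ kappa^2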
61/63
Which Re ector Is Better?
• Recall:
  $$ H = I - 2\,\frac{v v^T}{v^T v}, \quad \text{where } v = \pm\|z\|_2\, e_1 - z. $$
• In myqr.m, the statement

  v = z + sign0(z(1))*norm(z)*eye(length(z), 1);

  defines $v$ slightly differently⁴, namely,
  $$ v = z \pm \|z\|_2\, e_1. $$
⁴This does not cause any difference since $H$ is invariant under scaling of $v$; see p. 50.
62/63
Which Re ector Is Better? (cont’)
The sign of $\pm\|z\|_2$ is chosen so as to avoid possible catastrophic cancellation in forming $v$:
$$ v = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_m \end{bmatrix} + \begin{bmatrix} \pm\|z\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} z_1 \pm \|z\|_2 \\ z_2 \\ \vdots \\ z_m \end{bmatrix}. $$
Subtractive cancellation may arise when $z_1 \approx \pm\|z\|_2$; hence:
• if $z_1 > 0$, use $z_1 + \|z\|_2$;
• if $z_1 < 0$, use $z_1 - \|z\|_2$;
• if $z_1 = 0$, either works.
For numerical stability, it is desirable to reflect $z$ to whichever of the vectors $\pm\|z\|_2\, e_1$ is not too close to $z$ itself. (Trefethen & Bau)
63/63