
Scientific Computing
Lecturer: Qi Deng
Lecture 5: QR Decomposition
Spring 2020
Scribes: Qi Deng
Note: LaTeX template courtesy of UC Berkeley EECS dept.
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
They may be distributed outside this class only with the permission of the Instructor.
5.1 Least squares problems
Earlier we discussed the linear system Ax = b with a square matrix A. Now consider A ∈ R^{m×n} with m > n: we should not expect a solution to be unique or even to exist. It is then tempting to find an x that makes the residual r = Ax − b small in norm. If we use the squared loss as the penalty function, we arrive at the least squares problem:
\[
\min_x \; f(x) = \frac{1}{2}\|Ax - b\|_2^2. \tag{5.1}
\]
Taking the derivative of f(x) and setting ∇f(x) = 0, we obtain the so-called normal equation:

\[
A^T A x = A^T b. \tag{5.2}
\]
Assuming A has full column rank, A^T A is an SPD matrix, and we can solve (5.2) by Cholesky factorization in O(n^3) flops. The earlier analysis shows that the backward error is of order

\[
O(\kappa(A^T A)) = O\!\left(\kappa(A)^2\right).
\]

This squared dependence on the condition number is unsatisfactory, so we will present a more stable algorithm based on the so-called QR factorization.
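To see this squaring effect numerically, one can compare cond(A) with cond(A'*A) for a deliberately ill-conditioned matrix. The following MATLAB snippet is a minimal illustration of ours (the construction from random orthogonal factors is just one convenient choice, not part of the lecture):

% Illustrate kappa(A'*A) = kappa(A)^2 on a matrix with prescribed singular values.
m = 50; n = 10;
[U,~] = qr(randn(m,n),0);      % random m-by-n matrix with orthonormal columns
[V,~] = qr(randn(n));          % random n-by-n orthogonal matrix
s = logspace(0,-6,n);          % singular values from 1 down to 1e-6
A = U*diag(s)*V';
cond(A)                        % about 1e6
cond(A'*A)                     % about 1e12: the square of cond(A)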
5.2 QR decomposition
Theorem 1 (QR). Let A ∈ R^{m×n} be a matrix (m ≥ n). Then there exists a matrix Q ∈ R^{m×m} with orthonormal columns and an upper trapezoidal matrix R ∈ R^{m×n} such that

\[
A = [q_1, q_2, \ldots, q_m] \cdot
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & r_{nn} \\
0 & \cdots & \cdots & 0 \\
\vdots & & & \vdots \\
0 & \cdots & \cdots & 0
\end{bmatrix}. \tag{5.3}
\]
The proof of this theorem presents itself readily in the so-called Gram-Schmidt orthogonalization described in Algorithm 1. Since the multipliers of q_{n+1}, q_{n+2}, ..., q_m are always zero, (5.3) reduces to the thin QR decomposition
\[
A = [q_1, q_2, \ldots, q_n] \cdot
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & r_{nn}
\end{bmatrix}
= \widetilde{Q}\widetilde{R}. \tag{5.4}
\]
Throughout the rest of this section, we will always use A = QR to denote the thin QR factorization for simplicity.
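As an aside, both the full and the thin factorization are available from MATLAB's built-in qr; a small illustration of ours:

A = randn(6,3);
[Qf,Rf] = qr(A);      % full QR: Qf is 6-by-6 orthogonal, Rf is 6-by-3 upper trapezoidal
[Qt,Rt] = qr(A,0);    % thin QR: Qt is 6-by-3, Rt is 3-by-3 upper triangular
norm(A - Qt*Rt)       % reconstruction error of order machine epsilon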
Algorithm 1: Gram-Schmidt
Input: n independent vectors a_1, a_2, ..., a_n ∈ R^m;
r_{11} = ∥a_1∥_2, v_1 = a_1, q_1 = a_1/r_{11};
for k = 2, 3, ..., n do
    r_{ik} = q_i^T a_k, i = 1, 2, ..., k−1;
    v_k = a_k − Σ_{i=1}^{k−1} q_i r_{ik};
    r_{kk} = ∥v_k∥_2 and q_k = v_k/r_{kk};
end
Output: q_1, q_2, ..., q_n.
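For concreteness, here is a minimal MATLAB sketch of Algorithm 1; the function name cgs and its interface are our own choices:

function [Q,R] = cgs(A)
% CGS  Classical Gram-Schmidt of Algorithm 1 (demo only).
% Input:  A    m-by-n with linearly independent columns.
% Output: Q,R  A = Q*R, Q (m-by-n) with orthonormal columns, R (n-by-n) upper triangular.
[m,n] = size(A);
Q = zeros(m,n); R = zeros(n,n);
R(1,1) = norm(A(:,1));
Q(:,1) = A(:,1)/R(1,1);
for k = 2:n
    R(1:k-1,k) = Q(:,1:k-1)'*A(:,k);     % r_ik = q_i'*a_k for i < k
    v = A(:,k) - Q(:,1:k-1)*R(1:k-1,k);  % subtract all projections from a_k
    R(k,k) = norm(v);
    Q(:,k) = v/R(k,k);
end
end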
Let us look at the k-th step in a more compact way: it precisely performs a right multiplication by an upper triangular matrix R_k,

\[
[q_1, q_2, \ldots, q_{k-1}, a_k, \ldots, a_n] \cdot
\begin{bmatrix}
1 & & \tilde{r}_{1k} & & \\
& \ddots & \vdots & & \\
& & \tilde{r}_{kk} & & \\
& & & \ddots & \\
& & & & 1
\end{bmatrix}
\;\Rightarrow\;
[q_1, q_2, \ldots, q_{k-1}, q_k, a_{k+1}, \ldots, a_n],
\]

where

\[
\tilde{r}_{ik} =
\begin{cases}
-r_{ik}\, r_{kk}^{-1} & i \neq k, \\
r_{kk}^{-1} & i = k.
\end{cases}
\]

Note that r_{kk} > 0 because of the independence assumption. Therefore, GS multiplies a series of upper triangular matrices to make the columns of A orthogonal:

\[
A R_1 R_2 \cdots R_n \Rightarrow Q, \qquad Q^T Q = I.
\]

Despite its simplicity, the Gram-Schmidt process is not a stable algorithm. Since we have no control over the intermediate results R_k, GS can quickly lose orthogonality among {q_k}. The modified Gram-Schmidt improves the numerical performance by reordering the projection and subtraction steps. It computes a sequence of intermediate vectors

\[
v_k^{(1)} = a_k, \qquad
v_k^{(i+1)} = v_k^{(i)} - q_i q_i^T v_k^{(i)}, \quad i < k, \qquad
v_k = v_k^{(k)}.
\]
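In code, the modification is small: each coefficient r_{ik} is computed against the current, partially orthogonalized vector v_k^{(i)} rather than the original a_k. A minimal MATLAB sketch, again with our own naming:

function [Q,R] = mgs(A)
% MGS  Modified Gram-Schmidt: project out q_i from the *updated* vector.
[m,n] = size(A);
Q = zeros(m,n); R = zeros(n,n);
for k = 1:n
    v = A(:,k);                  % v_k^(1) = a_k
    for i = 1:k-1
        R(i,k) = Q(:,i)'*v;      % coefficient uses the current v, not a_k
        v = v - Q(:,i)*R(i,k);   % v_k^(i+1) = v_k^(i) - q_i*q_i'*v_k^(i)
    end
    R(k,k) = norm(v);
    Q(:,k) = v/R(k,k);
end
end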
Alternatively, we can apply orthogonal transforms to convert A into upper triangular form:

\[
Q_n \cdot Q_{n-1} \cdots Q_1 A \Rightarrow R. \tag{5.5}
\]

We recursively define R_k = Q_k R_{k-1} with R_0 = A and R_n = R. Here Q_k is an orthogonal matrix that eliminates the entries below the diagonal of the k-th column while preserving the zeros in the preceding columns of R_{k-1}:

\[
Q_k
\begin{bmatrix}
r_{11} & \cdots & \cdots & \cdots & r_{1n} \\
& \ddots & & & \vdots \\
& & r_{kk}^{k-1} & \cdots & r_{kn}^{k-1} \\
& & \vdots & & \vdots \\
& & r_{mk}^{k-1} & \cdots & r_{mn}^{k-1}
\end{bmatrix}
=
\begin{bmatrix}
r_{11} & \cdots & \cdots & \cdots & r_{1n} \\
& \ddots & & & \vdots \\
& & r_{kk} & \cdots & r_{kn} \\
& & 0 & & \vdots \\
& & 0 & \cdots & r_{mn}^{k}
\end{bmatrix}.
\]

So what is great about orthogonal transforms? Recall that Q is orthogonal if Q^T Q = I = QQ^T. Moreover, we can show that

\[
\|QA\|_2 = \sup_x \frac{\|QAx\|_2}{\|x\|_2} = \sup_x \frac{\|Ax\|_2}{\|x\|_2} = \|A\|_2,
\]

implying that κ_2(A) = κ_2(QA). Therefore, in (5.5) we have κ_2(R) = κ_2(A), meaning that the orthogonal transform is a stable process.

Our goal is to find orthogonal transforms that turn the initial matrix into an upper triangular matrix. To this end, we introduce the Householder reflection: for any vector v, a Householder reflection is defined by

\[
P = I - \frac{2vv^T}{\|v\|_2^2}.
\]

One can immediately check that P is indeed an orthogonal matrix: PP^T = I. Given any x, y = Px is the reflection of x with respect to the hyperplane orthogonal to v. See Figure 5.1.

[Figure 5.1: Householder reflection from x to y across the hyperplane span{v}^⊥.]

For QR factorization, we find a particular vector v that converts all but the first entry of a vector x to zero while preserving the vector length:

\[
P_1 x = \begin{bmatrix} \pm\|x\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \pm\|x\|_2\, e_1.
\]

One can quickly check that v = x ± ∥x∥_2 e_1 works; for example, we can set v = sign(x_1)∥x∥_2 e_1 + x, which avoids cancellation.

We can show the existence of the Householder QR factorization by induction. We choose a Householder matrix P_1 such that

\[
P_1 A = \begin{bmatrix} \alpha & w^T \\ 0_{m-1} & B \end{bmatrix},
\]

where B is an (m−1) × (n−1) matrix. By the induction hypothesis, B = \hat{Q}\hat{R}; therefore, we have

\[
P_1 A = \begin{bmatrix} \alpha & w^T \\ 0_{m-1} & \hat{Q}\hat{R} \end{bmatrix}
= \begin{bmatrix} 1 & 0_{n-1}^T \\ 0_{m-1} & \hat{Q} \end{bmatrix}
\begin{bmatrix} \alpha & w^T \\ 0_{m-1} & \hat{R} \end{bmatrix}.
\]

In other words, we have

\[
A = P_1^T \begin{bmatrix} 1 & 0_{n-1}^T \\ 0_{m-1} & \hat{Q} \end{bmatrix}
\begin{bmatrix} \alpha & w^T \\ 0_{m-1} & \hat{R} \end{bmatrix}.
\]

It can be checked that Q = P_1^T \begin{bmatrix} 1 & 0_{n-1}^T \\ 0_{m-1} & \hat{Q} \end{bmatrix} is orthogonal and R = \begin{bmatrix} \alpha & w^T \\ 0_{m-1} & \hat{R} \end{bmatrix} is upper trapezoidal.

We give an example implementation of QR factorization in Code 1. We also note that if the Q factor is not needed, we can explicitly maintain {v_k} in the lower triangular part of A while keeping R in its upper triangular part.

Example code 1: QR

function [Q,R] = qrfact(A)
% QRFACT  QR factorization by Householder reflections.
% (demo only -- not efficient)
% Input:
%   A    m-by-n matrix
% Output:
%   Q,R  A = Q*R, Q m-by-m orthogonal, R m-by-n upper triangular
[m,n] = size(A);
Q = eye(m);
for k = 1:n
    z = A(k:m,k);
    v = [ -sign(z(1))*norm(z) - z(1); -z(2:end) ];
    nrmv = norm(v);
    if nrmv < eps, continue, end   % nothing is done in this iteration
    v = v / nrmv;                  % normalizing v removes v'*v in other formulas
    % Apply the reflection to each relevant column of A and Q
    for j = 1:n
        A(k:m,j) = A(k:m,j) - v*( 2*(v'*A(k:m,j)) );
    end
    for j = 1:m
        Q(k:m,j) = Q(k:m,j) - v*( 2*(v'*Q(k:m,j)) );
    end
end
Q = Q';
R = triu(A);   % enforce exact triangularity

Algorithm stability. Finally, we note that modified Gram-Schmidt¹ produces a \widetilde{Q} satisfying

\[
\widetilde{Q}^T \widetilde{Q} = I + E_M, \qquad \|E_M\|_2 \approx \varepsilon_M\, \kappa_2(A),
\]

while Householder QR produces a \widehat{Q} satisfying

\[
\widehat{Q}^T \widehat{Q} = I + E_H, \qquad \|E_H\|_2 \approx \varepsilon_M.
\]

5.2.1 QR for least squares

We immediately obtain a solution of Problem (5.1) via QR. Let A = QR.
Plugging A^T A = R^T Q^T Q R = R^T R into the normal equation (5.2), we get R^T R x = R^T Q^T b. Since A has full column rank, R is invertible, and it follows that

\[
Rx = Q^T b. \tag{5.6}
\]

This triangular system can be readily solved by backward substitution in O(n^2) flops. Informally, the error of (5.6) is of order O(κ(R) n ε_M), which can be significantly better than the O(κ(A^T A) n ε_M) of Cholesky on the normal equation. As a result, it is natural to use QR rather than Cholesky factorization for least squares problems.

We now state the backward error of least squares based on Householder QR. If the computed solution x̂ solves the perturbed problem

\[
\min_x \|(A + \delta A)x - (b + \delta b)\|_2,
\]

then we have

\[
\|\delta A\|_F \lesssim (6m - 3n + 41)\,\varepsilon_M \|A\|_F + O(\varepsilon_M^2), \qquad
\|\delta b\|_2 \lesssim (6m - 3n + 40)\,\varepsilon_M \|b\|_2 + O(\varepsilon_M^2).
\]

Here x ≲ y means x ≤ c·y for some constant c > 0.
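Returning to (5.6): a concrete least squares solve in MATLAB, using the built-in thin QR on a random test instance of our own, might look like this:

% Solve min ||Ax - b||_2 via the thin QR factorization and (5.6).
m = 100; n = 10;
A = randn(m,n); b = randn(m,1);   % random full-column-rank instance
[Q,R] = qr(A,0);                  % thin QR: Q is m-by-n, R is n-by-n
x = R \ (Q'*b);                   % Rx = Q'b, solved by backward substitution
norm(A'*(A*x - b))                % normal-equation residual, close to zero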
While QR is numerically stable, it does not scale well to certain high-dimensional and large-scale problems. Alternatives such as Cholesky (LDL) are more favorable for the large sparse problems arising from applications such as interior point methods. The main reasons are twofold.

1. Sparsity. QR preserves less sparsity and requires more memory for storage (both Q and R); the orthogonal basis quickly grows dense. Reordering the columns to improve sparsity is possible but more challenging. In contrast, Cholesky (LDL) factorization often only needs to store a smaller factor L, and with a suitable pivoting rule one can maintain significantly fewer nonzeros.

2. Efficiency. The best implementation of QR takes O(2n²m − (2/3)n³) flops, while Cholesky factorization takes O((1/3)n³ + n²m + n²). Empirical studies also confirm that Cholesky factorization is faster than QR, and in practice the performance gap can be even larger than the flop counts suggest.
¹Modified Gram-Schmidt improves upon classical Gram-Schmidt by reordering the projection and subtraction operations, but it is still problematic: as the estimate above shows, its loss of orthogonality grows with κ_2(A).