Scientific Computing
Lecturer: Qi Deng
Lecture 5: QR Decomposition
Spring 2020
Scribes: Qi Deng
Note: LaTeX template courtesy of UC Berkeley EECS dept.
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
They may be distributed outside this class only with the permission of the Instructor.
5.1 Least squares problems
Earlier we discussed the linear system Ax = b with a square matrix A. Now let A ∈ R^{m×n} with m > n; then we should not expect the solution to be unique or even to exist. It is then natural to seek a solution with minimal residual r = Ax − b. If we use the squared loss as the penalty function, we arrive at the least squares problem:
$$
\min_x \; f(x) = \frac{1}{2}\|Ax - b\|_2^2. \tag{5.1}
$$
Taking the derivative of f(x) and setting ∇f(x) = 0, we obtain the so-called normal equation:
$$
A^T A x = A^T b. \tag{5.2}
$$
Assuming A has full column rank, A^T A is an SPD matrix, and we can solve (5.2) in O(n^3) flops. The earlier analysis shows that the backward error is of order
$$
O(\kappa(A^T A)) = O\big(\kappa(A)^2\big).
$$
The square dependence on the condition number is unsatisfactory, so we present a more stable algorithm based on the so-called QR factorization.
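For concreteness, here is a minimal MATLAB sketch of the normal-equation approach via Cholesky factorization; the function name lsq_normal is made up for this example and assumes A has full column rank.

function x = lsq_normal(A, b)
% LSQ_NORMAL  Solve min ||Ax - b||_2 via the normal equations (5.2).
% Assumes A (m-by-n, m >= n) has full column rank, so A'*A is SPD.
G = A'*A;                 % n-by-n Gram matrix
c = A'*b;
L = chol(G, 'lower');     % Cholesky factorization, O(n^3) work
x = L' \ (L \ c);         % forward then backward substitution
end

Since cond(A'*A) = cond(A)^2, this route inherits the squared condition-number dependence discussed above.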
5.2 QR decomposition
Theorem 1 (QR). Let A ∈ R^{m×n} be a matrix (m ≥ n). Then there exist a matrix Q ∈ R^{m×m} with orthonormal columns and an upper trapezoidal matrix R ∈ R^{m×n} such that
$$
A = [q_1, q_2, \ldots, q_m] \cdot
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & r_{nn} \\
0 & \cdots & \cdots & 0 \\
\vdots & & & \vdots \\
0 & \cdots & \cdots & 0
\end{bmatrix}. \tag{5.3}
$$
The proof of this theorem follows readily from the so-called Gram-Schmidt orthogonalization described in Algorithm 1. Since the multipliers of q_{n+1}, q_{n+2}, …, q_m are always zero, (5.3) reduces to the thin QR decomposition
$$
A = [q_1, q_2, \ldots, q_n] \cdot
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & r_{nn}
\end{bmatrix}
= QR. \tag{5.4}
$$
Throughout the rest of this section, we will always use A = QR to denote the thin QR factorization for simplicity.
Algorithm 1: Gram-Schmidt
Input: n independent vectors a1, a2, . . . , an ∈ Rm;
r11 = ∥a1∥2, v1 = a1, q1 = a1/r11;
for k = 2, 3, …, n do
    rik = qi^T ak, i = 1, 2, …, k−1;
    vk = ak − Σ_{i=1}^{k−1} qi rik;
    rkk = ∥vk∥2 and qk = vk/rkk;
end
Output: q1, q2, …, qn.
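For reference, a direct MATLAB transcription of Algorithm 1 might look as follows (a minimal sketch; the function name cgs is made up for this example and assumes the columns of A are independent):

function [Q,R] = cgs(A)
% CGS  Classical Gram-Schmidt (Algorithm 1) on the columns of A.
[m,n] = size(A);
Q = zeros(m,n);  R = zeros(n,n);
R(1,1) = norm(A(:,1));  Q(:,1) = A(:,1)/R(1,1);
for k = 2:n
    R(1:k-1,k) = Q(:,1:k-1)' * A(:,k);       % r_ik = q_i' * a_k
    v = A(:,k) - Q(:,1:k-1) * R(1:k-1,k);    % subtract the projections
    R(k,k) = norm(v);  Q(:,k) = v / R(k,k);
end
end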
Let us look at the k-th step in a more compact way: it precisely performs a right multiplication by an upper triangular matrix R_k,
$$
[q_1, q_2, \ldots, q_{k-1}, a_k, a_{k+1}, \ldots, a_n] \cdot R_k \;\Rightarrow\; [q_1, q_2, \ldots, q_{k-1}, q_k, a_{k+1}, \ldots, a_n],
$$
where R_k agrees with the identity except in its k-th column, whose entries are
$$
(R_k)_{ik} =
\begin{cases}
-r_{ik}\, r_{kk}^{-1}, & i \neq k, \\
r_{kk}^{-1}, & i = k.
\end{cases}
$$
Note that r_{kk} > 0 because of the independence assumption. Therefore, GS multiplies a series of upper triangular matrices to make the columns of A orthogonal:
$$
A R_1 R_2 \cdots R_n \Rightarrow Q, \qquad Q^T Q = I.
$$
Despite its simplicity, the Gram-Schmidt process is not a stable algorithm. Since we have no control over the intermediate results R_k, GS can quickly lose orthogonality among {q_k}. The modified Gram-Schmidt improves the numerical performance by reordering the projection and subtraction. It computes a sequence of intermediate vectors
$$
v_k^{(1)} = a_k, \qquad
v_k^{(i+1)} = v_k^{(i)} - q_i q_i^T v_k^{(i)}, \quad i < k, \qquad
v_k = v_k^{(k)}.
$$
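The reordering is easiest to see in code. Below is a minimal MATLAB sketch of modified Gram-Schmidt in its usual column-oriented form (the function name mgs is made up for this example); it performs the same subtractions as the recursion above, one projection at a time:

function [Q,R] = mgs(A)
% MGS  Modified Gram-Schmidt on the columns of A.
[m,n] = size(A);
Q = A;  R = zeros(n,n);
for k = 1:n
    R(k,k) = norm(Q(:,k));
    Q(:,k) = Q(:,k) / R(k,k);
    for j = k+1:n
        R(k,j) = Q(:,k)' * Q(:,j);           % project against the updated column
        Q(:,j) = Q(:,j) - Q(:,k) * R(k,j);   % subtract immediately
    end
end
end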
Alternatively, we can apply orthogonal transforms to convert A into upper triangular form:
$$
Q_n \cdot Q_{n-1} \cdots Q_1 A \;\Rightarrow\; R. \tag{5.5}
$$
We recursively define R_k = Q_k R_{k−1} with R_0 = A and R_n = R. Here Q_k is an orthogonal matrix that eliminates the entries below the diagonal of the k-th column while preserving the zeros in the preceding columns of R_{k−1}:
$$
Q_k
\begin{bmatrix}
r_{11}^{k-1} & \cdots & \cdots & \cdots & r_{1n}^{k-1} \\
 & \ddots & & & \vdots \\
 & & r_{kk}^{k-1} & \cdots & r_{kn}^{k-1} \\
 & & \vdots & & \vdots \\
 & & r_{mk}^{k-1} & \cdots & r_{mn}^{k-1}
\end{bmatrix}
=
\begin{bmatrix}
r_{11}^{k} & \cdots & \cdots & \cdots & r_{1n}^{k} \\
 & \ddots & & & \vdots \\
 & & r_{kk}^{k} & \cdots & r_{kn}^{k} \\
 & & 0 & & \vdots \\
 & & \vdots & \ddots & \vdots \\
 & & 0 & \cdots & r_{mn}^{k}
\end{bmatrix}
$$
So what is great about orthogonal transforms? Recall that Q is orthogonal if Q^T Q = I = Q Q^T. Moreover, we can show that
$$
\|QA\|_2 = \sup_x \frac{\|QAx\|_2}{\|x\|_2} = \sup_x \frac{\|Ax\|_2}{\|x\|_2} = \|A\|_2,
$$
implying that κ2(A) = κ2(QA). Therefore, in (5.5) we have κ2(R) = κ2(A), meaning that the orthogonal transform is a stable process.
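A quick numerical check of this invariance (a throwaway sketch; the random orthogonal factor comes from MATLAB's built-in qr):

m = 50;  n = 10;
A = randn(m,n);
[Q,~] = qr(randn(m));            % a random m-by-m orthogonal matrix
disp(norm(Q*A) - norm(A))        % should be near machine precision
disp(cond(Q*A) - cond(A))        % should be near machine precision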
Our goal is to find orthogonal transforms that turn the initial matrix into an upper triangular matrix. To this end, we introduce the Householder reflection: for any vector v, a Householder reflection is defined by
$$
P = I - \frac{2 v v^T}{\|v\|_2^2}.
$$
One can immediately check that P is indeed an orthogonal matrix: PPT = I. Given any x, y = Px is the reflection of x with respect to the hyperplane orthogonal to v. See Figure 5.1.
Figure 5.1: Householder reflection from x to y across the hyperplane span{v}⊥.
For QR factorization, we find a particular v vector to convert all but the first entry of a vector x to zero while preserving the vector length:
$$
P_1 x = \begin{bmatrix} \pm\|x\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \pm\|x\|_2 e_1.
$$
One can quickly check that v = x ± ∥x∥2 e1 works. For example, we can set v = sign(x1)∥x∥2 e1 + x; this choice of sign avoids cancellation when forming v.
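This choice can be checked directly in MATLAB (a minimal sketch):

x = randn(6,1);
v = sign(x(1))*norm(x)*eye(6,1) + x;    % v = sign(x1)*||x||_2*e1 + x
P = eye(6) - 2*(v*v')/(v'*v);           % Householder reflection
disp(P*x)                               % equals -sign(x1)*||x||_2*e1 up to rounding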
We can show the existence of the Householder QR factorization by induction. We choose a Householder matrix P_1 such that
$$
P_1 A = \begin{bmatrix} \alpha & w^T \\ 0_{m-1} & B \end{bmatrix},
$$
where B is an (m−1) × (n−1) matrix. By the induction hypothesis, B = \hat{Q}\hat{R}; therefore, we have
$$
P_1 A = \begin{bmatrix} \alpha & w^T \\ 0_{m-1} & \hat{Q}\hat{R} \end{bmatrix}
      = \begin{bmatrix} 1 & 0_{m-1}^T \\ 0_{m-1} & \hat{Q} \end{bmatrix}
        \begin{bmatrix} \alpha & w^T \\ 0_{m-1} & \hat{R} \end{bmatrix}.
$$
In other words, we have
$$
A = P_1 \begin{bmatrix} 1 & 0_{m-1}^T \\ 0_{m-1} & \hat{Q} \end{bmatrix}
        \begin{bmatrix} \alpha & w^T \\ 0_{m-1} & \hat{R} \end{bmatrix}.
$$
It can be checked that $Q = P_1 \begin{bmatrix} 1 & 0_{m-1}^T \\ 0_{m-1} & \hat{Q} \end{bmatrix}$ is orthogonal and $R = \begin{bmatrix} \alpha & w^T \\ 0_{m-1} & \hat{R} \end{bmatrix}$ is upper trapezoidal.

We give an example implementation of QR factorization in Code 1. We also note that if the Q factor is not needed, we can explicitly maintain {v_k} in the lower triangular part while keeping R in the upper triangular part of A.

Example code 1: QR
function [Q,R] = qrfact(A)
% QRFACT   QR factorization by Householder reflections.
%          (demo only -- not efficient)
%
% Input:
%   A     m-by-n matrix
% Output:
%   Q,R   A = QR, Q m-by-m orthogonal, R m-by-n upper triangular

[m,n] = size(A);
Q = eye(m);
for k = 1:n
    z = A(k:m,k);
    v = [ -sign(z(1))*norm(z) - z(1); -z(2:end) ];
    nrmv = norm(v);
    if nrmv < eps, continue, end    % nothing is done in this iteration
    v = v / nrmv;                   % removes v'*v in other formulas
    % Apply the reflection to each relevant column of A and Q
    for j = 1:n
        A(k:m,j) = A(k:m,j) - v*( 2*(v'*A(k:m,j)) );
    end
    for j = 1:m
        Q(k:m,j) = Q(k:m,j) - v*( 2*(v'*Q(k:m,j)) );
    end
end
Q = Q';
R = triu(A);    % enforce exact triangularity
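A quick sanity check of Code 1 (a usage sketch, assuming qrfact is on the MATLAB path):

A = randn(8,5);
[Q,R] = qrfact(A);
disp(norm(Q*R - A))          % factorization residual, near machine precision
disp(norm(Q'*Q - eye(8)))    % loss of orthogonality in Q, near machine precision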
Algorithm stability  Finally, we note that modified Gram-Schmidt¹ produces a Q satisfying
$$
Q^T Q = I + E_M, \qquad \|E_M\|_2 \approx \varepsilon_M \kappa_2(A),
$$
while Householder QR produces a Q satisfying
$$
Q^T Q = I + E_H, \qquad \|E_H\|_2 \approx \varepsilon_M.
$$
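These bounds can be observed numerically. The following sketch builds an ill-conditioned matrix and compares the loss of orthogonality of the mgs sketch above with MATLAB's built-in Householder-based qr (sizes and conditioning are arbitrary choices for illustration):

[U,~] = qr(randn(50,10), 0);  [V,~] = qr(randn(10));
A = U * diag(logspace(0,-10,10)) * V';   % 50-by-10 matrix with cond(A) ~ 1e10
[Qm,~] = mgs(A);                         % modified Gram-Schmidt (sketch above)
[Qh,~] = qr(A, 0);                       % thin Householder QR (built-in)
disp(norm(Qm'*Qm - eye(10)))             % roughly eps * cond(A)
disp(norm(Qh'*Qh - eye(10)))             % roughly eps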
5.2.1 QR for least squares
We immediately obtain a solution of Problem (5.1) via QR. Let A = QR be the thin QR factorization. Plugging A^T A = R^T Q^T Q R = R^T R into the normal equation (5.2), we have R^T R x = R^T Q^T b. Since A has full column rank, R is invertible, and it follows that
$$
R x = Q^T b. \tag{5.6}
$$
The above triangular system can be readily solved by backward substitution in O(n^2) flops. Informally, the error of (5.6) is of order O(κ(R) n ε_M), which can be significantly better than the O(κ(A^T A) n ε_M) of Cholesky. As a result, it is natural to use QR rather than Cholesky factorization for least squares problems.
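In MATLAB, the thin-QR solve (5.6) takes only a few lines (a minimal sketch using the built-in economy-size qr):

A = randn(100, 5);  b = randn(100, 1);   % an arbitrary overdetermined system
[Q,R] = qr(A, 0);                        % thin QR: Q is m-by-n, R is n-by-n
x = R \ (Q' * b);                        % back substitution for R*x = Q'*b
disp(norm(x - A\b))                      % agrees with MATLAB's backslash solution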
We now state the backward error of least squares problems based on Householder QR. If the computed solution x solves
$$
\min_x \|(A + \delta A)x - (b + \delta b)\|_2,
$$
we have
$$
\|\delta A\|_F \lesssim (6m - 3n + 41)\,\varepsilon_M \|A\|_F + O(\varepsilon_M^2), \qquad
\|\delta b\|_2 \lesssim (6m - 3n + 40)\,\varepsilon_M \|b\|_2 + O(\varepsilon_M^2).
$$
Here x ≲ y means x ≤ c · y for some constant c > 0.
While QR is numerically stable, it does not scale to certain high-dimensional and large-scale problems. Alternatives such as Cholesky (LDL) factorization are more favorable for large sparse problems arising from applications such as interior point methods. The main reasons are twofold.
1. Sparsity. QR preserves less sparsity and requires more memory for storage (both Q and R): the orthogonal basis vectors quickly become dense. Reordering the columns to improve sparsity is possible but more challenging. In contrast, Cholesky (LDL) factorization often only needs to store a smaller factor L, and by using a suitable pivoting rule one can maintain significantly fewer nonzeros.
2. Efficiency. The best implementation of QR takes O(2n^2 m − (2/3) n^3) flops, while Cholesky factorization takes O((1/3) n^3 + n^2 m + n^2). Empirical studies also confirm that Cholesky factorization is faster than QR, and in practice the performance gap is even larger.
¹Modified Gram-Schmidt improves upon classical Gram-Schmidt by reordering the projection and subtraction operations, but it is still problematic.