MATH50003 Numerical Analysis: Problem Sheet 4¶
This problem sheet explores least squares, the QR decomposition including for
tridiagonal matrices,
Copyright By PowCoder代写 加微信 powcoder
and the PLU decompositions.
Questions marked with a ⋆ are meant to be completed without using a computer.
Problems are denoted A/B/C to indicate their difficulty.
using LinearAlgebra, Plots, Test
1. Least squares and QR decompositions¶
Problem 1.1 (B) Find and plot the best least squares fit of ${1 \over 5x^2 + 1}$ by degree $n$
polynomials for $n = 0,\ldots,10$ at 1000 evenly spaced points between $0$ and $1$.
x = range(0, 1; length=1000)
pl = plot()
f = 1 ./ (5x.^2 .+ 1)
for n = 0:10
# Creates 1000 x n+1 matrix with entries x[k]^(j-1)
A = x .^ (0:n)’
plot!(x, A*c)
# Here is the same approach as above but this time explicitly using QR as discussed in the lecture notes:
x = range(0, 1; length=1000)
pl = plot()
f = 1 ./ (5x.^2 .+ 1)
for n = 0:10
# Creates 1000 x n+1 matrix with entries x[k]^(j-1)
A = x .^ (0:n)’
(Q, R) = qr(A)
c = R \ Q[:, 1:n+1]’ * f
plot!(x, A * c)
Problem 1.2⋆ (B) Show that every matrix has a QR decomposition such that the diagonal of $R$ is non-negative.
Make sure to include the case of more columns than rows.
Beginning with the square case, a square matrix $A=QR$ with square orthogonal $Q$ and square upper triangular $R$ can always be rewritten in the form $A=QD^{-1}DR$ where $D$ is a diagonal matrix with $sign(R[j,j])$ on the diagonal. As a result, $DR$ is an upper triangular matrix with positive diagonal. It remains to check that $QD^{-1}=-QD$ is still orthogonal – this is easy to check since $$-QD(-QD)^T = QDDQ^T = QQ^T = I.$$
Note we have made use of the fact that the inverse of a diagonal matrix is diagonal, that any diagonal matrix satisfies $D^{T} = D$ and that $DD=I$ since $\text{sign}(R[j,j])^2 = 1$.
The same argument works for the non-square cases as long as we take care to consider the appropriate dimensions and pad with the identity matrix. Note that $Q$ is always a square matrix in the $QR$ decomposition. Assume the $R$ factor has more columns $m$ than rows $n$. Then a square $n \times n$ diagonal matrix with its $n$ diagonal entries being $\text{sign}(R[j,j]), j=1..n$ works with the same argument as above.
Finally, assume that $R$ has less colums $m$ than rows $n$. In this case the square $n \times n$ diagonal matrix with its first $m$ diagonal entries being $\text{sign}(R[j,j]), j=1..m$ and any remaining diagonal entries being $1$ works with the same argument as above.
Since the matrix $D$ is always square diagonal and orthogonal by construction, everything else for both of these cases is exactly as in the square case.
Here are some practical examples in Julia:
E = [-1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1]
R = [-1 3; 0 2; 0 0; 0 0]
4×2 Matrix{Int64}:
4×2 Matrix{Int64}:
E = [-1 0 0 0; 0 1 0 0; 0 0 -1 0; 0 0 0 1]
R = [-1 3 7 8 2 1; 0 2 9 2 1 9; 0 0 -7 2 1 9; 0 0 0 7 5 2]
4×6 Matrix{Int64}:
-1 3 7 8 2 1
0 2 9 2 1 9
0 0 -7 2 1 9
0 0 0 7 5 2
4×6 Matrix{Int64}:
1 -3 -7 -8 -2 -1
0 2 9 2 1 9
0 0 7 -2 -1 -9
0 0 0 7 5 2
Problem 1.3⋆ (B) Show that the QR decomposition of a square invertible matrix is unique,
provided that the diagonal of $R$ is positive.
Assume there is a second decomposition also with positive diagonal
A = QR = \tilde{Q} \tilde{R}
Then we know
Q^⊤ \tilde{Q} = R \tilde{R}^{-1}
Note $Q^⊤ \tilde{Q}$ is orthogonal, and $R \tilde{R}^{-1}$ has positive eigenvalues (the diagonal), hence all $m$ eigenvalues of
$Q^⊤ \tilde{Q}$ are 1. This means that $Q^⊤ \tilde{Q} = I$ and hence $\tilde{Q} = Q$, which then implies $\tilde{R} = R$.∎
2. Gram–Schmidt¶
Problem 2.1⋆ (B) The modified Gram–Schmidt algorithm is a slight variation of Gram–Schmidt where
instead of computing
𝐯_j := 𝐚_j – \sum_{k=1}^{j-1} \underbrace{𝐪_k^\top 𝐚_j}_{r_{kj}} 𝐪_k
we compute it step-by-step:
\begin{align*}
𝐯_j^1 &:= 𝐚_j \\
𝐯_j^{k+1} &:= 𝐯_j^k – 𝐪_k^\top 𝐯_j^k 𝐪_k
\end{align*}
Show that $𝐯_j^j = 𝐯_j$.
Recall that the column vectors of $Q$ are orthonormal i.e. $(𝐪_i,𝐪_j) = \delta_{ij}$.
Next observe that substituting from $𝐯^1_j$ for $𝐚_j$, for each $𝐯^i_j$ we get $i$ terms (note that the second term in the definition of $𝐯_j^{k+1}$ contributes only 1 term by orthonormality of $Q$, retrieving the contribution from $𝐯^{i-1}_j$ plus one.)
𝐯^i_j := 𝐚_j – \sum_{k=1}^{i-2}{q_k^\top 𝐚_j} q_k – {q_{i-1}^\top 𝐚_j} q_{i-1}
Problem 2.2 (B) Complete the following
function implementing the modified Gram–Schmidt algorithm:
function modifiedgramschmidt(A)
m,n = size(A)
m ≥ n || error(“Not supported”)
R = zeros(n,n)
Q = zeros(m,n)
for j = 1:n
# TODO: Implement the Modified Gram–Schmidt algorthm
A = randn(300,300)
Q,R = modifiedgramschmidt(A)
@test A ≈ Q*R
@test Q’Q ≈ I
Test Failed at In[8]:14
Expression: A ≈ Q * R
Evaluated: [0.9663001358389949 0.8995409523019485 … 0.4147247863259331 0.5123271589177961; 0.7144351307484198 -0.4563722141697532 … -1.363396263390614 -0.4263972031980699; … ; -0.3424113633229226 0.8788492928646233 … 0.5581442278118858 -0.03410640352188411; -0.49240308774040137 -1.4553342032200842 … 0.736433402132759 1.2930945571012489] ≈ [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
There was an error during testing
Stacktrace:
[1] record(ts::Test.FallbackTestSet, t::Union{Test.Error, Test.Fail})
@ Test /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Test/src/Test.jl:903
[2] do_test(result::Test.ExecutionResult, orig_expr::Any)
@ Test /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Test/src/Test.jl:637
[3] top-level scope
@ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Test/src/Test.jl:445
@ ./boot.jl:373 [inlined]
[5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1196
function modifiedgramschmidt(A)
m,n = size(A)
m ≥ n || error(“Not supported”)
R = zeros(n,n)
Q = zeros(m,n)
# looping through the columns
for j = 1:n
v = A[:,j]
# now go through each row
for k = 1:j-1
# take inner product with q_k and v
R[k, j] = Q[:, k]’ * v
v = v – R[k, j] * Q[:, k]
R[j,j] = norm(v)
Q[:,j] = v/R[j,j]
A = randn(300,300)
Q,R = modifiedgramschmidt(A)
@test A ≈ Q*R
@test Q’Q ≈ I
Test Passed
Expression: Q’ * Q ≈ I
Evaluated: [1.0000000000000004 2.7755575615628914e-17 … -1.2011225347663412e-14 -1.8587562045091488e-15; 2.7755575615628914e-17 0.9999999999999998 … -2.2176704916887502e-14 -5.738465258531278e-15; … ; -1.2011225347663412e-14 -2.2176704916887502e-14 … 1.0000000000000004 -5.551115123125783e-17; -1.8587562045091488e-15 -5.738465258531278e-15 … -5.551115123125783e-17 1.0000000000000004] ≈ UniformScaling{Bool}(true)
Problem 2.3 (B) Compare the orthogonality of Q between gramschmidt and modifiedgramschmidt
when applied to a 300 × 300 random matrix.
For reference, here is the gramschmidt algorithm from the lecture notes:
function gramschmidt(A)
m,n = size(A)
m ≥ n || error(“Not supported”)
R = zeros(n,n)
Q = zeros(m,n)
for j = 1:n
for k = 1:j-1
R[k,j] = Q[:,k]’*A[:,j]
v = A[:,j] – Q[:,1:j-1]*R[1:j-1,j]
R[j,j] = norm(v)
Q[:,j] = v/R[j,j]
# Now compare MGS with GS:
A = randn(300,300)
Q,R = gramschmidt(A)
Qm,Rm = modifiedgramschmidt(A)
maximum(abs.(Q*Q’)-I)
5.083780618697631e-14
Modified Gram–Schmidt is more accurate for constructing an orthogonal matrix:
maximum(abs.(Qm*Qm’)-I)
4.218847493575595e-15
3. Householder reflections¶
Problem 3.1 (B)
Complete the definition of Reflections which supports a sequence of reflections,
Q = Q_{𝐯_1} \cdots Q_{𝐯_n}
where the vectors are stored as a matrix V whose $j$-th column is $𝐯_j$, and
Q_{𝐯_j} = I – 2 𝐯_j 𝐯_j^\top.
struct Reflections{T} <: AbstractMatrix{T} V::Matrix{T} import Base: *, size, getindex size(Q::Reflections) = (size(Q.V,1), size(Q.V,1)) function *(Q::Reflections, x::AbstractVector) T = eltype(Q) r = Vector{T}(x) # convert x to a vector of type T # TODO: Apply Q in O(mn) operations by applying # the reflection corresponding to each column of Q.V to r ## SOLUTION m,n = size(Q.V) for j = n:-1:1 v = Q.V[:, j] r = r - 2 * v * dot(v, r) function getindex(Q::Reflections, k::Int, j::Int) # TODO: Return Q[k,j] in O(mn) operations (hint: use *) ## SOLUTION T = eltype(Q.V) m,n = size(Q) ej = zeros(T, m) ej[j] = one(T) return (Q*ej)[k] Y = randn(5,3) V = Y * Diagonal([1/norm(Y[:,j]) for j=1:3]) Q = Reflections(V) @test Q ≈ (I - 2V[:,1]*V[:,1]')*(I - 2V[:,2]*V[:,2]')*(I - 2V[:,3]*V[:,3]') @test Q'Q ≈ I Test Passed Expression: Q' * Q ≈ I Evaluated: [1.0000000000000009 3.5388358909926865e-16 … 1.1102230246251565e-16 -2.220446049250313e-16; 3.5388358909926865e-16 0.9999999999999996 … 2.0296264668928643e-16 1.1102230246251565e-16; … ; 1.1102230246251565e-16 2.0296264668928643e-16 … 1.0 -9.71445146547012e-17; -2.220446049250313e-16 1.1102230246251565e-16 … -9.71445146547012e-17 1.0] ≈ UniformScaling{Bool}(true) Problem 3.2 (B) Complete the following function that implements Householder QR using only $O(mn^2)$ operations. function householderqr(A) m,n = size(A) error("Only support more rows than columns") # R begins as A, modify it in place R = copy(A) Q = Reflections(Matrix(1.0I, m, n)) for j = 1:n # TODO: populate Q and R using O(m*(n-j)) operations ## SOLUTION y = copy(R[j:end, j]) y[1] += ((y[1] ≥ 0) ? 1 : -1) * norm(y) w = y / norm(y) Q.V[j:end, j] = w R[j:end, :] = R[j:end, :] - 2 * w * (w' * R[j:end, :]) A = randn(4,6) Q,R = householderqr(A) @test Q*R ≈ A @test Q'Q ≈ I Only support more rows than columns Stacktrace: [1] error(s::String) @ Base ./error.jl:33 [2] householderqr(A::Matrix{Float64}) @ Main ./In[13]:4 [3] top-level scope @ In[13]:23 @ ./boot.jl:373 [inlined] [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String) @ Base ./loading.jl:1196 Here we sketch some additional notes comparing performance. # Here's a performance comparison with what we saw in the lecture notes: function householderreflection(x) y = copy(x) # we cannot use sign(x[1]) in case x[1] == 0 if x[1] ≥ 0 y[1] += norm(x) y[1] -= norm(x) w = y / norm(y) I - 2 * w * w' function lecturehouseholderqr(A) m,n = size(A) R = copy(A) Q = Matrix(1.0I, m, m) for j = 1:n Qⱼ = householderreflection(R[j:end,j]) R[j:end,:] = Qⱼ*R[j:end,:] Q[:,j:end] = Q[:,j:end]*Qⱼ using BenchmarkTools for j = 1:5 A = randn(j*100,j*100) @btime lecturehouseholderqr(A); 12.192 ms (1335 allocations: 20.91 MiB) 124.802 ms (2875 allocations: 164.88 MiB) 416.176 ms (4391 allocations: 554.01 MiB) 964.258 ms (5895 allocations: 1.28 GiB) 1.938 s (7399 allocations: 2.50 GiB) for j = 1:5 A = randn(j*100,j*100) @btime householderqr(A); 6.684 ms (1125 allocations: 15.82 MiB) 68.678 ms (2365 allocations: 124.19 MiB) 217.071 ms (3581 allocations: 416.67 MiB) 485.972 ms (4785 allocations: 984.77 MiB) 915.203 ms (5989 allocations: 1.88 GiB) 4. Banded QR with Given's rotations¶ Problem 4.1⋆ (A) Describe an algorithm for computing the QR decomposition of a tridiagonal matrix using rotations instead of reflections to upper-triangularise column-by-column. Let $A$ be a tridiagonal matrix: A = \begin{bmatrix} a_{11} & a_{12} & 0 & \cdots &0\\ a_{21} & a_{22} & a_{23} & \ddots &\vdots\\ 0 & a_{32} & a_{33} & \ddots &0\\ & 0 & \ddots & \ddots &a_{n-1,n}\\ & & 0 & a_{n,n-1} & a_{nn} \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 &\cdots & \mathbf{a}_n \end{bmatrix}, $$where each $\mathbf{a}_j \in \mathbb{R}^n$ and $[\mathbf{a}_j]_k = 0$ for $|j-k| > 1$.
Recall that,
\frac{1}{\sqrt{a^2 + b^2}} \begin{bmatrix}
\end{bmatrix} \begin{bmatrix}
\end{bmatrix} = \begin{bmatrix}
\sqrt{a^2 + b^2} \\ 0
\end{bmatrix},
\frac{1}{\sqrt{a^2 + b^2}} \begin{bmatrix}
\end{bmatrix} = \begin{bmatrix}
\cos \theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{bmatrix},
is a rotation matrix where $\theta = -\arctan(b/a)$. With this in mind, consider multiplying $A$ from the left by,
Q_1 = \begin{bmatrix}
\frac{a_{11}}{r_{11}} & \frac{a_{21}}{r_{11}} \\
-\frac{a_{21}}{r_{11}} & \frac{a_{11}}{r_{11}} \\
&&&\ddots\\
\end{bmatrix},
where $r_{11} = \sqrt{a_{11}^2 + a_{21}^2}$. This rotates dimensions 1 and 2 through angle $\theta=-\arctan(a_{21}/a_{11})$. We have,
Q_1 \mathbf{a}_1 = \begin{bmatrix}
r_{11} \\0 \\ \vdots \\ 0
\end{bmatrix},\hspace{10mm}
Q_1 \mathbf{a}_2 = \begin{bmatrix}
r_{12} := \frac{1}{r_{11}}(a_{11}a_{12} + a_{21}a_{22}) \\
t_1 := \frac{1}{r_{11}}(a_{11}a_{22} – a_{21}a_{12}) \\
\vdots \\ 0
\end{bmatrix},\hspace{10mm}
Q_1 \mathbf{a}_3 = \begin{bmatrix}
r_{13} := \frac{1}{r_{11}}a_{21}a_{23} \\
s_1 := \frac{1}{r_{11}}a_{11}a_{23} \\
\vdots \\ 0
\end{bmatrix},\hspace{10mm}
Q_1 \mathbf{a}_k = \mathbf{a}_k \textrm{ for } k > 3.
Then we take,
Q_2 = \begin{bmatrix}
& \frac{t_1}{r_{22}} & \frac{a_{32}}{r_{22}} \\
& -\frac{a_{32}}{r_{22}} & \frac{t_1}{r_{22}} \\
&&&&\ddots\\
\end{bmatrix},
where $r_{22} = \sqrt{t_1^2 + a_{32}^2}$, a matrix which rotates dimensions 2 and 3 through angle $\theta_2 = -\arctan(a_{32}/t_1)$. Then,
Q_2 Q_1 \mathbf{a}_{1} = Q_1\mathbf{a}_1 = \begin{bmatrix}
r_{11} \\0 \\ \vdots \\ 0
\end{bmatrix},\hspace{10mm}
Q_2 Q_1 \mathbf{a}_{2} = Q_2\begin{bmatrix}
\vdots \\ 0
\end{bmatrix} = \begin{bmatrix}
\end{bmatrix},\hspace{10mm}
Q_2Q_1\mathbf{a}_3 = Q_2 \begin{bmatrix}
\vdots \\ 0
\end{bmatrix}
=\begin{bmatrix}
r_{23} := \frac{1}{r_{22}}(t_1s_1 + a_{32}a_{33}) \\
t_2 := \frac{1}{r_{22}}(t_1a_{33} – a_{32}s_1) \\
\vdots \\ 0
\end{bmatrix}
Q_2Q_1\mathbf{a}_4 = Q_2\begin{bmatrix}
\end{bmatrix} = \begin{bmatrix}
r_{24} := \frac{1}{r_{22}}a_{32}a_{34} \\
s_2 := \frac{1}{r_{22}} t_1 a_{34} \\
\end{bmatrix}, \hspace{10mm}
Q_2Q_1 \mathbf{a}_k = \mathbf{a}_k \textrm{ for } k > 4.
$$Now, for $j=3\rightarrow n-1$ we take,
Q_j := \begin{bmatrix}
\mathbf{I}_{j-1} \\
& \frac{t_{j-1}}{r_{jj}} & \frac{a_{j+1, j}}{r_{jj}} \\
& \frac{a_{j+1, j}}{r_{jj}} & \frac{t_{j-1}}{r_{jj}} \\
&&& \mathbf{I}_{n-j-1}
\end{bmatrix},
where $r_{jj} := \sqrt{t_{j-1}^2 + a_{j+1, j}^2}$.
This gives,
Q_j \dots Q_1 \mathbf{a}_k = \begin{bmatrix}
\mathbf{0}_{k-3} \\
r_{k-2,k} \\
r_{k-1, k} \\
r_{k,k} \\
\mathbf{0}_{n-k}
\end{bmatrix},
for $k \leq j$,
Q_j \dots Q_1 \mathbf{a}_{j+1} = \begin{bmatrix}
\mathbf{0} \\
r_{j-1,j+1} \\
r_{j,j+1} := \frac{1}{r_{jj}}(t_{j-1}s_{j-1} + a_{j+1,j}a_{j+1,j+1}) \\
t_j := \frac{1}{r_{jj}}(t_{j-1}a_{j+1,j+1} – s_{j-1}a_{j+1,j})\\
a_{j+2,j+1} \\
\vdots \\ 0
\end{bmatrix},\hspace{10mm}
Q_j \dots Q_1 \mathbf{a}_{j+2} = \begin{bmatrix}
\mathbf{0} \\
r_{j,j+2} := \frac{1}{r_{jj}}a_{j+1,j}a_{j+1,j+2} \\
s_j := \frac{1}{r_{jj}}t_{j-1} a_{j+1,j+2} \\
a_{j+2,j+2} \\
\end{bmatrix}.
Finally we define, $r_{nn} = \frac{1}{r_{n-1,n-1}}(t_{n-2}a_{n,n} – a_{n, n-1}s_{n-2})$, to obtain,
Q_{n-1}\dots Q_1 A = \begin{bmatrix}
r_{11} & r_{12} & r_{13} & 0 & \dots & 0\\
0 & r_{22} & r_{23} & r_{24} & & 0\\
& \ddots & \ddots & \ddots & \ddots & \vdots\\
& & 0 & r_{n-2,n-2} & r_{n-2, n-1} & r_{n-2,n}\\
& & & 0 & r_{n-1,n-1} & r_{n-1,n}\\
& & & & 0 & r_{n,n}\\
\end{bmatrix} =: R
$$so that $A = QR$, for $Q = Q_1^{-1}\dots Q_{n-1}^{-1}$, where each matrix $Q_j$ rotates the coordinates $(j,j+1)$ through the angle $\theta_j = -\arctan(a_{j+1, j}/t_{j-1})$, and thus each matrix $Q_j^{-1}$ rotates the coordinates $(j, j+1)$ through the angle $\arctan(a_{j+1, j}/t_{j-1})$. This will be important for Problem 4.3.
Problem 4.2 (B) Implement Rotations which represents an orthogonal matrix Q that is a product
of rotations of angle θ[k], each acting on the entries k:k+1. That is, it returns $Q = Q_1⋯Q_k$ such that
Q_k[k:k+1,k:k+1] =
\begin{bmatrix}
cos(θ[k]) -sin(θ[k])
sin(θ[k]) cos(θ[k])
\end{bmatrix}
struct Rotations{T} <: AbstractMatrix{T} θ::Vector{T} import Base: *, size, getindex size(Q::Rotations) = (length(Q.θ)+1, length(Q.θ)+1) function *(Q::Rotations, x::AbstractVector) T = eltype(Q) y = convert(Vector{T}, x) # TODO: Apply Q in O(n) operations, modifying y in-place ## SOLUTION #Does Q1....Qn x for k = length(θ):-1:1 #below has 4 ops to make the matrix and 12 to do the matrix-vector multiplication, #total operations will be 48n = O(n) c, s = cos(θ[k]), sin(θ[k]) y[k:(k+1)] = [c -s; s c] * y[k:(k+1)] function getindex(Q::Rotations, k::Int, j::Int) # TODO: Return Q[k,j] in O(n) operations (hint: use *) ## SOLUTION #recall that A_kj = e_k'*A*e_j for any matrix A #so if we use * above, this will take O(n) operations n = size(Q)[1] ej = zeros(eltype(Q), n) #note, must be careful to ensure that ej is a VECTOR #not a MATRIX, otherwise * above will not be used Qj = Q * ej θ1 = randn(5) Q = Rotations(θ1) @test Q'Q ≈ I @test Rotations([π/2, -π/2]) ≈ [0 0 -1; 1 0 0; 0 -1 0] Test Passed Expression: Rotations([π / 2, -π / 2]) ≈ [0 0 -1; 1 0 0; 0 -1 0] Evaluated: [6.123233995736766e-17 -6.123233995736766e-17 -1.0; 1.0 3.749399456654644e-33 6.123233995736766e-17; 0.0 -1.0 6.123233995736766e-17] ≈ [0 0 -1; 1 0 0; 0 -1 0] Problem 4.3 (A) Combine Rotations and UpperTridiagonal from last problem sheet to implement a banded QR decomposition, bandedqr, that only takes $O(n)$ operations. Hint: use atan(y,x) to determine the angle. # First we include UpperTridiagonal from last problem sheet. # bandedqr is below. import Base: *, size, getindex, setindex! struct UpperTridiagonal{T} <: AbstractMatrix{T} d::Vector{T} # diagonal entries du::Vector{T} # super-diagonal enries du2::Vector{T} # second-super-diagonal entries 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com