
MATH50003 Numerical Analysis: Problem Sheet 5¶
This problem sheet explores positive definite matrices,
Cholesky decompositions, matrix norms, and the singular value decomposition.


Questions marked with a ⋆ are meant to be completed without using a computer.
Problems are denoted A/B/C to indicate their difficulty.

using LinearAlgebra, Plots, Test

1. Positive definite matrices and Cholesky decompositions¶
Problem 1.1⋆ (C) Use the Cholesky decomposition to determine
which of the following matrices are symmetric positive definite:
$$
\begin{bmatrix} 1 & -1 \\
-1 & 3
\end{bmatrix}, \begin{bmatrix} 1 & 2 & 2 \\
2 & 1 & 2\\
2 & 2 & 1
\end{bmatrix}, \begin{bmatrix} 3 & 2 & 1 \\
2 & 4 & 2\\
1 & 2 & 5
\end{bmatrix},
\begin{bmatrix} 4 & 2 & 2 & 1 \\
2 & 4 & 2 & 2\\
2 & 2 & 4 & 2 \\
1 & 2 & 2 & 4
\end{bmatrix}
$$

A matrix is symmetric positive definite (SPD) if and only if it has a Cholesky decomposition, so the task here is really just to compute Cholesky decompositions by hand. Since our goal is only to determine whether the Cholesky decomposition exists, we do not have to record the $L_k$ factors; we only need to check whether the elimination can be carried through to the end, i.e. whether every pivot stays positive.

$$A_0=\begin{bmatrix} 1 & -1 \\
-1 & 3
\end{bmatrix}$$
$A_1=3-\frac{(-1)\times(-1)}{1}=2>0$, so Matrix 1 is SPD.

$$A_0=\begin{bmatrix}
1 & 2 & 2 \\
2 & 1 & 2 \\
2 & 2 & 1
\end{bmatrix}$$
$$A_1=\begin{bmatrix}
1 & 2 \\
2 & 1
\end{bmatrix}-\begin{bmatrix} 2 \\ 2 \end{bmatrix}\begin{bmatrix} 2 & 2 \end{bmatrix}=
\begin{bmatrix}
-3 & -2 \\
-2 & -3
\end{bmatrix}$$
$A_1[1,1]<0$, so Matrix 2 is not SPD.

$$A_0=\begin{bmatrix}
3 & 2 & 1 \\
2 & 4 & 2 \\
1 & 2 & 5
\end{bmatrix}$$
$$A_1=
\begin{bmatrix}
4 & 2 \\
2 & 5
\end{bmatrix}-\frac{1}{3}\begin{bmatrix} 2 \\ 1 \end{bmatrix}\begin{bmatrix} 2 & 1 \end{bmatrix}=\frac{1}{3}
\begin{bmatrix}
8 & 4 \\
4 & 14
\end{bmatrix}$$
$3A_2=14-\frac{4\times 4}{8}=12>0$, so Matrix 3 is SPD.

$$A_0=\begin{bmatrix}
4 & 2 & 2 & 1 \\
2 & 4 & 2 & 2 \\
2 & 2 & 4 & 2 \\
1 & 2 & 2 & 4
\end{bmatrix}$$
$$A_1=\begin{bmatrix}
4 & 2 & 2 \\
2 & 4 & 2 \\
2 & 2 & 4
\end{bmatrix}-\frac{1}{4}\begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}\begin{bmatrix} 2 & 2 & 1 \end{bmatrix}=\frac{1}{4}
\begin{bmatrix}
12 & 4 & 6 \\
4 & 12 & 6 \\
6 & 6 & 15
\end{bmatrix}$$
$$4A_2=\begin{bmatrix}
12 & 6 \\
6 & 15
\end{bmatrix}-\frac{1}{12}\begin{bmatrix} 4 \\ 6 \end{bmatrix}\begin{bmatrix} 4 & 6 \end{bmatrix}=\frac{4}{3}
\begin{bmatrix}
8 & 3 \\
3 & 9
\end{bmatrix}$$
so $3A_2=\begin{bmatrix} 8 & 3 \\ 3 & 9 \end{bmatrix}$ and $3A_3=9-\frac{3\times 3}{8}>0$, so Matrix 4 is SPD.

We can check that we did this correctly by running the following in Julia:

cholesky([1 -1; -1 3])

Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  -1.0
  ⋅    1.41421

# this throws an error when uncommented and run because the matrix is not SPD
# cholesky([1 2 2; 2 1 2; 2 2 1])

cholesky([3 2 1; 2 4 2; 1 2 5])

Cholesky{Float64, Matrix{Float64}}
U factor:
3×3 UpperTriangular{Float64, Matrix{Float64}}:
1.73205 1.1547 0.57735
⋅ 1.63299 0.816497
⋅ ⋅ 2.0

cholesky([4 2 2 1; 2 4 2 2; 2 2 4 2; 1 2 2 4])

Cholesky{Float64, Matrix{Float64}}
U factor:
4×4 UpperTriangular{Float64, Matrix{Float64}}:
2.0 1.0 1.0 0.5
⋅ 1.73205 0.57735 0.866025
⋅ ⋅ 1.63299 0.612372
⋅ ⋅ ⋅ 1.62019

Problem 1.2⋆ (B) Recall that an inner product $⟨𝐱, 𝐲⟩$ on $ℝ^n$
over the reals $ℝ$ satisfies, for all $𝐱,𝐲,𝐳 ∈ ℝ^n$ and $a,b ∈ ℝ$:

Symmetry: $⟨𝐱, 𝐲⟩ = ⟨𝐲, 𝐱⟩$
Linearity: $⟨a𝐱+b𝐲, 𝐳⟩ = a ⟨𝐱, 𝐳⟩+ b⟨𝐲, 𝐳⟩$
Positive-definite: $⟨𝐱, 𝐱⟩ > 0$ for $𝐱 \neq 0$

Prove that $⟨𝐱, 𝐲⟩$ is an inner product if and only if
⟨𝐱, 𝐲⟩ = 𝐱^⊤ K 𝐲
where $K$ is a symmetric positive definite matrix.

We begin by showing that $⟨𝐱, 𝐲⟩ = 𝐱^⊤ K 𝐲$ with $K$ spd defines an inner product. To do this we simply verify the three properties: For symmetry, we find
$$ ⟨𝐱, 𝐲⟩ = 𝐱^⊤ K 𝐲 = 𝐱 \cdot (K𝐲) = (K𝐲) \cdot 𝐱$$
$$= (K𝐲)^⊤ 𝐱 = 𝐲^⊤ K^⊤𝐱 = 𝐲^⊤ K 𝐱 = ⟨𝐲, 𝐱⟩.$$
For linearity:
$$ ⟨a𝐱+b𝐲, 𝐳⟩ = (a𝐱+b𝐲)^⊤ K𝐳 = (a𝐱^⊤+b𝐲^⊤)K𝐳$$
$$ = a𝐱^⊤ K𝐳 + b𝐲^⊤ K𝐳 = a⟨𝐱, 𝐳⟩ + b⟨𝐲, 𝐳⟩.$$
Positive-definiteness of the matrix $K$ immediately yields $⟨𝐱, 𝐱⟩ = 𝐱^⊤ K 𝐱 >0$. Now we turn to the converse, i.e. that for any inner product $⟨𝐱, 𝐲⟩$ there exists a symmetric positive definite matrix $K$ such that $⟨𝐱, 𝐲⟩ = 𝐱^⊤ K 𝐲$. Define the entries of $K$ by $K_{ij} = ⟨e_i, e_j⟩$ where $e_j$ is the $j$-th standard basis vector. By linearity (and symmetry) of the inner product, any inner product on $ℝ^n$ satisfies $⟨𝐱, 𝐲⟩ = \sum_{k=1}^n \sum_{l=1}^n x_k y_l ⟨e_k, e_l⟩$. With the entries of $K$ defined as above this is precisely
$$⟨𝐱, 𝐲⟩ = \sum_{k=1}^n \sum_{l=1}^n x_k K_{kl} y_l = 𝐱^⊤ K 𝐲.$$
What remains is to show that this $K$ is symmetric positive definite. Symmetry is an immediate consequence of the symmetry of its elements, i.e. $K_{ij} = ⟨e_i, e_j⟩ = ⟨e_j, e_i⟩ = K_{ji}$. Finally, positive definiteness of $K$ follows from positive-definiteness of the inner product: $𝐱^⊤ K 𝐱 = ⟨𝐱, 𝐱⟩ > 0$ for $𝐱 ≠ 0$.
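As an optional numerical sanity check (not part of the proof), we can verify the three axioms for $⟨𝐱, 𝐲⟩ = 𝐱^⊤ K 𝐲$ with an arbitrarily chosen SPD matrix $K$ and random vectors:

K = [2.0 1.0; 1.0 3.0]    # an arbitrary SPD matrix (both pivots are positive)
ip(x, y) = x' * K * y     # candidate inner product

x, y, z = randn(2), randn(2), randn(2)
a, b = randn(), randn()

@test ip(x, y) ≈ ip(y, x)                          # symmetry
@test ip(a*x + b*y, z) ≈ a*ip(x, z) + b*ip(y, z)   # linearity
@test ip(x, x) > 0                                 # positive-definiteness (x ≠ 0 almost surely)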

Problem 1.3⋆ (A) Show that a matrix is symmetric positive definite if and only if it has a Cholesky
decomposition of the form
$$A = U U^⊤$$
where $U$ is upper triangular with positive entries on the diagonal.

We did not discuss this in the lectures, but note that a symmetric positive definite matrix $K$ has strictly positive eigenvalues: for a normalised
eigenvector $𝐯$ we have
$$λ = λ𝐯^⊤ 𝐯 = 𝐯^⊤ K 𝐯 > 0.$$
Thus symmetric positive definite matrices are always invertible, and the inverse of a symmetric positive definite matrix is again symmetric positive definite. Now any such matrix has a Cholesky decomposition of standard form $A^{-1} = L L^⊤$ where $L$ is lower triangular with positive diagonal. Inverting gives $A = L^{-⊤} L^{-1} = U U^⊤$ with $U := L^{-⊤}$, which is of the desired form since $U$ is upper triangular and its diagonal entries, the reciprocals of those of $L$, are positive. Conversely, if $A = U U^⊤$ with $U$ upper triangular with positive (hence nonzero) diagonal, then $A$ is symmetric and $𝐱^⊤ A 𝐱 = \|U^⊤ 𝐱\|_2^2 > 0$ for $𝐱 ≠ 0$, so $A$ is symmetric positive definite.
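We can illustrate this argument numerically (this check is not part of the proof) on the 4×4 SPD matrix from Problem 1.1: take the Cholesky factor $L$ of $A^{-1}$ and set $U = L^{-⊤}$, which is upper triangular with positive diagonal and satisfies $A = UU^⊤$:

A = [4.0 2 2 1; 2 4 2 2; 2 2 4 2; 1 2 2 4]
L = cholesky(Symmetric(inv(A))).L    # inv(A) == L*L'
U = UpperTriangular(inv(L)')         # U = L^(-⊤) is upper triangular
@test U * U' ≈ A
@test all(diag(U) .> 0)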

Alternatively, we can replicate the procedure for computing the Cholesky decomposition, beginning in the bottom right
instead of the top left. Write:
$$
A = \begin{bmatrix} K & 𝐯\\
𝐯^⊤ & α \end{bmatrix} =
\underbrace{\begin{bmatrix} I & {𝐯 \over \sqrt{α}} \\
& \sqrt{α}
\end{bmatrix}}_{U_1}
\begin{bmatrix} K - {𝐯 𝐯^⊤ \over α} & \\
& 1 \end{bmatrix}
\underbrace{\begin{bmatrix} I \\
{𝐯^⊤ \over \sqrt{α}} & \sqrt{α}
\end{bmatrix}}_{U_1^⊤}
$$
The induction then proceeds as in the lower triangular case.
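For concreteness, here is a minimal sketch of this "reverse" Cholesky, eliminating from the bottom-right corner; the name reversecholesky is ours, not a library routine:

# compute A = U*U' with U upper triangular by eliminating from the bottom right
function reversecholesky(A::AbstractMatrix)
    T = float(eltype(A))
    A = Matrix{T}(A)              # work on a dense copy
    n = size(A, 1)
    U = zeros(T, n, n)
    for k = n:-1:1
        α = A[k, k]
        @assert α > 0 "matrix is not SPD"
        U[k, k] = sqrt(α)
        U[1:k-1, k] = A[1:k-1, k] / sqrt(α)
        A[1:k-1, 1:k-1] -= A[1:k-1, k] * A[k, 1:k-1]' / α
    end
    UpperTriangular(U)
end

A = [4.0 2 2 1; 2 4 2 2; 2 2 4 2; 1 2 2 4]
U = reversecholesky(A)
@test U * U' ≈ A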

Problem 1.4⋆ (A) Prove that the following $n × n$ matrix is symmetric positive definite
for any $n$:
$$Δ_n := \begin{bmatrix}
2 & -1 \\
-1 & 2 & -1 \\
& -1 & 2 & ⋱ \\
&& ⋱ & ⋱ & -1 \\
&&& -1 & 2
\end{bmatrix}$$
Deduce its two Cholesky decompositions: $Δ_n = L_n L_n^⊤ = U_n U_n^⊤$ where
$L_n$ is lower triangular and $U_n$ is upper triangular.

We first prove that $L_n$ is lower bidiagonal by induction. Let $A$ be a tridiagonal SPD matrix:

$$A=\left[\begin{array}{c|ccc}
\alpha & \beta & 0 & \cdots \\\hline
\beta & & & \\
0 & & K & \\
\vdots & & & \\
\end{array}\right]$$
where $K$ is again tridiagonal. Denote the Cholesky decomposition of $A$ by $A=LL^\top$. Recalling the proof of the Theorem (Cholesky & SPD) from the lecture, we can write

$$L=\left[\begin{array}{c|ccc}
\sqrt{\alpha} & & 0 & \\\hline
\frac{\beta}{\sqrt{\alpha}} & & & \\
0 & & \tilde{L} & \\
\vdots & & &
\end{array}\right]$$
where $\tilde{L}$ satisfies $\tilde{L}\tilde{L}^\top=K-\begin{bmatrix} \beta^2/\alpha & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}$ which is a tridiagonal matrix smaller than $A$. Since $L$ is lower bidiagonal when $A$ is $1\times 1$, we know by induction that $L$ is lower bidiagonal for $A$ of any size.

Once we know that $L_n$ is lower bidiagonal, we can write it as
$$L_n=\begin{bmatrix}
a_1 \\
b_1 & a_2 \\
& \ddots & \ddots \\
& & b_{n-1} & a_n
\end{bmatrix}$$
which we substitute into $\Delta_n=L_nL_n^\top$ to get
$$\left\{\begin{array}{l}
a_1=\sqrt{2}\\
a_kb_k=-1\\
b_{k}^2+a_{k+1}^2=2
\end{array}\right.$$
for $k=1,\dots,n-1$.

Now we solve the recurrence. Substituting the second equation into the third one:
$$\frac{1}{a_k^2}+a_{k+1}^2=2.$$
Let $c_k=a_k^2$:
$$\frac{1}{c_k}+c_{k+1}=2.$$
Consider the fixed point of this recurrence: $\frac{1}{x}+x=2\Longrightarrow(x-1)^2=0$ has the double root $x=1$, which suggests considering $\frac{1}{c_k-1}$. In fact, the recurrence is equivalent to
$$\frac{1}{c_{k+1}-1}=\frac{1}{c_k-1}+1.$$
Recalling that $c_1=2$, we know that $\frac{1}{c_k-1}=k$. As a result, $c_k=(k+1)/k$, $a_k=\sqrt{(k+1)/k}$ and $b_k=-\sqrt{k/(k+1)}$, hence we know $L_n$.
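We can check these formulas numerically; a short sketch with an arbitrary choice of $n$:

# check that the derived entries reproduce Δ_n = L_n*L_n'
n = 10
a = [sqrt((k+1)/k) for k = 1:n]
b = [-sqrt(k/(k+1)) for k = 1:n-1]
Lₙ = Bidiagonal(a, b, :L)
Δₙ = SymTridiagonal(fill(2.0, n), fill(-1.0, n-1))
@test Lₙ * Lₙ' ≈ Δₙ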

We can apply the same process to find $U_n$, but this is a special case since flipping $\Delta_n$ horizontally and vertically gives itself: $P\Delta_nP^\top=\Delta_n$, where
$$P=\begin{bmatrix} & & 1 \\ & ⋰ & \\ 1 & & \end{bmatrix}$$
is the permutation that reverses a vector (note $P^\top = P^{-1} = P$).
So we can also flip $L_n$ to get $U_n := P L_n P$, which is upper bidiagonal,
so that $U_n U_n^⊤ = P L_n P P L_n^⊤ P = P Δ_n P = Δ_n$.
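Continuing the sketch above (with n, Lₙ and Δₙ as defined there), we can confirm the flip numerically:

P = I(n)[end:-1:1, :]    # reversal permutation (flips the order of the rows)
Uₙ = P * Lₙ * P          # flip L_n to obtain the upper bidiagonal factor
@test istriu(Uₙ)
@test Uₙ * Uₙ' ≈ Δₙ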

Alternatively one can use the procedure from Problem 1.3. That is, write:
$$
Δ_n = \begin{bmatrix} Δ_{n-1} & -𝐞_{n-1} \\
-𝐞_{n-1}^⊤ & 2 \end{bmatrix} =
\underbrace{\begin{bmatrix} I & {-𝐞_{n-1} \over \sqrt{2}} \\
& \sqrt{2}
\end{bmatrix}}_{U_1}
\begin{bmatrix} Δ_{n-1} - {𝐞_{n-1} 𝐞_{n-1}^⊤ \over 2} & \\
& 1 \end{bmatrix}
\underbrace{\begin{bmatrix} I \\
{-𝐞_{n-1}^⊤ \over \sqrt{2}} & \sqrt{2}
\end{bmatrix}}_{U_1^⊤}
$$
where $𝐞_{n-1} ∈ ℝ^{n-1}$ is the last standard basis vector. Continuing then proceeds as above.

Problem 1.5 (B) SymTridiagonal(dv, ev) is a type for representing symmetric tridiagonal
matrices (that is, SymTridiagonal(dv, ev) == Tridiagonal(ev, dv, ev)). Complete the following
implementation of cholesky to return a Bidiagonal Cholesky factor in $O(n)$ operations,
and check your result
against your solution of Problem 1.4 for n = 1_000_000.

import LinearAlgebra: cholesky

# return a Bidiagonal L such that L*L' == A (up to machine precision)
cholesky(A::SymTridiagonal) = cholesky!(copy(A))

# return a Bidiagonal L such that L*L' == A (up to machine precision)
# You are allowed to change A
function cholesky!(A::SymTridiagonal)
    d = A.dv # diagonal entries of A
    u = A.ev # sub/super-diagonal entries of A
    T = float(eltype(A)) # return type, make float in case A has Ints
    n = length(d)
    ld = zeros(T, n) # diagonal entries of L
    ll = zeros(T, n-1) # sub-diagonal entries of L

    ## SOLUTION
    ld[1] = sqrt(d[1])
    for k = 1:n-1
        ll[k] = u[k]/ld[k]                # off-diagonal: A[k+1,k] == ll[k]*ld[k]
        ld[k+1] = sqrt(d[k+1] - ll[k]^2)  # diagonal: A[k+1,k+1] == ld[k+1]^2 + ll[k]^2
    end
    ## END

    Bidiagonal(ld, ll, :L)
end

n = 1_000 # size used for the dense comparison below
A = SymTridiagonal(2*ones(n), -ones(n-1))
L = cholesky(A)
@test L ≈ cholesky(Matrix(A)).L

Test Passed
Expression: L ≈ (cholesky(Matrix(A))).L
Evaluated: 1000×1000 Bidiagonal{Float64, Vector{Float64}}:
diag: 1.4142135623730951 1.224744871391589 … 1.0004998750624627
sub: -0.7071067811865475 -0.8164965809277261 … -0.9994998749374592 ≈ [1.4142135623730951 0.0 … 0.0 0.0; -0.7071067811865475 1.224744871391589 … 0.0 0.0; … ; 0.0 0.0 … 1.0005003753127755 0.0; 0.0 0.0 … -0.9994998749374592 1.0004998750624627]
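The dense comparison above used n = 1_000; for the n = 1_000_000 case asked for in the problem, a sketch (our addition) is to compare against the closed-form entries derived in Problem 1.4 instead:

n = 1_000_000
A = SymTridiagonal(2*ones(n), -ones(n-1))
L = cholesky(A)
@test L.dv ≈ [sqrt((k+1)/k) for k = 1:n]      # a_k = sqrt((k+1)/k)
@test L.ev ≈ [-sqrt(k/(k+1)) for k = 1:n-1]   # b_k = -sqrt(k/(k+1))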

2. Matrix norms¶
Problem 2.1⋆ (B) Prove the following:
\begin{align*}
\|A\|_∞ &= \max_k \|A[k,:]\|_1 \\
\|A\|_{1 → ∞} &= \|\hbox{vec}(A)\|_∞ = \max_{kj} |a_{kj}|
\end{align*}

Step 1. upper bounds

$$\|A\mathbf{x}\|_\infty=\max_k\left|\sum_ja_{kj}x_j\right|\le\max_k\sum_j|a_{kj}x_j|\le
\begin{cases}
\max\limits_j|x_j|\max\limits_k\sum\limits_j|a_{kj}|=\|\mathbf{x}\|_\infty\max\limits_k\|A[k,:]\|_1\\
\max\limits_{kj}|a_{kj}|\sum\limits_j|x_j|=\|\mathbf{x}\|_1\|\text{vec}(A)\|_\infty
\end{cases}
$$

Step 2.1. meeting the upper bound ($\|A\|_{1 → ∞}$)

Let $a_{lm}$ be the entry of $A$ with maximum absolute value. Let $\mathbf{x}=\mathbf{e}_m$, then
$$\|A\mathbf{x}\|_\infty=\max_k\left|\sum_ja_{kj}x_j\right|=\max_k|a_{km}|=|a_{lm}|$$
$$\|\mathbf{x}\|_1\|\text{vec}(A)\|_\infty=1\cdot|a_{lm}|.$$

Step 2.2. meeting the upper bound ($\|A\|_∞$)

Let $A[n,:]$ be the row of $A$ with maximum 1-norm and let $\mathbf{x}$ have entries $x_j = \text{sign}(a_{nj})$, so that $\|\mathbf{x}\|_\infty = 1$ (assuming $A \neq 0$). Then $\left|\sum_ja_{kj}x_j\right|\begin{cases} =\sum_j|a_{kj}|=\|A[k,:]\|_1 & k=n \\ \le\sum_j|a_{kj}|=\|A[k,:]\|_1 & k\ne n \end{cases}$, so
$$\|A\mathbf{x}\|_\infty=\max_k\left|\sum_ja_{kj}x_j\right|=\max\limits_k\|A[k,:]\|_1$$
$$\|\mathbf{x}\|_\infty\max\limits_k\|A[k,:]\|_1=1\cdot\max\limits_k\|A[k,:]\|_1.$$

Conclusion

In both cases, equality can hold, so the upper bounds are actually maxima.
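As an optional numerical check of both identities (the matrix size is arbitrary):

A = randn(5, 7)
@test opnorm(A, Inf) ≈ maximum(norm(A[k, :], 1) for k = 1:size(A, 1))

m = argmax(abs.(A))                  # index of the largest-magnitude entry
x = zeros(size(A, 2)); x[m[2]] = 1   # x = e_m, with ‖x‖₁ = 1
@test norm(A * x, Inf) ≈ maximum(abs, A)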

Problem 2.2⋆ (B) For a rank-1 matrix $A = 𝐱 𝐲^⊤$ prove that
\|A \|_2 = \|𝐱\|_2 \|𝐲\|_2.
Hint: use the Cauchy–Schwarz inequality.

For any $\mathbf{z}$,
$$\|A\mathbf{z}\|_2=\|\mathbf{x}\mathbf{y}^\top\mathbf{z}\|_2=|\mathbf{y}^\top\mathbf{z}|\,\|\mathbf{x}\|_2,$$
so it remains to prove that $\sup_{\mathbf{z}\neq 0}\frac{|\mathbf{y}^\top\mathbf{z}|}{\|\mathbf{z}\|_2}=\|\mathbf{y}\|_2$.

By the Cauchy–Schwarz inequality,
$$|\mathbf{y}^\top\mathbf{z}|=|⟨\mathbf{y},\mathbf{z}⟩|\le\|\mathbf{y}\|_2\|\mathbf{z}\|_2,$$
with equality when $\mathbf{y}$ and $\mathbf{z}$ are linearly dependent (e.g. $\mathbf{z}=\mathbf{y}$), so the supremum equals $\|\mathbf{y}\|_2$ and hence $\|A\|_2=\|\mathbf{x}\|_2\|\mathbf{y}\|_2$.
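A quick numerical check (random vectors of arbitrary length); opnorm computes the induced 2-norm:

x, y = randn(6), randn(4)
@test opnorm(x * y') ≈ norm(x) * norm(y)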

Problem 2.3⋆ (B) Show for any orthogonal matrix $Q ∈ ℝ^{m × m}$ and
matrix $A ∈ ℝ^{m × n}$ that
\|Q A\|_F = \|A\|_F
by first showing that $\|A \|_F = \sqrt{\hbox{tr}(A^⊤ A)}$ using the
trace of an $m × m$ matrix:
\hbox{tr}(A) = a_{11} + a_{22} + ⋯ + a_{mm}.

$$\text{tr}(A^\top A)=\sum_k(A^\top A)[k,k]=\sum_k\sum_jA^\top[k,j]A[j,k]=\sum_k\sum_jA[j,k]^2=\|A\|_F^2.$$
On the other hand,
$$\text{tr}(A^\top A)=\text{tr}(A^\top Q^\top QA)=\text{tr}((QA)^\top (QA))=\|QA\|_F^2,$$
so $\|Q A\|_F = \|A\|_F$.
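Numerically (the sizes and the random orthogonal $Q$ are arbitrary; in Julia norm(A) of a matrix is the Frobenius norm):

A = randn(5, 3)
Q = Matrix(qr(randn(5, 5)).Q)   # a random orthogonal matrix
@test norm(A) ≈ sqrt(tr(A' * A))
@test norm(Q * A) ≈ norm(A)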

3. Singular value decomposition¶
Problem 3.1⋆ (B) Show that $\|A \|_2 ≤ \|A\|_F ≤ \sqrt{r} \|A \|_2$ where
$r$ is the rank of $A$.

From Problem 2.3 use the fact that $\|A \|_F = \sqrt{\hbox{tr}(A^⊤ A)}$, where $A\in \mathbb{R}^{m\times n}$.

$$\|A \|_F^2 = \hbox{tr}(A^⊤ A) = \sigma_1^2 +…+\sigma_n^2$$
where $\sigma_1\ge…\ge \sigma_n \ge 0$ are the singular values of $A$ (padded with zeros if $A$ has fewer than $n$ singular values) and the $\sigma_i^2$ are the eigenvalues of $A^⊤ A$.

Knowing that $\|A\|_2^2 = \sigma_1^2$ we have $\|A \|_2^2 ≤ \|A\|_F^2$

Moreover, if the rank of $A$ is $r$ then $\sigma_{r+1}=…=\sigma_n=0$, and since $\sigma_1\ge…\ge \sigma_n \ge 0$ we have that

$\|A\|_F^2 = \sigma_1^2 +…+\sigma_n^2 =\sigma_1^2 +…+\sigma_r^2 \le r \sigma_1^2 =r \|A \|_2^2$

\|A \|_2 ≤ \|A\|_F ≤ \sqrt{r} \|A \|_2.
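A quick check on a random matrix of rank $r$ (the sizes and $r$ are arbitrary):

r = 3
A = randn(8, r) * randn(r, 6)   # rank r with probability 1
@test opnorm(A) ≤ norm(A) ≤ sqrt(r) * opnorm(A)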

Problem 3.2 (A) Consider functions sampled on a $(n+1) × (n+1)$ 2D grid
$(x_k,y_j) = (k/n, j/n)$ where $k,j = 0,…,n$.
For $n = 100$, what is the lowest rank $r$ such that
the best rank-$r$ approximation to the samples
is accurate to within $10^{-5}$ for the following functions:
(x + 2y)^2, \cos(\sin x {\rm e}^y), 1/(x + y + 1), \hbox{sign}(x-y)
For which examples does the answer change when $n = 1000$?

#define functions
f₁(x,y) = (x + 2 * y) ^ 2
f₂(x,y) = cos(sin(x)*exp(y))
f₃(x,y) = 1/(x + y + 1)
f₄(x,y) = sign(x-y)

#define the error goal
error = 1e-5

#helper function to compute n×n samples of f on an equispaced grid
function samples(f, n)
    x = y = range(0, 1; length=n)
    return f.(x, y')
end

samples (generic function with 1 method)

function find_min_rank(f, n, ϵ)
    F = samples(f, n)
    U, σ, V = svd(F)
    for k = 1:n
        Σ_k = Diagonal(σ[1:k])
        U_k = U[:, 1:k]
        V_k = V[:, 1:k]
        if norm(U_k * Σ_k * V_k' - F) <= ϵ
            return k
        end
    end
    return n
end

n = 100
println("Error ≤ ", error, " with n = ", n)
println("Rank for f₁ = ", find_min_rank(f₁, n, error))
println("Rank for f₂ = ", find_min_rank(f₂, n, error))
println("Rank for f₃ = ", find_min_rank(f₃, n, error))
println("Rank for f₄ = ", find_min_rank(f₄, n, error))

Error ≤ 1.0e-5 with n = 100
Rank for f₁ = 3
Rank for f₂ = 5
Rank for f₃ = 4
Rank for f₄ = 100

n = 1000
println("Error ≤ ", error, " with n = ", n)
println("Rank for f₁ = ", find_min_rank(f₁, n, error))
println("Rank for f₂ = ", find_min_rank(f₂, n, error))
println("Rank for f₃ = ", find_min_rank(f₃, n, error))
println("Rank for f₄ = ", find_min_rank(f₄, n, error))

Error ≤ 1.0e-5 with n = 1000
Rank for f₁ = 3
Rank for f₂ = 5
Rank for f₃ = 5
Rank for f₄ = 1000

So the answer changes for $1/(x + y + 1)$ and $\hbox{sign}(x-y)$ when $n = 1000$.

Problem 3.3⋆ (B) Define the pseudo-inverse:
$$A^+ := V Σ^{-1} U^⊤.$$
Show that it satisfies the Moore–Penrose conditions:

$A A^+ A = A$
$A^+ A A^+ = A^+$
$(A A^+)^⊤ = A A^+$ and $(A^+ A)^⊤ = A^+ A$

Let $A=UΣ V^⊤$ and $A^+ := V Σ^{-1} U^⊤$. Note that $U^⊤U = I$ and $V^⊤ V = I$ since the columns of $U$ and $V$ are orthonormal.
$$A A^+ A = U \Sigma V^⊤ V \Sigma^{-1} U^⊤ U \Sigma V^⊤ = U \Sigma \Sigma^{-1} \Sigma V^⊤ = U\Sigma V^⊤ = A$$
$$A^+ A A^+ = V \Sigma^{-1}U^⊤ U \Sigma V^⊤ V \Sigma^{-1} U^⊤ = V \Sigma^{-1}\Sigma \Sigma^{-1} U^⊤ = V \Sigma^{-1} U^⊤ = A^+$$
Furthermore, $AA^+ = U \Sigma V^⊤ V \Sigma^{-1} U^⊤ = U U^⊤$, which is symmetric, thus
$$(AA^+)^⊤ = (UU^⊤)^⊤ = UU^⊤ = AA^+.$$
Moreover, $A^+A = V \Sigma^{-1} U^⊤ U \Sigma V^⊤ = VV^⊤$, hence,
$$(A^+A)^⊤ = (VV^⊤)^⊤=VV^⊤=A^+A.$$

Problem 3.4⋆ (A) Show for $A ∈ ℝ^{m × n}$ with $m ≥ n$ and full rank
that $𝐱 = A^+ 𝐛$ is the least squares solution, i.e., minimises $\| A 𝐱 - 𝐛 \|_2$.
Hint: extend $U$ in the SVD to be a square orthogonal matrix.

The proof mimics that of the QR decomposition. Write $A = U Σ V^⊤$ and let
$$Ũ = \begin{bmatrix}U & K \end{bmatrix}$$
so that $Ũ$ is orthogonal. We use the fact that orthogonal matrices do not change norms:
\begin{align*}
\|A 𝐱 - 𝐛 \|_2^2 &= \|U Σ V^⊤ 𝐱 - 𝐛 \|_2^2 = \|Ũ^⊤ U Σ V^⊤ 𝐱 - Ũ^⊤ 𝐛 \|_2^2 = \|\underbrace{\begin{bmatrix}I_n \\ O \end{bmatrix}}_{∈ ℝ^{m × n}} Σ V^⊤ 𝐱 - \begin{bmatrix} U^⊤ \\ K^⊤ \end{bmatrix} 𝐛 \|_2^2 \\
&= \|Σ V^⊤ 𝐱 - U^⊤ 𝐛 \|_2^2 + \|K^⊤ 𝐛\|^2
\end{align*}
The second term is independent of $𝐱$. The first term is minimised when it is zero, which is achieved by $𝐱 = A^+ 𝐛$:
$$\|Σ V^⊤ 𝐱 - U^⊤ 𝐛 \|_2 =\|Σ V^⊤ V Σ^{-1} U^⊤ 𝐛 - U^⊤ 𝐛 \|_2 = 0.$$

Problem 3.5⋆ (A) If $A ∈ ℝ^{m × n}$ has a non-empty kernel there are multiple solutions to the least squares problem as we can add any element of the kernel. Show that $𝐱 = A^+ 𝐛$ gives the least squares solution such that $\| 𝐱 \|_2$ is minimised.

Let $𝐱 = A^+ 𝐛$ and let $𝐱 + 𝐤$ be another least squares solution, i.e.
$$\|A𝐱 - 𝐛 \| = \|A (𝐱 +𝐤) - 𝐛 \|.$$
Following the previous part we deduce:
$$Σ V^⊤ (𝐱 +𝐤) = U^⊤ 𝐛 \Rightarrow V^⊤ 𝐤 = 0.$$
As $𝐱 = V Σ^{-1} U^⊤ 𝐛$ lies in the span of the columns of $V$, we have $𝐱^⊤ 𝐤 = 0$. Thus
$$\| 𝐱 + 𝐤 \|^2 = \| 𝐱 \|^2 + \| 𝐤 \|^2,$$
which is minimised when $𝐤 = 0$.
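Finally, an optional numerical check of Problems 3.3–3.5 (a sketch; the sizes are arbitrary and we build $A^+$ from the reduced SVD):

A = randn(7, 4)   # full column rank with probability 1
b = randn(7)
U, σ, V = svd(A)
Aplus = V * Diagonal(inv.(σ)) * U'   # the pseudo-inverse A⁺

@test A * Aplus * A ≈ A
@test Aplus * A * Aplus ≈ Aplus
@test (A * Aplus)' ≈ A * Aplus
@test (Aplus * A)' ≈ Aplus * A
@test Aplus * b ≈ A \ b              # least-squares solution, as in Problem 3.4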