Structured Matrices
We have seen how algebraic operations (+, -, *, /) are
well-defined for floating point numbers. Now we see how this allows us
to do (approximate) linear algebra operations on structured matrices. That is,
we consider the following structures:
Dense: This can be considered unstructured, where we need to store all entries in a
vector or matrix. Matrix multiplication reduces directly to standard algebraic operations.
Solving linear systems with dense matrices will be discussed later.
Triangular: If a matrix is upper or lower triangular, we can immediately invert using
back-substitution. In practice we store a dense matrix and ignore the upper/lower entries.
Banded: If a matrix is zero apart from entries a fixed distance from the diagonal it is
called banded and this allows for more efficient algorithms. We discuss diagonal,
tridiagonal and bidiagonal matrices.
Permutation: A permutation matrix permutes the entries of a vector.
Orthogonal: An orthogonal matrix $Q$ satisfies $Q^⊤ Q = I$; in other words, it is
very easy to invert. We discuss the special cases of simple rotations and reflections.
using LinearAlgebra, Plots, BenchmarkTools
1. Dense vectors and matrices
A Vector of a primitive type (like Int or Float64) is stored
consecutively in memory. E.g. if we have a Vector{Int8} of length
n then it is stored as 8n bits (n bytes) in a row.
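As a quick illustration (not part of the original notes), sizeof reports the number of bytes used by the element data of an array:
sizeof(zeros(Int8, 10)), sizeof(zeros(Float64, 10)) # (10, 80): one byte per Int8, eight bytes per Float64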
A Matrix is stored consecutively in memory, going down column-by-
column. That is,
A = [1 2;
     3 4;
     5 6]
3×2 Matrix{Int64}:
 1  2
 3  4
 5  6
is actually stored equivalently to a length 6 vector, as we can see using vec:
vec(A)
6-element Vector{Int64}:
 1
 3
 5
 2
 4
 6
This is known as column-major format.
Remark Note that transposing A is done lazily and so A'
stores the entries by row. That is, A' is stored in
row-major format.
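For instance (a small illustrative check, not from the original), A' is a lazy Adjoint wrapper around the same data, so no entries are copied and indexing is simply transposed:
typeof(A') # a lazy Adjoint wrapper, not a new Matrix
A'[1, 2] == A[2, 1]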
Matrix-vector multiplication works as expected:
x = [7, 8]
A * x
3-element Vector{Int64}:
 23
 53
 83
Note there are two ways this can be implemented: either the traditional definition,
going row-by-row:
\begin{bmatrix} \sum_{j=1}^n a_{1,j} x_j \\ \vdots \\ \sum_{j=1}^n a_{m,j} x_j \end{bmatrix}
or going column-by-column:
x_1 𝐚_1 + \cdots + x_n 𝐚_n
where $𝐚_1,\ldots,𝐚_n$ are the columns of $A$.
It is easy to implement either version of matrix-multiplication in terms of the algebraic operations
we have learned, in this case just using integer arithmetic:
# go row-by-row
function mul_rows(A, x)
    m,n = size(A)
    # promote_type finds a type compatible with both input types; eltype gives the element type of a vector/matrix
    c = zeros(promote_type(eltype(x), eltype(A)), m)
    for k = 1:m, j = 1:n
        c[k] += A[k, j] * x[j]
    end
    c
end

# go column-by-column
function mul(A, x)
    m,n = size(A)
    c = zeros(promote_type(eltype(x), eltype(A)), m)
    for j = 1:n, k = 1:m
        c[k] += A[k, j] * x[j]
    end
    c
end

mul_rows(A, x), mul(A, x)
([23, 53, 83], [23, 53, 83])
Either implementation will be $O(mn)$ operations. However, the implementation
mul accesses the entries of A going down the column,
which happens to be significantly faster than mul_rows, due to accessing
memory of A in order. We can see this by measuring the time it takes using @btime:
n = 1000 # matrix size (the value used for the timings below is not shown; 1000 is an assumption)
A = randn(n,n) # create n x n matrix with random normal entries
x = randn(n) # create length n vector with random normal entries

@btime mul_rows(A,x)
@btime mul(A,x)
@btime A*x; # built-in, high performance implementation. USE THIS in practice
2.356 ms (1 allocation: 7.94 KiB)
743.688 μs (1 allocation: 7.94 KiB)
130.390 μs (1 allocation: 7.94 KiB)
Here ms means milliseconds (0.001 = 10^(-3) seconds) and μs means microseconds (0.000001 = 10^(-6) seconds).
So we observe that mul is roughly 3x faster than mul_rows, while the optimised * is roughly 5x faster than mul.
Remark (advanced) For floating point types, A*x is implemented in BLAS which is generally multi-threaded
and is not identical to mul(A,x), that is, some inputs will differ in how the computations
are rounded.
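As an illustration (the test matrix here is made up), the two results still agree to machine precision even if individual roundings differ:
B = randn(100, 100)
z = randn(100)
mul(B, z) ≈ B * z # true: agreement up to round-off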
Note that the rules of arithmetic apply here: matrix multiplication with floats
will incur round-off error (the precise details of which are subject to the implementation):
A = [1.4 0.4;
     2.0 1.0] # the second row is not shown in the original; these entries are illustrative
A * [1, -1] # First entry has round-off error, but 2nd entry is exact
2-element Vector{Float64}:
 0.9999999999999999
 1.0
And integer arithmetic will be prone to overflow:
A = fill(Int8(2^6), 2, 2) # make a matrix whose entries are all equal to 2^6
A * Int8[1,1] # we have overflowed and get a negative number -2^7
2-element Vector{Int8}:
 -128
 -128
Solving a linear system is done using \:
A = [1 2 3;
     1 2 4;
     3 7 8]
b = [10; 11; 12]
A \ b
3-element Vector{Float64}:
41.000000000000036
-17.000000000000014
Despite the exact answer being integer-valued ($[41, -17, 1]^⊤$),
here we see that it resorted to using floating point arithmetic,
incurring rounding error.
But it is “accurate to (roughly) 16-digits”.
As we shall see, solving a linear system works by first writing $A$ as a
product of simpler matrices, e.g., a product of triangular matrices.
Remark (advanced) For floating point types, A \ x is implemented in LAPACK, which
like BLAS is generally multi-threaded and in fact different machines may round differently.
2. Triangular matrices
Triangular matrices are represented by dense square matrices where the entries below or above the diagonal
are ignored:
A = [1 2 3;
     4 5 6;
     7 8 9]
U = UpperTriangular(A)
3×3 UpperTriangular{Int64, Matrix{Int64}}:
 1  2  3
 ⋅  5  6
 ⋅  ⋅  9
We can see that U is storing all the entries of A:
U.data
3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9
Similarly we can create a lower triangular matrix by ignoring the entries above the diagonal:
L = LowerTriangular(A)
3×3 LowerTriangular{Int64, Matrix{Int64}}:
 1  ⋅  ⋅
 4  5  ⋅
 7  8  9
If we know a matrix is triangular we can do matrix-vector multiplication in roughly half
the number of operations.
Moreover, we can easily invert matrices.
Consider a simple 3×3 example, which can be solved with \:
b = [5, 6, 7]
x = U \ b
3-element Vector{Float64}:
2.1333333333333333
0.2666666666666666
0.7777777777777778
Behind the scenes, \ is doing back-substitution: considering the last row, we have all
zeros apart from the last column so we know that x[3] must be equal to:
b[3] / U[3,3]
0.7777777777777778
Once we know x[3], the second row states U[2,2]*x[2] + U[2,3]*x[3] == b[2], rearranging
we get that x[2] must be:
(b[2] - U[2,3]*x[3])/U[2,2]
0.2666666666666666
Finally, the first row states U[1,1]*x[1] + U[1,2]*x[2] + U[1,3]*x[3] == b[1] i.e.
x[1] is equal to
(b[1] - U[1,2]*x[2] - U[1,3]*x[3])/U[1,1]
2.1333333333333333
More generally, we can solve the upper-triangular system
\begin{bmatrix}
u_{11} & \cdots & u_{1n} \\ & \ddots & \vdots \\ && u_{nn}
\end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}
by computing $x_n, x_{n-1}, \ldots, x_1$ by the back-substitution formula:
x_k = {b_k – \sum_{j=k+1}^n u_{kj} x_j \over u_{kk}}
The problem sheet will explore implementing this method, as well
as forward substitution for inverting lower triangular matrices. The cost of multiplying and solving linear systems with a
triangular matrix is $O(n^2)$.
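As a hedged sketch of how back-substitution might look (the problem sheet develops this properly; the function name ldiv_upper is made up for illustration):
# back-substitution for a square upper-triangular U with nonzero diagonal
function ldiv_upper(U, b)
    n = size(U, 1)
    x = zeros(float(promote_type(eltype(U), eltype(b))), n)
    for k = n:-1:1
        r = b[k] # accumulate b[k] - ∑_{j=k+1}^n U[k,j]*x[j]
        for j = k+1:n
            r -= U[k, j] * x[j]
        end
        x[k] = r / U[k, k]
    end
    x
end

ldiv_upper(U, b) ≈ U \ b # matches the built-in solve on the example above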
3. Banded matrices
A banded matrix is zero off a prescribed number of diagonals.
We call the number of (potentially) non-zero diagonals the bandwidths:
Definition (bandwidths) A matrix $A$ has lower-bandwidth $l$ if
$A[k,j] = 0$ for all $k-j > l$ and upper-bandwidth $u$ if
$A[k,j] = 0$ for all $j-k > u$. We say that it has strictly lower-bandwidth $l$
if it has lower-bandwidth $l$ and there exists a $j$ such that $A[j+l,j] \neq 0$.
We say that it has strictly upper-bandwidth $u$
if it has upper-bandwidth $u$ and there exists a $k$ such that $A[k,k+u] \neq 0$.
Definition (Diagonal) Diagonal matrices are square matrices with bandwidths $l = u = 0$.
Diagonal matrices in Julia are stored as a vector containing the diagonal entries:
x = [1,2,3]
D = Diagonal(x)
3×3 Diagonal{Int64, Vector{Int64}}:
 1  ⋅  ⋅
 ⋅  2  ⋅
 ⋅  ⋅  3
It is clear that we can perform diagonal-vector multiplications and solve linear systems involving diagonal matrices efficiently
(in $O(n)$ operations).
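As a minimal sketch (the helper names diag_mul and diag_solve are made up; Diagonal's own * and \ already behave this way), both operations act entrywise on the stored diagonal:
diag_mul(d, x) = d .* x # (D*x)[k] == d[k]*x[k]
diag_solve(d, b) = b ./ d # (D\b)[k] == b[k]/d[k]

diag_mul([1,2,3], [4,5,6]) == D * [4,5,6], diag_solve([1,2,3], [4,5,6]) == D \ [4,5,6]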
Bidiagonal
Definition (Bidiagonal) If a square matrix has bandwidths $(l,u) = (1,0)$ it is lower-bidiagonal and
if it has bandwidths $(l,u) = (0,1)$ it is upper-bidiagonal.
We can create Bidiagonal matrices in Julia by specifying the diagonal and off-diagonal:
Bidiagonal([1,2,3], [4,5], :L)
3×3 Bidiagonal{Int64, Vector{Int64}}:
 1  ⋅  ⋅
 4  2  ⋅
 ⋅  5  3
Bidiagonal([1,2,3], [4,5], :U)
3×3 Bidiagonal{Int64, Vector{Int64}}:
 1  4  ⋅
 ⋅  2  5
 ⋅  ⋅  3
Multiplication and solving linear systems with bidiagonal matrices also take $O(n)$ operations, using the standard
multiplication and forward-/back-substitution algorithms but being careful in the loops to only access the non-zero entries.
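For example, here is a hedged sketch of $O(n)$ forward substitution for a lower-bidiagonal matrix from its diagonal d and sub-diagonal l (the name bidiag_lower_solve is made up for illustration):
function bidiag_lower_solve(d, l, b)
    n = length(d)
    x = zeros(float(promote_type(eltype(d), eltype(b))), n)
    x[1] = b[1] / d[1]
    for k = 2:n
        x[k] = (b[k] - l[k-1] * x[k-1]) / d[k] # each row has only two non-zero entries
    end
    x
end

Lb = Bidiagonal([1,2,3], [4,5], :L)
bidiag_lower_solve([1,2,3], [4,5], [1,2,3]) ≈ Lb \ [1,2,3]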
Tridiagonal
Definition (Tridiagonal) If a square matrix has bandwidths $l = u = 1$ it is tridiagonal.
Julia has a type Tridiagonal for representing a tridiagonal matrix from its sub-diagonal, diagonal, and super-diagonal:
Tridiagonal([1,2], [3,4,5], [6,7])
3×3 Tridiagonal{Int64, Vector{Int64}}:
 3  6  ⋅
 1  4  7
 ⋅  2  5
Tridiagonal matrices will come up in second-order differential equations and orthogonal polynomials.
We will later see how linear systems involving tridiagonal matrices can be solved in $O(n)$ operations.
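As a hedged sketch (the name tridiag_mul is made up for illustration), matrix-vector multiplication with a tridiagonal matrix needs only its three diagonals and $O(n)$ operations:
function tridiag_mul(dl, d, du, x)
    n = length(d)
    c = zeros(promote_type(eltype(d), eltype(x)), n)
    for k = 1:n
        c[k] = d[k] * x[k]
        k > 1 && (c[k] += dl[k-1] * x[k-1]) # sub-diagonal contribution
        k < n && (c[k] += du[k] * x[k+1]) # super-diagonal contribution
    end
    c
end

T = Tridiagonal([1,2], [3,4,5], [6,7])
tridiag_mul([1,2], [3,4,5], [6,7], [1,1,1]) == T * [1,1,1]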
4. Permutation Matrices
Permutation matrices are matrices that represent the action of permuting the entries of a vector,
that is, matrix representations of the symmetric group $S_n$, acting on $ℝ^n$.
Recall every $σ \in S_n$ is a bijection between $\{1,2,\ldots,n\}$ and itself.
We can write a permutation $σ$ in Cauchy notation:
\begin{pmatrix}
1 & 2 & 3 & \cdots & n \cr
σ_1 & σ_2 & σ_3 & \cdots & σ_n
\end{pmatrix}
where $\{σ_1,\ldots,σ_n\} = \{1,2,\ldots,n\}$ (that is, each integer appears precisely once).
We denote the inverse permutation by $σ^{-1}$, which can be constructed by swapping the rows of
the Cauchy notation and reordering.
We can encode a permutation in vector $\mathbf σ = [σ_1,\ldots,σ_n]^⊤$.
This induces an action on a vector (using indexing notation)
𝐯[\mathbf σ] = \begin{bmatrix}v_{σ_1}\\ \vdots \\ v_{σ_n} \end{bmatrix}
Example (permutation of a vector)
Consider the permutation $σ$ given by
\begin{pmatrix}
1 & 2 & 3 & 4 & 5 \cr
1 & 4 & 2 & 5 & 3
\end{pmatrix}
We can apply it to a vector:
σ = [1, 4, 2, 5, 3]
v = [6, 7, 8, 9, 10]
v[σ] # we permute the entries of v
5-element Vector{Int64}:
  6
  9
  7
 10
  8
Its inverse permutation $σ^{-1}$ has Cauchy notation coming from swapping the rows of
the Cauchy notation of $σ$ and sorting:
\begin{pmatrix}
1 & 4 & 2 & 5 & 3 \cr
1 & 2 & 3 & 4 & 5
\end{pmatrix} \rightarrow \begin{pmatrix}
1 & 2 & 3 & 4 & 5 \cr
1 & 3 & 5 & 2 & 4
\end{pmatrix}
Julia has the function invperm for computing the vector that encodes
the inverse permutation:
σ⁻¹ = invperm(σ) # note that ⁻¹ are just unicode characters in the variable name
5-element Vector{Int64}:
 1
 3
 5
 2
 4
And indeed permuting the entries by σ and then by σ⁻¹ returns us
to our original vector:
v[σ][σ⁻¹] # permuting by σ and then σ⁻¹ gets us back
5-element Vector{Int64}:
  6
  7
  8
  9
 10
Note that the operator
P_σ(𝐯) = 𝐯[\mathbf σ]
is linear in $𝐯$, therefore, we can identify it with a matrix whose action is:
P_σ \begin{bmatrix} v_1\\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix}v_{σ_1} \\ \vdots \\ v_{σ_n} \end{bmatrix}.
The entries of this matrix are
P_σ[k,j] = 𝐞_k^⊤ P_σ 𝐞_j = 𝐞_k^⊤ 𝐞_{σ^{-1}_j} = δ_{k,σ^{-1}_j} = δ_{σ_k,j}
where $δ_{k,j}$ is the Kronecker delta:
δ_{k,j} := \begin{cases} 1 & k = j \\
0 & \hbox{otherwise}
\end{cases}.
This construction motivates the following definition:
Definition (permutation matrix) $P \in ℝ^{n × n}$ is a permutation matrix if it is equal to
the identity matrix with its rows permuted.
Example (5×5 permutation matrix)
We can construct the permutation representation for $σ$ as above as follows:
P = I(5)[σ,:]
5×5 SparseArrays.SparseMatrixCSC{Bool, Int64} with 5 stored entries:
1 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 1 ⋅
⋅ 1 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ 1
⋅ ⋅ 1 ⋅ ⋅
And indeed, we see its action is as expected:
P * v
5-element Vector{Int64}:
  6
  9
  7
 10
  8
Remark (advanced) Note that P is a special type SparseMatrixCSC. This is used
to represent a matrix by storing only the non-zero entries as well as their location.
This is an important data type in high-performance scientific computing, but we will not
be using general sparse matrices in this module.
Proposition (permutation matrix inverse)
Let $P_σ$ be a permutation matrix corresponding to the permutation $σ$. Then
P_σ^⊤ = P_{σ^{-1}} = P_σ^{-1}
That is, $P_σ$ is orthogonal:
P_σ^⊤ P_σ = P_σ P_σ^⊤ = I.
We prove orthogonality via:
𝐞_k^⊤ P_σ^⊤ P_σ 𝐞_j = (P_σ 𝐞_k)^⊤ P_σ 𝐞_j = 𝐞_{σ^{-1}_k}^⊤ 𝐞_{σ^{-1}_j} = δ_{k,j}
This shows $P_σ^⊤ P_σ = I$ and hence $P_σ^{-1} = P_σ^⊤$.
Permutation matrices are examples of sparse matrices that can be very easily inverted.
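A quick numerical check of the proposition (illustrative, not from the original):
P * v == v[σ], P' * (P * v) == v # Pᵀ undoes the action of P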
5. Orthogonal matrices
Definition (orthogonal matrix) A square matrix is orthogonal if its inverse is its transpose:
Q^⊤ Q = QQ^⊤ = I.
We have already seen examples of orthogonal matrices (permutation matrices).
Here we discuss two important special cases: simple rotations and reflections.
Simple rotations
Definition (simple rotation)
A 2×2 rotation matrix through angle $θ$ is
Q_θ := \begin{bmatrix} \cos \theta & -\sin \theta \cr \sin \theta & \cos \theta \end{bmatrix}
In what follows we use the following for writing the angle of a vector:
Definition (two-arg arctan) The two-argument arctan function gives the angle θ through the point
$[a,b]^\top$, i.e.,
\sqrt{a^2 + b^2} \begin{bmatrix} \cos θ \\ \sin θ \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix}
It can be defined in terms of the standard arctan as follows:
{\rm atan}(b,a) := \begin{cases} {\rm atan}{b \over a} & a > 0 \\
{\rm atan}{b \over a} + π & a < 0\hbox{ and }b ≥ 0 \\
{\rm atan}{b \over a} - π & a < 0\hbox{ and }b < 0 \\
π/2 & a = 0\hbox{ and }b > 0 \\
-π/2 & a = 0\hbox{ and }b < 0
\end{cases}
This is available in Julia:
atan(-1,-2) # angle through [-2,-1]
-2.677945044588987
We can rotate an arbitrary vector so that it is aligned with the first coordinate axis. Interestingly this only requires
basic algebraic functions (no trigonometric functions):
Proposition (rotation of a vector)
The matrix
Q = {1 \over \sqrt{a^2 + b^2}}\begin{bmatrix}
a & b \cr -b & a
\end{bmatrix}
is a rotation matrix satisfying
Q \begin{bmatrix} a \\ b \end{bmatrix} = \sqrt{a^2 + b^2} \begin{bmatrix} 1 \\ 0 \end{bmatrix}
The second equation is a direct computation, so the only question is whether $Q$ is a rotation matrix.
Define $θ = -{\rm atan}(b, a)$. By definition of the two-arg arctan we have $\cos(-θ) = a/\sqrt{a^2+b^2}$ and $\sin(-θ) = b/\sqrt{a^2+b^2}$, hence
\begin{bmatrix}
\cos θ & -\sin θ \\
\sin θ &\cos θ
\end{bmatrix} = \begin{bmatrix}
\cos(-θ) & \sin(-θ) \\
-\sin(-θ) & \cos(-θ)
\end{bmatrix} = {1\over \sqrt{a^2 + b^2}} \begin{bmatrix} a & b \\ -b &a \end{bmatrix}.
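A small illustration of the proposition (the numbers are chosen arbitrarily):
a, b = -2.0, -1.0
Q = [a b; -b a] / sqrt(a^2 + b^2)
Q * [a, b] # ≈ [sqrt(a^2 + b^2), 0]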
Reflections
In addition to rotations, another important type of orthogonal matrix is a reflection:
Definition (reflection matrix)
Given a vector $𝐯$ satisfying $\|𝐯\|=1$, the reflection matrix is the orthogonal matrix
Q_𝐯 \triangleq I - 2 𝐯 𝐯^⊤
These are reflections in the direction of $𝐯$. We can show this as follows:
Proposition $Q_𝐯$ satisfies:
Symmetry: $Q_𝐯 = Q_𝐯^⊤$
Orthogonality: $Q_𝐯^⊤ Q_𝐯 = I$
$𝐯$ is an eigenvector of $Q_𝐯$ with eigenvalue $-1$
$Q_𝐯$ is a rank-1 perturbation of $I$
$\det Q_𝐯 = -1$
Property 1 follows immediately. Property 2 follows from
Q_𝐯^⊤ Q_𝐯 = Q_𝐯^2 = I - 4 𝐯 𝐯^⊤ + 4 𝐯 𝐯^⊤ 𝐯 𝐯^⊤ = I
Property 3 follows since
Q_𝐯 𝐯 = -𝐯
Property 4 follows since $𝐯 𝐯^⊤$ is a rank-1 matrix, as every row is a multiple of $𝐯^⊤$.
To see property 5, note there is an $(n-1)$-dimensional space $W$ orthogonal to $𝐯$, that is, for all
$𝐰 \in W$ we have $𝐰^⊤ 𝐯 = 0$, which implies that
Q_𝐯 𝐰 = 𝐰 - 2 𝐯 (𝐯^⊤ 𝐰) = 𝐰.
In other words, $1$ is an eigenvalue with multiplicity $n-1$ and $-1$ is an eigenvalue with multiplicity 1,
and thus the determinant, the product of the eigenvalues, is $-1$.
Example (reflection through 2-vector) Consider reflection through $𝐱 = [1,2]^\top$.
We first need to normalise $𝐱$:
𝐯 = {𝐱 \over \|𝐱\|} = \begin{bmatrix} {1 \over \sqrt{5}} \\ {2 \over \sqrt{5}} \end{bmatrix}
Note this indeed has unit norm:
\|𝐯\|^2 = {1 \over 5} + {4 \over 5} = 1.
Thus the reflection matrix is:
Q_𝐯 = I - 2 𝐯 𝐯^⊤ = \begin{bmatrix}1 \\ & 1 \end{bmatrix} - {2 \over 5} \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
= {1 \over 5} \begin{bmatrix} 3 & -4 \\ -4 & -3 \end{bmatrix}
Indeed it is symmetric, and orthogonal. It sends $𝐱$ to $-𝐱$:
Q_𝐯 𝐱 = {1 \over 5} \begin{bmatrix}3 - 8 \\ -4 - 6 \end{bmatrix} = -𝐱
Any vector orthogonal to $𝐱$, like $𝐲 = [-2,1]^\top$, is left fixed:
Q_𝐯 𝐲 = {1 \over 5} \begin{bmatrix}-6 -4 \\ 8 - 3 \end{bmatrix} = 𝐲
Note that building the matrix $Q_𝐯$ will be expensive ($O(n^2)$ operations), but we can apply
$Q_𝐯$ to a vector in $O(n)$ operations using the expression:
Q_𝐯 𝐱 = 𝐱 - 2 𝐯 (𝐯^⊤ 𝐱).
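A hedged sketch of this (the function name reflect is made up; v is assumed to have unit norm):
reflect(v, x) = x - 2 * v * (v' * x) # O(n): one dot product and one scaled subtraction

v = [1, 2] / sqrt(5)
reflect(v, [1, 2]), reflect(v, [-2, 1]) # sends [1,2] to (approximately) -[1,2] and leaves [-2,1] fixed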
Just as rotations can be used to rotate vectors to be aligned with a coordinate axis, so can reflections,
but in this case it works for vectors in $ℝ^n$, not just $ℝ^2$:
Definition (Householder reflection) For a given vector
$𝐱$, define the Householder reflection
Q_𝐱^{±,\rm H} := Q_𝐰
for $𝐲 = ∓ \|𝐱\| 𝐞_1 + 𝐱$ and $𝐰 = {𝐲 \over \|𝐲\|}$.
The default choice in sign is:
Q_𝐱^{\rm H} := Q_𝐱^{-\hbox{sign}(x_1),\rm H}.
Lemma (Householder reflection)
Q_𝐱^{±,\rm H} 𝐱 = ±\|𝐱\| 𝐞_1
Note that
\begin{align*}
\| 𝐲 \|^2 &= 2\|𝐱\|^2 ∓ 2 \|𝐱\| x_1, \\
𝐲^⊤ 𝐱 &= \|𝐱\|^2 ∓ \|𝐱\| x_1
\end{align*}
where $x_1 = 𝐞_1^\top 𝐱$. Therefore:
Q_𝐱^{±,\rm H} 𝐱 = (I - 2 𝐰 𝐰^⊤) 𝐱 = 𝐱 - 2 {𝐲 \|𝐱\| \over \|𝐲\|^2} (\|𝐱\|∓x_1) = 𝐱 - 𝐲 = ±\|𝐱\| 𝐞_1.
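Here is a hedged sketch of constructing the default Householder reflection directly from the definition (the function name householderreflection is made up; a practical implementation would apply it in O(n) rather than form the dense matrix):
function householderreflection(x)
    y = copy(float(x))
    y[1] += sign(x[1]) * norm(x) # 𝐲 = sign(x₁)‖𝐱‖𝐞₁ + 𝐱, the default -sign(x₁) choice of ±
    w = y / norm(y)
    I - 2 * w * w'
end

x = [1.0, 2.0, 3.0]
householderreflection(x) * x # ≈ [-norm(x), 0, 0]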
Why do we choose the opposite sign of $x_1$ for the default reflection? For stability.
We demonstrate the reason for this by numerical example. Consider $𝐱 = [1,h]^⊤$, i.e., a small perturbation
from $𝐞_1$. If we reflect to $\|𝐱\| 𝐞_1$ we see a numerical problem:
h = 10.0^(-10)
x = [1, h]
y = -norm(x)*[1,0] + x
w = y/norm(y)
Q = I - 2w*w'
Q*x
2-element Vector{Float64}:
  1.0
 -1.0e-10
It didn't work! We expected $\|𝐱\| 𝐞_1 ≈ [1,0]^⊤$, but the second entry of the result is $-10^{-10}$, far larger than machine precision. Even worse is if h = 0:
h = 0
x = [1, h]
y = -norm(x)*[1,0] + x
w = y/norm(y) # y is exactly zero, so w is a vector of NaNs