
Matrix Decomposition
Australian National University

What can we do with WiFi?
• Using standard IEEE 802.11n WiFi signals:
• Person segmentation
• Person pose estimation
• Person identification

4.1 Determinant
• We write the determinant as det(𝑨) or sometimes as |𝑨|, so that
$$\det(\boldsymbol{A}) = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$$
• The determinant of a square matrix 𝑨 ∈ ℝ^{n×n} is a function that maps 𝑨 onto a real number.
• Example 4.1 (Testing for Matrix Invertibility)
• If 𝑨 is a 1×1 matrix, then 𝑨 = 𝑎 ⇒ 𝑨⁻¹ = 1/𝑎. This holds if and only if 𝑎 ≠ 0.
• For 2×2 matrices, if $\boldsymbol{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, recall that the inverse of 𝑨 is
$$\boldsymbol{A}^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$$
• Hence, 𝑨 is invertible if and only if
$$a_{11}a_{22} - a_{12}a_{21} \neq 0$$
• This quantity is the determinant of 𝑨 ∈ ℝ^{2×2}, i.e.,
$$\det(\boldsymbol{A}) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$

• For a square matrix 𝑨 ∈ ℝ^{n×n} it holds that 𝑨 is invertible if and only if det 𝑨 ≠ 0.
• We have explicit (closed-form) expressions for determinants of small matrices in terms of the elements of the matrix. For 𝑛 = 1,
$$\det(\boldsymbol{A}) = \det(a_{11}) = a_{11}$$
• For 𝑛 = 2,
$$\det(\boldsymbol{A}) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21},$$
which we have observed in the preceding example.
• For 𝑛 = 3 (known as Sarrus' rule),
$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{31}a_{22}a_{13} - a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33}$$
• We call a square matrix 𝑻 an upper-triangular matrix if 𝑇ᵢⱼ = 0 for 𝑖 > 𝑗, i.e., the matrix is zero below its diagonal.
• Analogously, we define a lower-triangular matrix as a matrix with zeros above its diagonal.
• For a triangular matrix 𝑻 ∈ ℝ^{n×n}, the determinant is the product of the diagonal elements, i.e.,
$$\det(\boldsymbol{T}) = \prod_{i=1}^{n} T_{ii}$$
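These closed-form expressions are easy to sanity-check against a numerical routine. A minimal sketch, assuming NumPy is available (matrix values are illustrative):

```python
import numpy as np

# 2x2 formula: det(A) = a11*a22 - a12*a21
A2 = np.array([[1.0, 2.0],
               [3.0, 4.0]])
det2 = A2[0, 0] * A2[1, 1] - A2[0, 1] * A2[1, 0]
assert np.isclose(det2, np.linalg.det(A2))

# Sarrus' rule for a 3x3 matrix
A3 = np.array([[1.0, 2.0, 3.0],
               [3.0, 1.0, 2.0],
               [0.0, 0.0, 1.0]])
det3 = (A3[0, 0] * A3[1, 1] * A3[2, 2]
        + A3[1, 0] * A3[2, 1] * A3[0, 2]
        + A3[2, 0] * A3[0, 1] * A3[1, 2]
        - A3[2, 0] * A3[1, 1] * A3[0, 2]
        - A3[0, 0] * A3[2, 1] * A3[1, 2]
        - A3[1, 0] * A3[0, 1] * A3[2, 2])
assert np.isclose(det3, np.linalg.det(A3))

# Triangular matrix: determinant = product of the diagonal elements
T = np.array([[2.0, 5.0, 1.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 7.0]])
assert np.isclose(np.prod(np.diag(T)), np.linalg.det(T))
```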

• How can we compute the determinant of an 𝑛×𝑛 (𝑛 > 3) matrix?
• We reduce this problem to computing the determinants of (𝑛−1)×(𝑛−1) matrices. By recursively applying the Laplace expansion, we can compute the determinant of an 𝑛×𝑛 matrix by ultimately computing determinants of 2×2 matrices.
• Theorem 4.2 (Laplace Expansion).
• Consider a matrix 𝑨 ∈ ℝ^{n×n}. Then, for all 𝑗 = 1, …, 𝑛:
• 1. Expansion along column 𝑗:
$$\det(\boldsymbol{A}) = \sum_{k=1}^{n} (-1)^{k+j} a_{kj} \det(\boldsymbol{A}_{k,j})$$
• 2. Expansion along row 𝑗:
$$\det(\boldsymbol{A}) = \sum_{k=1}^{n} (-1)^{k+j} a_{jk} \det(\boldsymbol{A}_{j,k})$$
• Here 𝑨ₖ,ⱼ ∈ ℝ^{(n−1)×(n−1)} is the submatrix of 𝑨 that we obtain when deleting row 𝑘 and column 𝑗.
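A direct, deliberately naive implementation of Theorem 4.2 makes the recursion explicit. This is a teaching sketch assuming NumPy; `laplace_det` is an illustrative name, and the recursion visits n! terms, so `numpy.linalg.det` remains the practical choice:

```python
import numpy as np

def laplace_det(A: np.ndarray) -> float:
    """Determinant via Laplace expansion along the first row (j = 1).

    Exponential cost: only suitable for small matrices.
    """
    n = A.shape[0]
    if n == 1:
        return float(A[0, 0])
    total = 0.0
    for k in range(n):
        # Submatrix A_{1,k+1}: delete row 1 and column k+1 (1-based)
        sub = np.delete(np.delete(A, 0, axis=0), k, axis=1)
        # Sign (-1)^{1+(k+1)} = (-1)^k with 0-based k
        total += (-1) ** k * A[0, k] * laplace_det(sub)
    return total

A = np.array([[1.0, 2.0, 3.0],
              [3.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
print(laplace_det(A))  # -5.0, as in Example 4.3 below
```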

• Example 4.3 (Laplace Expansion)
• Let us compute the determinant of
$$\boldsymbol{A} = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$$
• Using the Laplace expansion along the first row yields
$$\begin{vmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 0 & 0 & 1 \end{vmatrix} = (-1)^{1+1} \cdot 1 \cdot \begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix} + (-1)^{1+2} \cdot 2 \cdot \begin{vmatrix} 3 & 2 \\ 0 & 1 \end{vmatrix} + (-1)^{1+3} \cdot 3 \cdot \begin{vmatrix} 3 & 1 \\ 0 & 0 \end{vmatrix}$$
• We compute the determinants of all the 2×2 matrices and obtain
$$\det(\boldsymbol{A}) = 1(1 - 0) - 2(3 - 0) + 3(0 - 0) = -5$$
• For completeness, we can compare this result to computing the determinant using Sarrus' rule:
$$\det(\boldsymbol{A}) = 1 \cdot 1 \cdot 1 + 3 \cdot 0 \cdot 3 + 0 \cdot 2 \cdot 2 - 0 \cdot 1 \cdot 3 - 1 \cdot 0 \cdot 2 - 3 \cdot 2 \cdot 1 = 1 - 6 = -5$$

Properties of the determinant
• For 𝑨 ∈ ℝ^{n×n}, we have the following properties:
• det(𝑨𝑩) = det(𝑨) det(𝑩)
• det(𝑨) = det(𝑨ᵀ)
• If 𝑨 is regular (invertible), then det(𝑨⁻¹) = 1/det(𝑨)
• Adding a multiple of a column/row to another one does not change det 𝑨
• Multiplication of a column/row by 𝜆 ∈ ℝ scales det 𝑨 by 𝜆. In particular, det(𝜆𝑨) = 𝜆ⁿ det(𝑨)
• Swapping two rows/columns changes the sign of det 𝑨
• Because of the last three properties, we can use Gaussian elimination to compute det 𝑨 by bringing 𝑨 into row-echelon form, keeping track of the sign changes caused by any row swaps along the way. We can stop Gaussian elimination when we have 𝑨 in triangular form, where the elements below the diagonal are all 0. Recall: the determinant of a triangular matrix is the product of the diagonal elements. A sketch follows below.
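A minimal sketch of this procedure, assuming NumPy (`det_by_elimination` is an illustrative name; partial pivoting is added for numerical stability):

```python
import numpy as np

def det_by_elimination(A: np.ndarray) -> float:
    """Determinant via Gaussian elimination.

    Row additions leave the determinant unchanged; each row swap flips
    its sign. After reduction, det is the product of the diagonal.
    """
    U = A.astype(float).copy()
    n = U.shape[0]
    sign = 1.0
    for col in range(n):
        # Partial pivoting: bring the largest entry into pivot position
        pivot = int(np.argmax(np.abs(U[col:, col]))) + col
        if np.isclose(U[pivot, col], 0.0):
            return 0.0              # no pivot available => singular matrix
        if pivot != col:
            U[[col, pivot]] = U[[pivot, col]]
            sign = -sign            # a row swap changes the sign of det
        for row in range(col + 1, n):
            U[row] -= (U[row, col] / U[col, col]) * U[col]
    return sign * float(np.prod(np.diag(U)))

A = np.array([[1.0, 2.0, 3.0],
              [3.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
print(det_by_elimination(A))  # -5.0
```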

Example
• Let us use Gaussian elimination to compute the following determinant:
$$\boldsymbol{A} = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \;\overset{\text{row}_2 \,-\, 3 \times \text{row}_1}{\rightsquigarrow}\; \begin{bmatrix} 1 & 2 & 3 \\ 0 & -5 & -7 \\ 0 & 0 & 1 \end{bmatrix}$$
• Now we have the upper-triangular form (row-echelon form).
$$\det(\boldsymbol{A}) = 1 \times (-5) \times 1 = -5$$
• We can verify this result with the previous example.

Understanding of determinant
• Matrices characterize linear transformations.
• [Figure: the square ABCD with corner A = (1,1)ᵀ is left-multiplied by a matrix. The identity matrix (determinant 1) leaves the square unchanged, while the matrix with diagonal entries 0.5 (determinant 0.25) shrinks it to a square with corner A = (0.5, 0.5)ᵀ.]
• Left-multiplying by a matrix whose determinant exceeds 1 in absolute value enlarges a shape; otherwise it shrinks the shape. The absolute value of the determinant is exactly the area-scaling factor of the transformation.
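The area-scaling reading can be checked directly: map the corners of the unit square and compute the area of the image with the shoelace formula. A NumPy sketch:

```python
import numpy as np

# |det(M)| is the factor by which left-multiplication by M scales areas.
M = np.array([[0.5, 0.0],
              [0.0, 0.5]])

# Corners of the unit square ABCD (area 1), counterclockwise
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])
image = M @ square   # a square with corner (0.5, 0.5)

# Shoelace formula for the area of the mapped polygon
x, y = image
area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
print(area, abs(np.linalg.det(M)))  # both 0.25
```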

Determinant and invertibility
• The 45° rotation matrices
$$\begin{bmatrix} \cos 45° & \sin 45° \\ -\sin 45° & \cos 45° \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} \cos(-45°) & \sin(-45°) \\ -\sin(-45°) & \cos(-45°) \end{bmatrix}$$
both have determinant 1; they rotate the square ABCD (corner A = (1,1)ᵀ) by 45° in opposite directions (one clockwise, one counterclockwise) and are inverses of each other.
• Some linear transformations (matrices) are not invertible:
$$\det\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = 0, \qquad \det\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0$$
• [Figure: the first matrix collapses the square onto the segment from (0,−1)ᵀ to (0,1)ᵀ; the zero matrix collapses it to the single point (0,0)ᵀ.]
• You cannot restore the original rectangle from these collapsed shapes. A numerical check of both claims follows below.
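A quick numerical check of both claims, assuming NumPy:

```python
import numpy as np

theta = np.deg2rad(45.0)
R = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
R_inv = R.T  # the rotation by -45 degrees

print(np.linalg.det(R))                   # 1.0: rotations preserve area
print(np.allclose(R @ R_inv, np.eye(2)))  # True: inverses of each other

S = np.array([[0.0, 0.0],
              [0.0, 1.0]])
print(np.linalg.det(S))                   # 0.0: not invertible
# np.linalg.inv(S) raises LinAlgError: the collapse loses information
```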

4.2 Eigenvalues and Eigenvectors
• For 𝜆 ∈ ℝ and a square matrix 𝑨 ∈ ℝ^{n×n},
$$p_{\boldsymbol{A}}(\lambda) := \det(\boldsymbol{A} - \lambda\boldsymbol{I}) = c_0 + c_1\lambda + c_2\lambda^2 + \cdots + c_{n-1}\lambda^{n-1} + (-1)^n\lambda^n,$$
with 𝑐₀, …, 𝑐ₙ₋₁ ∈ ℝ, is the characteristic polynomial of 𝑨.
• The characteristic polynomial 𝑝_𝑨(𝜆) := det(𝑨 − 𝜆𝑰) will allow us to compute eigenvalues and eigenvectors.
• Example
• For $\boldsymbol{A} = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$, we have
$$p_{\boldsymbol{A}}(\lambda) = \det(\boldsymbol{A} - \lambda\boldsymbol{I}) = \begin{vmatrix} 1-\lambda & 4 \\ 2 & 3-\lambda \end{vmatrix} = (1-\lambda)(3-\lambda) - 8 = (\lambda-5)(\lambda+1)$$
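Numerically, the characteristic polynomial and its roots can be read off as follows (a NumPy sketch; `np.poly` applied to a square matrix returns the coefficients of the monic polynomial whose roots are the eigenvalues):

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 3.0]])

coeffs = np.poly(A)          # [ 1. -4. -5.] = lambda^2 - 4*lambda - 5
print(np.roots(coeffs))      # [ 5. -1.]
print(np.linalg.eigvals(A))  # the eigenvalues directly: [ 5. -1.]
```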

• Let 𝑨 ∈ ℝ^{n×n} be a square matrix. Then 𝜆 ∈ ℝ is an eigenvalue of 𝑨 and 𝒙 ∈ ℝⁿ\{𝟎} is the corresponding eigenvector of 𝑨 if
$$\boldsymbol{A}\boldsymbol{x} = \lambda\boldsymbol{x}$$
• We call this equation the eigenvalue equation.
• The following statements are equivalent:
• 𝜆 is an eigenvalue of 𝑨 ∈ ℝ^{n×n}
• There exists an 𝒙 ∈ ℝⁿ\{𝟎} with 𝑨𝒙 = 𝜆𝒙, or equivalently, (𝑨 − 𝜆𝑰ₙ)𝒙 = 𝟎 can be solved non-trivially, i.e., 𝒙 ≠ 𝟎
• rk(𝑨 − 𝜆𝑰ₙ) < 𝑛
• det(𝑨 − 𝜆𝑰ₙ) = 0

• Non-uniqueness of eigenvectors
• If 𝒙 is an eigenvector of 𝑨 associated with eigenvalue 𝜆, then for any 𝑐 ∈ ℝ\{0} it holds that 𝑐𝒙 is an eigenvector of 𝑨 with the same eigenvalue, since
$$\boldsymbol{A}(c\boldsymbol{x}) = c\boldsymbol{A}\boldsymbol{x} = c\lambda\boldsymbol{x} = \lambda(c\boldsymbol{x})$$
• Thus, all vectors that are collinear (point in the same or opposite direction) to 𝒙 are also eigenvectors of 𝑨.
• Theorem 4.8. 𝜆 ∈ ℝ is an eigenvalue of 𝑨 ∈ ℝ^{n×n} if and only if 𝜆 is a root of the characteristic polynomial 𝑝_𝑨(𝜆) := det(𝑨 − 𝜆𝑰) of 𝑨.
• For $\boldsymbol{A} = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$, we have
$$p_{\boldsymbol{A}}(\lambda) = \det(\boldsymbol{A} - \lambda\boldsymbol{I}) = \begin{vmatrix} 1-\lambda & 4 \\ 2 & 3-\lambda \end{vmatrix} = (1-\lambda)(3-\lambda) - 8 = (\lambda-5)(\lambda+1)$$
• Eigenvalues are 𝜆₁ = 5, 𝜆₂ = −1.

• Definition. Let a square matrix 𝑨 have an eigenvalue 𝜆ᵢ. The algebraic multiplicity of 𝜆ᵢ is the number of times the root appears in the characteristic polynomial.
• For the matrix 𝑨 above, with eigenvalues 𝜆₁ = 5 and 𝜆₂ = −1: it has two distinct eigenvalues and each occurs only once, so the algebraic multiplicity of both eigenvalues is one.
• For $\boldsymbol{B} = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}$, we have
$$p_{\boldsymbol{B}}(\lambda) = \det(\boldsymbol{B} - \lambda\boldsymbol{I}) = \begin{vmatrix} 5-\lambda & 0 \\ 0 & 5-\lambda \end{vmatrix} = (5-\lambda)^2$$
• Eigenvalues are 𝜆₁ = 𝜆₂ = 5.
• The eigenvalue 5 has an algebraic multiplicity of 2.

• Definition. For 𝑨 ∈ ℝ^{n×n}, the union of the 𝟎 vector and the set of all eigenvectors of 𝑨 associated with an eigenvalue 𝜆 is a subspace of ℝⁿ, which is called the eigenspace of 𝑨 with respect to 𝜆 and is denoted by 𝐸_𝜆.
• The set of all eigenvalues of 𝑨 is called the eigenspectrum, or just spectrum, of 𝑨.
• If 𝜆 is an eigenvalue of 𝑨 ∈ ℝ^{n×n}, then the corresponding eigenspace 𝐸_𝜆 is the solution space of the homogeneous system of linear equations (𝑨 − 𝜆𝑰)𝒙 = 𝟎.
• Example (The Case of the Identity Matrix)
• The identity matrix 𝑰 ∈ ℝ^{n×n} has characteristic polynomial 𝑝_𝑰(𝜆) = det(𝑰 − 𝜆𝑰) = (1 − 𝜆)ⁿ = 0. It has only one eigenvalue, 𝜆 = 1, which occurs 𝑛 times.
• Moreover, 𝑰𝒙 = 𝜆𝒙 = 1 · 𝒙 holds for all vectors 𝒙 ∈ ℝⁿ\{𝟎}.
• Therefore, the sole eigenspace 𝐸₁ of the identity matrix spans 𝑛 dimensions, and all 𝑛 standard basis vectors of ℝⁿ are eigenvectors of 𝑰.

• Useful properties regarding eigenvalues and eigenvectors:
• A matrix 𝑨 and its transpose 𝑨ᵀ possess the same eigenvalues, but not necessarily the same eigenvectors.
• Symmetric, positive definite matrices always have positive, real eigenvalues:
$$\forall \boldsymbol{x} \in V \setminus \{\boldsymbol{0}\}: \; \boldsymbol{x}^{\mathsf T}\boldsymbol{A}\boldsymbol{x} > 0$$
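Both properties are easy to observe numerically. A NumPy sketch; building 𝑩ᵀ𝑩 + 𝑰 is one way to obtain an SPD matrix, anticipating the theorem on 𝑺 = 𝑨ᵀ𝑨 below:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))

# A and A^T have the same eigenvalues (compare as sorted complex numbers)
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(A.T))))  # True

# A symmetric, positive definite matrix has positive, real eigenvalues
B = rng.normal(size=(4, 4))
S = B.T @ B + np.eye(4)       # symmetric and positive definite
print(np.linalg.eigvalsh(S))  # all strictly positive real numbers
```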
• Example (Computing Eigenvalues, Eigenvectors, and Eigenspaces)
• Let us find the eigenvalues and eigenvectors of the 2×2 matrix
$$\boldsymbol{A} = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix}$$
• Step 1: Characteristic Polynomial. We need to compute the roots of the characteristic polynomial det(𝑨 − 𝜆𝑰) = 0 to find the eigenvalues.

• Step 2: Eigenvalues. The characteristic polynomial is
$$p_{\boldsymbol{A}}(\lambda) = \det(\boldsymbol{A} - \lambda\boldsymbol{I}) = \det\begin{bmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{bmatrix} = (4-\lambda)(3-\lambda) - 2 \cdot 1$$
• We factorize the characteristic polynomial and obtain
$$p_{\boldsymbol{A}}(\lambda) = (4-\lambda)(3-\lambda) - 2 \cdot 1 = 10 - 7\lambda + \lambda^2 = (2-\lambda)(5-\lambda),$$
giving the roots 𝜆₁ = 2 and 𝜆₂ = 5.
• Step 3: Eigenvectors and Eigenspaces. From our definition of the eigenvector 𝒙 ≠ 𝟎, there will be a vector such that 𝑨𝒙 = 𝜆𝒙, i.e., (𝑨 − 𝜆𝑰)𝒙 = 𝟎. We find the eigenvectors that correspond to these eigenvalues by looking at vectors 𝒙 such that
$$\begin{bmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{bmatrix}\boldsymbol{x} = \boldsymbol{0}$$
• For 𝜆 = 5 we obtain
$$\begin{bmatrix} 4-5 & 2 \\ 1 & 3-5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -1 & 2 \\ 1 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \boldsymbol{0}$$
• We solve this homogeneous system and obtain the solution space
$$E_5 = \operatorname{span}\left(\begin{bmatrix} 2 \\ 1 \end{bmatrix}\right)$$
• This eigenspace is one-dimensional, as it possesses a single basis vector.
• Analogously, we find the eigenvector for 𝜆 = 2 by solving
$$\begin{bmatrix} 4-2 & 2 \\ 1 & 3-2 \end{bmatrix}\boldsymbol{x} = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}\boldsymbol{x} = \boldsymbol{0}$$
• The corresponding eigenspace is given as
$$E_2 = \operatorname{span}\left(\begin{bmatrix} 1 \\ -1 \end{bmatrix}\right)$$
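The same eigenpairs can be recovered with `np.linalg.eig` (a sketch; NumPy normalizes the eigenvectors, and the ordering of the eigenvalues is not guaranteed):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)   # 5.0 and 2.0 (order may vary)
print(eigvecs)   # columns: unit eigenvectors, parallel to (2,1) and (1,-1)

# Each column satisfies the eigenvalue equation A x = lambda x
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))  # True, True
```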

• Definition. Let 𝜆7 be an eigenvalue of a square matrix 𝑨 . Then the geometric multiplicity of 𝜆7 is the number of linearly independent eigenvectors associated with 𝜆7. In other words, it is the dimensionality of the eigenspace spanned by the eigenvectors associated with 𝜆7.
• In our previous example, the geometric multiplicity of 𝜆 = 5 and 𝜆 = 2 is 1.
• In another example, the matrix $\boldsymbol{A} = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$ has two repeated eigenvalues 𝜆₁ = 𝜆₂ = 2. The algebraic multiplicity of 𝜆₁ and 𝜆₂ is 2.
• The eigenvalue has only one distinct unit eigenvector 𝒙 = [1, 0]ᵀ, and thus its geometric multiplicity is 1.
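In code, the gap between the two multiplicities shows up as a rank deficiency of 𝑨 − 𝜆𝑰 (NumPy sketch):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
print(np.linalg.eigvals(A))  # [2. 2.]: algebraic multiplicity 2

# Geometric multiplicity = dim null(A - 2I) = n - rank(A - 2I)
rank = np.linalg.matrix_rank(A - 2.0 * np.eye(2))
print(2 - rank)              # 1: only one independent eigenvector
```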
• Theorem. The eigenvectors 𝒙₁, …, 𝒙ₙ of a matrix 𝑨 ∈ ℝ^{n×n} with 𝑛 distinct eigenvalues 𝜆₁, …, 𝜆ₙ are linearly independent.
• Eigenvectors of a matrix with 𝑛 distinct eigenvalues form a basis of ℝⁿ.

• Definition. A square matrix 𝑨 ∈ ℝ^{n×n} is defective if it possesses fewer than 𝑛 linearly independent eigenvectors.
• Looking at the eigenspaces of a defective matrix, it follows that the sum of the dimensions of the eigenspaces is less than 𝑛.
• A defective matrix cannot have 𝑛 distinct eigenvalues, as distinct eigenvalues have linearly independent eigenvectors.
• Theorem. Given a matrix 𝑨 ∈ ℝ^{m×n}, we can always obtain a symmetric, positive semidefinite matrix 𝑺 ∈ ℝ^{n×n} by defining
$$\boldsymbol{S} := \boldsymbol{A}^{\mathsf T}\boldsymbol{A}$$
• Proof. Symmetry: 𝑺 = 𝑨ᵀ𝑨 = 𝑨ᵀ(𝑨ᵀ)ᵀ = (𝑨ᵀ𝑨)ᵀ = 𝑺ᵀ
• Positive semidefiniteness: 𝒙ᵀ𝑺𝒙 = 𝒙ᵀ𝑨ᵀ𝑨𝒙 = (𝑨𝒙)ᵀ(𝑨𝒙) ≥ 0
• If rk(𝑨) = 𝑛, then 𝑺 := 𝑨ᵀ𝑨 is positive definite.

• Theorem (Spectral Theorem). If 𝑨 ∈ ℝ^{n×n} is symmetric, there exists an orthonormal basis of the corresponding vector space 𝑉 consisting of eigenvectors of 𝑨, and each eigenvalue is real.
• Example
• Consider the matrix
$$\boldsymbol{A} = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 3 & 2 \\ 2 & 2 & 3 \end{bmatrix}$$
• The characteristic polynomial of 𝑨 is
$$p_{\boldsymbol{A}}(\lambda) = -(\lambda - 1)^2(\lambda - 7),$$
so that we obtain the eigenvalues 𝜆₁ = 1 and 𝜆₂ = 7, where 𝜆₁ is a repeated eigenvalue. Following our standard procedure for computing eigenvectors, we obtain the eigenspaces
$$E_1 = \operatorname{span}\Bigg(\underbrace{\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}}_{=:\,\boldsymbol{x}_1},\; \underbrace{\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}}_{=:\,\boldsymbol{x}_2}\Bigg), \qquad E_7 = \operatorname{span}\Bigg(\underbrace{\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}}_{=:\,\boldsymbol{x}_3}\Bigg)$$
• We see that 𝒙₃ is orthogonal to both 𝒙₁ and 𝒙₂. However, since 𝒙₁ᵀ𝒙₂ = 1 ≠ 0, they are not orthogonal. The spectral theorem states that there exists an orthogonal basis, but the one we have is not orthogonal.
• However, we can construct one.

• To construct such a basis, we exploit the fact that 𝒙₁, 𝒙₂ are eigenvectors associated with the same eigenvalue 𝜆. Therefore, for any 𝛼, 𝛽 ∈ ℝ it holds that
$$\boldsymbol{A}(\alpha\boldsymbol{x}_1 + \beta\boldsymbol{x}_2) = \boldsymbol{A}\boldsymbol{x}_1\alpha + \boldsymbol{A}\boldsymbol{x}_2\beta = \lambda_1(\alpha\boldsymbol{x}_1 + \beta\boldsymbol{x}_2),$$
i.e., any linear combination of 𝒙₁ and 𝒙₂ is also an eigenvector of 𝑨 associated with 𝜆₁. The Gram-Schmidt algorithm is a method for iteratively constructing an orthogonal/orthonormal basis from a set of basis vectors using such linear combinations.
• Therefore, even if 𝒙₁ and 𝒙₂ are not orthogonal, we can apply the Gram-Schmidt algorithm and find eigenvectors associated with 𝜆₁ = 1 that are orthogonal to each other (and to 𝒙₃). In our example, we will obtain
$$\boldsymbol{x}_1' = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \qquad \boldsymbol{x}_2' = \frac{1}{2}\begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix},$$
which are orthogonal to each other, orthogonal to 𝒙₃, and eigenvectors of 𝑨 associated with 𝜆₁ = 1.

• Theorem. The determinant of a matrix 𝑨 ∈ ℝ^{n×n} is the product of its eigenvalues,
$$\det(\boldsymbol{A}) = \prod_{i=1}^{n} \lambda_i,$$
where the 𝜆ᵢ are (possibly repeated) eigenvalues of 𝑨.

SVDNet for Pedestrian Retrieval. Sun et al., ICCV 2017.

Some understandings
• [Figure: left-multiplying a vector 𝒙 by a matrix 𝑨 generally yields a vector 𝑨𝒙 pointing in a different direction. When 𝑨𝒙 = 𝜆𝒙, however, 𝒙 and 𝑨𝒙 lie on the same line, and the length of 𝑨𝒙 is |𝜆| times that of 𝒙. The grey line is the eigenspace of 𝑨 with respect to 𝜆: every vector on this line is an eigenvector of 𝑨, and they all correspond to the eigenvalue 𝜆.]

4.4 Eigendecomposition and Diagonalization
• A diagonal matrix is a matrix that has value zero on all off-diagonal elements,
$$\boldsymbol{D} = \begin{bmatrix} c_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & c_n \end{bmatrix}$$
• Diagonal matrices allow fast computation of determinants, powers, and inverses:
• The determinant is the product of the diagonal entries.
• A matrix power 𝑫ᵏ is given by each diagonal element raised to the power 𝑘.
• The inverse 𝑫⁻¹ is the reciprocal of its diagonal elements, if all of them are nonzero.
• Two matrices 𝑨, 𝑫 are similar if there exists an invertible matrix 𝑷 such that 𝑫 = 𝑷⁻¹𝑨𝑷.
• Definition. A matrix 𝑨 ∈ ℝ^{n×n} is diagonalizable if it is similar to a diagonal matrix, i.e., if there exists an invertible matrix 𝑷 ∈ ℝ^{n×n} such that 𝑫 = 𝑷⁻¹𝑨𝑷.

• Let 𝑨 ∈ ℝ^{n×n}, let 𝜆₁, …, 𝜆ₙ be a set of scalars, and let 𝒑₁, …, 𝒑ₙ be a set of vectors in ℝⁿ. We define 𝑷 := [𝒑₁, …, 𝒑ₙ] and let 𝑫 ∈ ℝ^{n×n} be a diagonal matrix with diagonal entries 𝜆₁, …, 𝜆ₙ. Then we can show that
$$\boldsymbol{A}\boldsymbol{P} = \boldsymbol{P}\boldsymbol{D}$$
if and only if 𝜆₁, …, 𝜆ₙ are the eigenvalues of 𝑨 and 𝒑₁, …, 𝒑ₙ are corresponding eigenvectors of 𝑨.
• We can see that this statement holds because
$$\boldsymbol{A}\boldsymbol{P} = \boldsymbol{A}[\boldsymbol{p}_1, \ldots, \boldsymbol{p}_n] = [\boldsymbol{A}\boldsymbol{p}_1, \ldots, \boldsymbol{A}\boldsymbol{p}_n]$$
$$\boldsymbol{P}\boldsymbol{D} = [\boldsymbol{p}_1, \ldots, \boldsymbol{p}_n]\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} = [\lambda_1\boldsymbol{p}_1, \ldots, \lambda_n\boldsymbol{p}_n]$$
• Thus 𝑨𝑷 = 𝑷𝑫 implies that
$$\boldsymbol{A}\boldsymbol{p}_1 = \lambda_1\boldsymbol{p}_1, \;\ldots,\; \boldsymbol{A}\boldsymbol{p}_n = \lambda_n\boldsymbol{p}_n$$
• Therefore, the columns of 𝑷 must be eigenvectors of 𝑨.
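The following sketch (NumPy; variable names are illustrative) ties the last few slides together: it orthogonalizes the eigenvectors from the spectral-theorem example via Gram-Schmidt and checks that the resulting orthonormal 𝑷 satisfies 𝑨𝑷 = 𝑷𝑫:

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, 2.0],
              [2.0, 2.0, 3.0]])

# Gram-Schmidt on the eigenvectors of the repeated eigenvalue lambda = 1
x1 = np.array([-1.0, 1.0, 0.0])
x2 = np.array([-1.0, 0.0, 1.0])
x2_prime = x2 - (x1 @ x2) / (x1 @ x1) * x1   # = (1/2) * [-1, -1, 2]
x3 = np.array([1.0, 1.0, 1.0])
print(x1 @ x2_prime, x1 @ x3, x2_prime @ x3)  # all 0: pairwise orthogonal

# Collect the normalized eigenvectors into P and check A P = P D
P = np.column_stack([x1 / np.linalg.norm(x1),
                     x2_prime / np.linalg.norm(x2_prime),
                     x3 / np.linalg.norm(x3)])
D = np.diag([1.0, 1.0, 7.0])
print(np.allclose(A @ P, P @ D))        # True
print(np.allclose(P @ P.T, np.eye(3)))  # True: P orthogonal (spectral thm)
```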
• Our definition of diagonalization requires that 𝑷 ∈ ℝ^{n×n} is invertible, i.e., 𝑷 has full rank. This requires us to have 𝑛 linearly independent eigenvectors 𝒑₁, …, 𝒑ₙ, i.e., the 𝒑ᵢ form a basis of ℝⁿ.

• Theorem (Eigendecomposition). A square matrix 𝑨 ∈ ℝ^{n×n} can be factored into
$$\boldsymbol{A} = \boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1},$$
where 𝑷 ∈ ℝ^{n×n} and 𝑫 is a diagonal matrix whose diagonal entries are the eigenvalues of 𝑨, if and only if the eigenvectors of 𝑨 form a basis of ℝⁿ.
• Theorem. A symmetric matrix 𝑨 ∈ ℝ^{n×n} can always be diagonalized.
• Theorem (Spectral Theorem). If 𝑨 ∈ ℝ^{n×n} is symmetric, there exists an orthonormal basis of the corresponding vector space 𝑉 consisting of eigenvectors of 𝑨, and each eigenvalue is real.
• The spectral theorem states that we can find an ONB of eigenvectors of ℝⁿ. This makes 𝑷 an orthogonal matrix (𝑷𝑷ᵀ = 𝑷ᵀ𝑷 = 𝑰), so that
$$\boldsymbol{A} = \boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{\mathsf T}, \quad\text{or equivalently,}\quad \boldsymbol{P}^{\mathsf T}\boldsymbol{A}\boldsymbol{P} = \boldsymbol{D}$$

Example
• Let us compute the eigendecomposition of $\boldsymbol{A} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$.
• Step 1: Compute eigenvalues and eigenvectors. The characteristic polynomial of 𝑨 is
$$\det(\boldsymbol{A} - \lambda\boldsymbol{I}) = \det\begin{bmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{bmatrix} = (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda-3)(\lambda-1)$$
• Therefore, the eigenvalues of 𝑨 are 𝜆₁ = 1 and 𝜆₂ = 3, and the associated (normalized) eigenvectors are obtained via
$$\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\boldsymbol{p}_1 = 1 \cdot \boldsymbol{p}_1, \qquad \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\boldsymbol{p}_2 = 3 \cdot \boldsymbol{p}_2$$
• This yields
$$\boldsymbol{p}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad \boldsymbol{p}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
• Step 2: Check for existence. The eigenvectors 𝒑₁, 𝒑₂ form a basis of ℝ². Therefore, 𝑨 can be diagonalized.
• Step 3: Construct the matrix 𝑷 to diagonalize 𝑨. We collect the eigenvectors of 𝑨 in 𝑷 so that
$$\boldsymbol{P} = [\boldsymbol{p}_1, \boldsymbol{p}_2] = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$
• We then obtain
$$\boldsymbol{P}^{-1}\boldsymbol{A}\boldsymbol{P} = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} = \boldsymbol{D}$$
• Equivalently, we get 𝑨 = 𝑷𝑫𝑷⁻¹:
$$\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$$

• Diagonal matrices 𝑫 can efficiently be raised to a power. Therefore, we can find a matrix power for a matrix 𝑨 ∈ ℝ^{n×n} via the eigenvalue decomposition (if it exists), so that
$$\boldsymbol{A}^k = (\boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1})^k = \boldsymbol{P}\boldsymbol{D}^k\boldsymbol{P}^{-1}$$
• Computing 𝑫ᵏ is efficient because we apply this operation individually to each diagonal element.
• Assume that the eigendecomposition 𝑨 = 𝑷𝑫𝑷⁻¹ exists. Then
$$\det(\boldsymbol{A}) = \det(\boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1}) = \det(\boldsymbol{P})\det(\boldsymbol{D})\det(\boldsymbol{P}^{-1}) = \det(\boldsymbol{D}) = \prod_i d_{ii}$$
allows for an efficient computation of the determinant of 𝑨.
• Eigendecomposition requires square matrices.
• We introduce a more general matrix decomposition technique, the singular value decomposition.

4.5 Singular Value Decomposition
• Theorem (SVD Theorem). Let 𝑨 ∈ ℝ^{m×n} be a rectangular matrix of rank 𝑟 ∈ [0, min(𝑚, 𝑛)]. The SVD of 𝑨 is a decomposition of the form
$$\underset{m \times n}{\boldsymbol{A}} = \underset{m \times m}{\boldsymbol{U}}\;\underset{m \times n}{\boldsymbol{\Sigma}}\;\underset{n \times n}{\boldsymbol{V}^{\mathsf T}}$$
with an orthogonal matrix 𝑼 ∈ ℝ^{m×m} with column vectors 𝒖ᵢ, 𝑖 = 1, …, 𝑚, and an orthogonal matrix 𝑽 ∈ ℝ^{n×n} with column vectors 𝒗ⱼ, 𝑗 = 1, …, 𝑛. Moreover, 𝚺 is an 𝑚×𝑛 matrix with Σᵢᵢ = 𝜎ᵢ ≥ 0 and Σᵢⱼ = 0 for 𝑖 ≠ 𝑗.
• The diagonal entries 𝜎ᵢ, 𝑖 = 1, …, 𝑟, of 𝚺 are called the singular values.
• The 𝒖ᵢ are called the left-singular vectors.
• The 𝒗ⱼ are called the right-singular vectors.
• By convention, the singular values are ordered: 𝜎₁ ≥ 𝜎₂ ≥ … ≥ 𝜎ᵣ ≥ 0.

• The singular value matrix 𝚺 is unique.
• 𝚺 ∈ ℝ^{m×n} is rectangular and of the same size as 𝑨. This means that 𝚺 has a diagonal submatrix that contains the singular values and needs additional zero padding.
• If 𝑚 > 𝑛, 𝚺 has diagonal structure up to row 𝑛 and consists of 𝟎ᵀ row vectors from 𝑛+1 to 𝑚:
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sigma_n \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix}$$
(Example 4.12 below shows a concrete 3×2 case of this padding.)
• If 𝑚 < 𝑛, 𝚺 has a diagonal structure up to column 𝑚 and columns that consist of 𝟎 from 𝑚+1 to 𝑛:
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & \ddots & 0 & 0 & & 0 \\ 0 & 0 & \sigma_m & 0 & \cdots & 0 \end{bmatrix}$$
• The SVD exists for any matrix 𝑨 ∈ ℝ^{m×n}.

• Example 4.12 (Vectors and the SVD)
• Consider a mapping of a square grid of vectors 𝓧 ∈ ℝ² that fit in a box of size 2×2 centered at the origin. Using the standard basis, we map these vectors using (entries rounded to two decimals)
$$\boldsymbol{A} = \begin{bmatrix} 1 & -0.8 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf T} = \begin{bmatrix} -0.79 & 0 & -0.62 \\ 0.38 & -0.78 & -0.49 \\ -0.48 & -0.62 & 0.62 \end{bmatrix}\begin{bmatrix} 1.62 & 0 \\ 0 & 1.0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} -0.78 & 0.62 \\ -0.62 & -0.78 \end{bmatrix}$$
• [Figure: 𝑽ᵀ performs a rotation in ℝ²; 𝚺 scales the two directions and embeds them in ℝ³, so that after 𝚺 all vectors lie in the 𝑥₁-𝑥₂ plane; 𝑼 then performs a rotation in ℝ³.]

4.5.2 Construction of the SVD
• The SVD of a general matrix shares some similarities with the eigendecomposition of a square matrix.
• Compare the eigendecomposition of an SPD (symmetric, positive definite) matrix, 𝑺 = 𝑺ᵀ = 𝑷𝑫𝑷ᵀ, with the corresponding SVD, 𝑺 = 𝑼𝚺𝑽ᵀ.
• If we set
$$\boldsymbol{U} = \boldsymbol{P} = \boldsymbol{V}, \qquad \boldsymbol{D} = \boldsymbol{\Sigma},$$
we see that the SVD of SPD matrices is their eigendecomposition.

• We can always diagonalize 𝑨ᵀ𝑨 and obtain
$$\boldsymbol{A}^{\mathsf T}\boldsymbol{A} = \boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{\mathsf T} = \boldsymbol{P}\begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix}\boldsymbol{P}^{\mathsf T}, \qquad (1)$$
where 𝑷 is an orthogonal matrix, which is composed of the orthonormal eigenbasis. The 𝜆ᵢ ≥ 0 are the eigenvalues of 𝑨ᵀ𝑨.
• Let us assume the SVD of 𝑨 exists, 𝑨 = 𝑼𝚺𝑽ᵀ, where 𝑼, 𝑽 are orthogonal matrices. Then, with 𝑼ᵀ𝑼 = 𝑰, we obtain
$$\boldsymbol{A}^{\mathsf T}\boldsymbol{A} = (\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf T})^{\mathsf T}(\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf T}) = \boldsymbol{V}\boldsymbol{\Sigma}^{\mathsf T}\boldsymbol{U}^{\mathsf T}\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf T} = \boldsymbol{V}\begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sigma_n^2 \end{bmatrix}\boldsymbol{V}^{\mathsf T} \qquad (2)$$
• Comparing (1) and (2), we identify
$$\boldsymbol{V} = \boldsymbol{P}, \qquad \sigma_i^2 = \lambda_i$$

• To obtain the left-singular vectors 𝑼:
$$\boldsymbol{A} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf T} \;\Leftrightarrow\; \boldsymbol{A}\boldsymbol{V} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf T}\boldsymbol{V} = \boldsymbol{U}\boldsymbol{\Sigma}$$
• We have 𝑨𝒗ᵢ = 𝜎ᵢ𝒖ᵢ, 𝑖 = 1, …, 𝑟, where 𝑟 is the rank of 𝑨. So we can calculate
$$\boldsymbol{u}_i = \frac{1}{\sigma_i}\boldsymbol{A}\boldsymbol{v}_i, \quad i = 1, \ldots, r \qquad (3)$$
• We look at matrices with full rank, i.e., 𝑟 = min(𝑚, 𝑛). Remember that 𝑼 is an 𝑚×𝑚 matrix.
• If 𝑚 ≤ 𝑛, 𝑼 = [𝒖₁, 𝒖₂, …, 𝒖ₘ]; all the 𝒖ᵢ have been calculated through (3).
• If 𝑚 > 𝑛, 𝑼 = [𝒖₁, 𝒖₂, …, 𝒖ₙ, …, 𝒖ₘ]:
• 𝒖₁, …, 𝒖ₙ have been calculated through (3).
• In order to calculate 𝒖ₙ₊₁, …, 𝒖ₘ, we use the fact that 𝒖₁, 𝒖₂, …, 𝒖ₙ, …, 𝒖ₘ are orthonormal vectors (see the sketch below).
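The construction can be followed line by line in code. This is a teaching sketch under the stated assumptions (full-rank 𝑨 with 𝑚 ≥ 𝑛; `svd_via_eig` is a hypothetical name), completing 𝑼 by Gram-Schmidt exactly as described; `np.linalg.svd` is the robust tool in practice:

```python
import numpy as np

def svd_via_eig(A: np.ndarray):
    """SVD following the recipe above, for full-rank A with m >= n."""
    m, n = A.shape
    # Steps 1-2: V and sigma from the eigendecomposition of A^T A
    lam, V = np.linalg.eigh(A.T @ A)       # symmetric => eigh
    order = np.argsort(lam)[::-1]          # sort sigma_1 >= sigma_2 >= ...
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(np.clip(lam, 0.0, None))
    # Step 3: left-singular vectors u_i = (1/sigma_i) A v_i, i = 1..n
    U = A @ V / sigma
    # For m > n, complete u_{n+1}, ..., u_m to an orthonormal basis of R^m
    for e in np.eye(m):
        w = e - U @ (U.T @ e)              # project out existing columns
        if np.linalg.norm(w) > 1e-10:
            U = np.column_stack([U, w / np.linalg.norm(w)])
    Sigma = np.zeros((m, n))
    Sigma[:n, :n] = np.diag(sigma)
    return U, Sigma, V.T

A = np.array([[1.0, -2.0],
              [0.0, 1.0],
              [1.0, 0.0]])                 # 3x2, m > n, rank 2
U, Sigma, Vt = svd_via_eig(A)
print(np.allclose(U @ Sigma @ Vt, A))      # True
print(np.round(Sigma, 3))                  # diag(sqrt(6), 1) plus a zero row
```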

Summary of the SVD
• Given 𝑨 ∈ ℝ^{m×n}, 𝑨 = 𝑼𝚺𝑽ᵀ:
• 𝑽: from the eigendecomposition of 𝑨ᵀ𝑨
• 𝚺: its nonzero elements are the singular values 𝜎ᵢ, obtained from the eigendecomposition of 𝑨ᵀ𝑨
• 𝑼: calculate 𝒖ᵢ = (1/𝜎ᵢ)𝑨𝒗ᵢ
• If 𝑚 ≤ 𝑛, 𝑼 = [𝒖₁, 𝒖₂, …, 𝒖ₘ]
• If 𝑚 > 𝑛, 𝑼 = [𝒖₁, 𝒖₂, …, 𝒖ₙ, …, 𝒖ₘ]
• For 𝑖 > 𝑛, the 𝒖ᵢ are orthonormal vectors that satisfy [𝒖₁, 𝒖₂, …, 𝒖ₙ]ᵀ𝒖ᵢ = 𝟎

4.5.2 Construction of the SVD
• Example (Computing the SVD)
• Let us find the singular value decomposition of
$$\boldsymbol{A} = \begin{bmatrix} 1 & 0 & 1 \\ -2 & 1 & 0 \end{bmatrix}$$
• The SVD requires us to compute the right-singular vectors 𝒗ⱼ, the singular values 𝜎ᵢ, and the left-singular vectors 𝒖ᵢ.
• Step 1: Right-singular vectors as the eigenbasis of 𝑨ᵀ𝑨.
• We start by computing
$$\boldsymbol{A}^{\mathsf T}\boldsymbol{A} = \begin{bmatrix} 1 & -2 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 & 1 \\ -2 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 5 & -2 & 1 \\ -2 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$
• We compute the singular values 𝜎ᵢ and right-singular vectors 𝒗ⱼ through the eigenvalue decomposition of 𝑨ᵀ𝑨, which is given as
$$\boldsymbol{A}^{\mathsf T}\boldsymbol{A} = \begin{bmatrix} \frac{5}{\sqrt{30}} & 0 & \frac{-1}{\sqrt{6}} \\ \frac{-2}{\sqrt{30}} & \frac{1}{\sqrt{5}} & \frac{-2}{\sqrt{6}} \\ \frac{1}{\sqrt{30}} & \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{6}} \end{bmatrix}\begin{bmatrix} 6 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} \frac{5}{\sqrt{30}} & \frac{-2}{\sqrt{30}} & \frac{1}{\sqrt{30}} \\ 0 & \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{-1}{\sqrt{6}} & \frac{-2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \end{bmatrix} = \boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{\mathsf T},$$
and we obtain the right-singular vectors as the columns of 𝑷, so that

$$\boldsymbol{V} = \boldsymbol{P} = \begin{bmatrix} \frac{5}{\sqrt{30}} & 0 & \frac{-1}{\sqrt{6}} \\ \frac{-2}{\sqrt{30}} & \frac{1}{\sqrt{5}} & \frac{-2}{\sqrt{6}} \\ \frac{1}{\sqrt{30}} & \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{6}} \end{bmatrix}$$
• Step 2: Singular-value matrix.
• As the singular values 𝜎ᵢ are the square roots of the eigenvalues of 𝑨ᵀ𝑨, we obtain them straight from 𝑫 = diag(6, 1, 0). Since rk(𝑨) = 2, there are only two nonzero singular values: 𝜎₁ = √6 and 𝜎₂ = 1. The singular value matrix must be the same size as 𝑨, and we obtain
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sqrt{6} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
• Step 3: Left-singular vectors are calculated using 𝒖ᵢ = (1/𝜎ᵢ)𝑨𝒗ᵢ:
$$\boldsymbol{u}_1 = \frac{1}{\sigma_1}\boldsymbol{A}\boldsymbol{v}_1 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 & 0 & 1 \\ -2 & 1 & 0 \end{bmatrix}\frac{1}{\sqrt{30}}\begin{bmatrix} 5 \\ -2 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 \\ -2 \end{bmatrix}$$
$$\boldsymbol{u}_2 = \frac{1}{\sigma_2}\boldsymbol{A}\boldsymbol{v}_2 = \frac{1}{1}\begin{bmatrix} 1 & 0 & 1 \\ -2 & 1 & 0 \end{bmatrix}\frac{1}{\sqrt{5}}\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} = \frac{1}{\sqrt{5}}\begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
• We obtain the left-singular vectors as the columns of 𝑼, so that
$$\boldsymbol{U} = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix}$$
• Now we have computed 𝑼, 𝑽, and 𝚺.
• You can verify that
$$\boldsymbol{A} = \begin{bmatrix} 1 & 0 & 1 \\ -2 & 1 & 0 \end{bmatrix} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf T} = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix}\begin{bmatrix} \sqrt{6} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} \frac{5}{\sqrt{30}} & \frac{-2}{\sqrt{30}} & \frac{1}{\sqrt{30}} \\ 0 & \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{-1}{\sqrt{6}} & \frac{-2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \end{bmatrix}$$

Another example
• Calculate the SVD of
$$\boldsymbol{A} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf T}$$
• We first calculate 𝑽 as the eigenbasis of 𝑨ᵀ𝑨:
$$\boldsymbol{A}^{\mathsf T}\boldsymbol{A} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},$$
with eigenvalues 𝜆₁ = 3 and 𝜆₂ = 1, so that
$$\boldsymbol{V} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
• The singular value matrix is
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$$
• We finally calculate 𝑼:
$$\boldsymbol{u}_1 = \frac{1}{\sigma_1}\boldsymbol{A}\boldsymbol{v}_1 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \qquad \boldsymbol{u}_2 = \frac{1}{\sigma_2}\boldsymbol{A}\boldsymbol{v}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$$
• Now we have calculated 𝒖₁ and 𝒖₂; we want to calculate 𝒖₃.
• We make use of the fact that 𝒖₁, 𝒖₂, and 𝒖₃ are an orthonormal basis:
$$\boldsymbol{u}_1^{\mathsf T}\boldsymbol{u}_3 = 0, \qquad \boldsymbol{u}_2^{\mathsf T}\boldsymbol{u}_3 = 0, \qquad \|\boldsymbol{u}_3\| = 1$$
• We can obtain
$$\boldsymbol{u}_3 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$$
• In all, the SVD of 𝑨 is written as
$$\boldsymbol{A} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf T} = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & 0 & \frac{-1}{\sqrt{3}} \end{bmatrix}\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

4.5.3 Eigenvalue Decomposition vs. Singular Value Decomposition
• Let us consider the eigendecomposition 𝑨 = 𝑷𝑫𝑷⁻¹ and the SVD 𝑨 = 𝑼𝚺𝑽ᵀ.
• The SVD always exists for any matrix in ℝ^{m×n}. The eigendecomposition is only defined for square matrices in ℝ^{n×n}, and it only exists if we can find a basis of eigenvectors of ℝⁿ.
• The vectors in the eigendecomposition matrix 𝑷 are not necessarily orthogonal. On the other hand, the vectors in the matrices 𝑼 and 𝑽 in the SVD are orthonormal, so they do represent rotations.
• Both the eigendecomposition and the SVD are compositions of three linear mappings:
1. Change of basis in the domain
2. Independent scaling of each new basis vector, and mapping from domain to codomain
3. Change of basis in the codomain
• A key difference between the eigendecomposition and the SVD is that in the SVD, domain and codomain can be vector spaces of different dimensions.

• In the SVD, the left- and right-singular vector matrices 𝑼 and 𝑽 are generally not inverses of each other (they perform basis changes in different vector spaces). In the eigendecomposition, the basis change matrices 𝑷 and 𝑷⁻¹ are inverses of each other.
• In the SVD, the entries in the diagonal matrix 𝚺 are all real and nonnegative, which is not generally true for the diagonal matrix in the eigendecomposition.
• The SVD and the eigendecomposition are closely related through their projections:
- The right-singular vectors of 𝑨 are eigenvectors of 𝑨ᵀ𝑨.
- The nonzero singular values of 𝑨 are the square roots of the nonzero eigenvalues of 𝑨ᵀ𝑨.
• For symmetric matrices 𝑨 ∈ ℝ^{n×n}, the eigenvalue decomposition and the SVD are one and the same, which follows from the spectral theorem.

Further understanding
• A symmetric matrix represents a combination of rotation and scaling.
• Through matrix decomposition, we can explain the effect of the linear transformation defined by such a matrix:
• Eigenvalues quantify the scaling effect.
• Eigenvectors quantify the directions of the scaling.
• The applicability of eigendecomposition is limited.
• The SVD is a universal alternative: it works for any matrix by finding orthonormal bases.
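As a closing check, the hand-computed SVD of the first worked example and the symmetric-matrix claim can both be verified numerically (a NumPy sketch):

```python
import numpy as np

# Verify the hand-computed SVD of A = [[1, 0, 1], [-2, 1, 0]]
A = np.array([[1.0, 0.0, 1.0],
              [-2.0, 1.0, 0.0]])
U = np.array([[1.0, 2.0],
              [-2.0, 1.0]]) / np.sqrt(5)
S = np.array([[np.sqrt(6), 0.0, 0.0],
              [0.0, 1.0, 0.0]])
Vt = np.vstack([np.array([5.0, -2.0, 1.0]) / np.sqrt(30),
                np.array([0.0, 1.0, 2.0]) / np.sqrt(5),
                np.array([-1.0, -2.0, 1.0]) / np.sqrt(6)])
print(np.allclose(U @ S @ Vt, A))   # True

# For a symmetric PSD matrix, the SVD and the eigendecomposition coincide
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])
sing_vals = np.linalg.svd(B, compute_uv=False)
eig_vals = np.sort(np.linalg.eigvalsh(B))[::-1]
print(np.allclose(sing_vals, eig_vals))  # True: both are [3., 1.]
```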