COMP 3223: Foundations of Machine Learning
Linear Algebra
Singular Value Decomposition
Srinandan Dasmahapatra
Matrices
You should all know the following:
- Matrix notation
- Matrix transpose
- Scalar multiplication
- Matrix addition & multiplication
- Matrix inverse
- System of linear equations in matrix form
- Matrix determinant
- Eigenvalues and eigenvectors
Linear Regression: approximately solving equations
In linear regression the model is y = Aw:

\[
\begin{pmatrix}
w_0 \phi_0(x_1) + w_1 \phi_1(x_1) + \cdots + w_p \phi_p(x_1) \\
w_0 \phi_0(x_2) + w_1 \phi_1(x_2) + \cdots + w_p \phi_p(x_2) \\
\vdots \\
w_0 \phi_0(x_N) + w_1 \phi_1(x_N) + \cdots + w_p \phi_p(x_N)
\end{pmatrix}
+
\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_N \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
\]

Find weights {wi} that make residuals ∥r∥ small.
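A minimal sketch of such a least-squares fit, assuming NumPy and a monomial basis φj(x) = x^j of our own choosing (the data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)                      # N = 20 inputs (invented)
y = 1.0 + 2.0*x - 3.0*x**2 + 0.05*rng.standard_normal(x.size)  # noisy targets

p = 2
A = np.column_stack([x**j for j in range(p + 1)])  # columns phi_0, ..., phi_p
w, *_ = np.linalg.lstsq(A, y, rcond=None)          # weights minimising ||y - A w||
r = y - A @ w                                      # residual vector
print(w, np.linalg.norm(r))                        # ||r|| is small for a good fit
```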
Linear dependence & Linear Regression
In linear regression the model is y = Aw:

\[
w_0 \underbrace{\begin{pmatrix} \phi_0(x_1) \\ \phi_0(x_2) \\ \vdots \\ \phi_0(x_N) \end{pmatrix}}_{\mathrm{col}_0(A)}
+ w_1 \underbrace{\begin{pmatrix} \phi_1(x_1) \\ \phi_1(x_2) \\ \vdots \\ \phi_1(x_N) \end{pmatrix}}_{\mathrm{col}_1(A)}
+ \cdots
+ w_p \underbrace{\begin{pmatrix} \phi_p(x_1) \\ \phi_p(x_2) \\ \vdots \\ \phi_p(x_N) \end{pmatrix}}_{\mathrm{col}_p(A)}
= \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
\]

Find the linear combination of the columns of the design matrix that best approximates y.

The residual r is not in the space spanned by the columns of A; it is orthogonal to each column:

\[
\sum_n r_n \phi_i(x_n) = 0,
\]

which is where the gradient of the squared loss vanishes.
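A small self-contained check of this orthogonality condition, assuming NumPy (the design matrix here is random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))     # a generic design matrix (invented)
y = rng.standard_normal(10)

w, *_ = np.linalg.lstsq(A, y, rcond=None)
r = y - A @ w                        # least-squares residual

# Entry i of A.T @ r is sum_n r_n * phi_i(x_n); every entry is ~0
print(A.T @ r)                       # values at the level of floating-point noise
```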
Design matrix has information on patterns in data
In linear regression the model is y = Aw, with design matrix

\[
A = \begin{pmatrix}
1 & \phi_1(x_1) & \cdots & \phi_p(x_1) \\
1 & \phi_1(x_2) & \cdots & \phi_p(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
1 & \phi_1(x_N) & \cdots & \phi_p(x_N)
\end{pmatrix}.
\]

Idea of this lecture: decompose the matrix using transformations and data-appropriate descriptive bases.
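A sketch of the decomposition this lecture builds toward (the SVD), assuming NumPy's SVD routine and a small monomial design matrix of our own choosing:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 8)
A = np.column_stack([np.ones_like(x), x, x**2])    # design matrix, phi_j(x) = x**j

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
print(s)                                           # singular values of the design matrix

# The factors reassemble A exactly (up to rounding):
np.testing.assert_allclose(A, U @ np.diag(s) @ Vt, atol=1e-12)
```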
Reminder: Solving Linear Equations – Geometrical Picture
Solve a set of equations:

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} r \\ s \end{pmatrix}
\]

This is viewed geometrically either as an intersection of lines (the rows of the matrix) or as a linear combination of vectors (the columns of the matrix):

\[
\begin{aligned}
ax + by &= r \\
cx + dy &= s
\end{aligned}
\qquad\leftrightarrow\qquad
x \begin{pmatrix} a \\ c \end{pmatrix} + y \begin{pmatrix} b \\ d \end{pmatrix}
= \begin{pmatrix} r \\ s \end{pmatrix}
\]
The Geometrical Picture: An example
Solve the set of equations:

\[
\begin{pmatrix} -2 & 1 \\ -1 & 3 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} 4 \\ -2 \end{pmatrix}
\]

[Figure: the columns of the matrix (red vectors, Col 1 and Col 2) and the target vector r in the plane, together with the lines defined by the rows of the matrix.]

Solution of y − 2x = 4, 3y − x = −2 is (x, y) = (−2.8, −1.6).
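A quick numerical check of this solution, assuming NumPy:

```python
import numpy as np

A = np.array([[-2.0, 1.0],
              [-1.0, 3.0]])
b = np.array([4.0, -2.0])

print(np.linalg.solve(A, b))   # [-2.8 -1.6], the intersection point of the two lines
```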
Fundamental operations on vectors – multiply by scalars and perform addition
Multiply the red vectors by numbers (elements of a field) and add the vectors together.

[Figure: scaled copies of the column vectors Col 1 and Col 2 combining to reach the target vector r.]
Column space and Range of a matrix
Thus Ax = y can be solved if and only if y is a linear combination of the columns of A.

[Figure: the column vectors Col 1 and Col 2 and the target vector r in the plane.]

The column space of a matrix A (denoted col A) is the subspace spanned by all linear combinations of the columns of A.

This is also the range of the linear map:

\[
\operatorname{range}(A) = AV = \{\, w \in W : w = Av \ \text{for some}\ v \in V \,\}
\]
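A small sketch of this solvability criterion, assuming NumPy (the test vectors are our own): y lies in col(A) exactly when appending it to A leaves the rank unchanged.

```python
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

y_in = A @ np.array([1.0, 2.0, 0.0])   # in col(A) by construction
y_out = np.array([0.0, 0.0, 1.0])      # not a combination of the columns

for y in (y_in, y_out):
    # y is in col(A) iff appending it as a column does not increase the rank
    solvable = np.linalg.matrix_rank(np.column_stack([A, y])) == np.linalg.matrix_rank(A)
    print(solvable)                    # True, then False
```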
Examples illustrating linear dependence and nullspace
Let
\[
B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]
For a vector v in the direction (0, 0, 1)⊤, Bv = 0: v is in the nullspace or kernel of B.

For
\[
A = \begin{pmatrix} 2 & 1 & 3 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix},
\]
col(1) + col(2) = col(3), so

\[
1 \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}
+ 1 \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
- 1 \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix}
= 0
\;\Rightarrow\;
\ker(A) = c \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}.
\]

Exercise: show that \(\ker(A^\top) = c \begin{pmatrix} -1 \\ 1 \\ 2 \end{pmatrix}\).
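A quick numerical confirmation of both kernel vectors, assuming NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

print(A @ np.array([1.0, 1.0, -1.0]))    # [0. 0. 0.]: (1, 1, -1) spans ker(A)
print(A.T @ np.array([-1.0, 1.0, 2.0]))  # [0. 0. 0.]: (-1, 1, 2) spans ker(A^T)
```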
Kernel or Null space of a matrix
In the previous example
\[
A = \begin{pmatrix} 2 & 1 & 3 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix},
\]
there are 3 variables v in Av = y but only two independent equations.

If Av = y and x ∈ ker(A), then A(v + x) = y. Either there are no solutions or there are (infinitely) many solutions (see the sketch below).

The kernel of a map (or matrix) is ker(A) = nullspace(A) = {v ∈ V : Av = 0}.

Let A be a 3×p matrix,
\[
A = \begin{pmatrix} \text{---}\; u \;\text{---} \\ \text{---}\; v \;\text{---} \\ \text{---}\; w \;\text{---} \end{pmatrix},
\]
where u, v and w are p-dimensional row vectors. Then x ∈ ker(A) ⇔ Ax = 0. This means x ⊥ {u, v, w}.
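The sketch referenced above, assuming NumPy; the particular solution v is an arbitrary choice of ours:

```python
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
v = np.array([1.0, 0.0, 0.0])        # one particular solution: define y = A v
y = A @ v
x = np.array([1.0, 1.0, -1.0])       # x spans ker(A)

for c in (0.0, 1.0, -3.5):
    print(A @ (v + c*x) - y)         # [0. 0. 0.] for every c: infinitely many solutions
```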
Rank of a matrix = number of independent equations
The rank (column rank) of A is the dimension of the column space of A.

For a map A : V → W, the dimensions of the kernel and the range account for all of V (the rank-nullity theorem, checked numerically below):

\[
\dim V = \underbrace{\dim \ker(A)}_{\text{nullity}} + \underbrace{\dim \operatorname{range}(A)}_{\text{rank}} .
\]

We can do the same for the transpose: col(A⊤) and ker(A⊤).

4 fundamental subspaces: col(A), ker(A⊤), col(A⊤) and ker(A).
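The promised check, assuming NumPy and SciPy's null_space, on the 3×3 matrix A from the earlier example:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

rank = np.linalg.matrix_rank(A)                      # dim range(A) = 2
nullity = null_space(A).shape[1]                     # dim ker(A)   = 1
print(rank, nullity, rank + nullity == A.shape[1])   # 2 1 True
```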
Four fundamental subspaces of a matrix
http://en.wikipedia.org/wiki/Fundamental_theorem_of_linear_algebra

[Figure: the four fundamental subspaces of a rank-r map A : R^n → R^m. In the domain R^n: im(A⊤) of dimension r and ker(A) of dimension n − r. In the codomain R^m: im(A) of dimension r and ker(A⊤) of dimension m − r.]
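A sketch of the orthogonality the diagram encodes, assuming SciPy's orth and null_space helpers and reusing the 3×3 example A: im(A⊤) ⊥ ker(A) in the domain, and im(A) ⊥ ker(A⊤) in the codomain.

```python
import numpy as np
from scipy.linalg import orth, null_space

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

print(orth(A.T).T @ null_space(A))    # ~0 matrix: im(A^T) is orthogonal to ker(A)
print(orth(A).T @ null_space(A.T))    # ~0 matrix: im(A) is orthogonal to ker(A^T)
```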
Linear regression with fixed functions of data
Linear combinations of fixed functions φj(xn):
\[
w_0 \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
+ w_1 \begin{pmatrix} \phi_1(x_1) \\ \phi_1(x_2) \\ \vdots \\ \phi_1(x_N) \end{pmatrix}
+ \cdots
+ w_p \begin{pmatrix} \phi_p(x_1) \\ \phi_p(x_2) \\ \vdots \\ \phi_p(x_N) \end{pmatrix}
= \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
\]

Sets of functions constitute vector spaces.

Approximate the outputs/targets y by an element of the column space of the design matrix.
Functions constitute vector spaces
R[x], the space of polynomials \(\sum_m a_m x^m\) with \(a_m \in \mathbb{R}\), forms a vector space:

\[
(a_0 + a_1 x + a_2 x^2) + (b_0 + b_1 x)
= \underbrace{(a_0 + b_0)}_{c_0} + \underbrace{(a_1 + b_1)}_{c_1}\, x + \underbrace{a_2}_{c_2}\, x^2
\]

Monomials as basis elements, a + b = c:

\[
(a_0, a_1, a_2) + (b_0, b_1, 0) = (a_0 + b_0, a_1 + b_1, a_2) = (c_0, c_1, c_2)
\]

Similarly, the set R[x1, . . . , xk] of polynomials in k variables forms a vector space.

So does the set of functions of the form \(\sum_n a_n e^{in\theta}\) (Fourier series).
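A small illustration, assuming NumPy's Polynomial class, that adding polynomials is exactly adding their coefficient vectors:

```python
from numpy.polynomial import Polynomial

a = Polynomial([1.0, 2.0, 3.0])   # 1 + 2x + 3x^2 as coefficient vector (a0, a1, a2)
b = Polynomial([4.0, 5.0])        # 4 + 5x, implicitly padded to (b0, b1, 0)

c = a + b                         # polynomial addition = coefficient-vector addition
print(c.coef)                     # [5. 7. 3.] = (a0 + b0, a1 + b1, a2)
```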