
COMP 3223: Foundations of Machine Learning Linear Algebra
Singular Value Decomposition
Srinandan Dasmahapatra
1/106

Matrices
You should all know the following:
Matrix notation
Matrix transpose
Scalar multiplication
Matrix addition & multiplication
Matrix inverse
System of linear equations in matrix form
Matrix determinant
Eigenvalues and eigenvectors
2/106

Linear Regression: approximately solving equations
In linear regression the model is ŷ = Aw:
$$
\begin{pmatrix}
w_0\phi_0(x_1) + w_1\phi_1(x_1) + \cdots + w_p\phi_p(x_1) \\
w_0\phi_0(x_2) + w_1\phi_1(x_2) + \cdots + w_p\phi_p(x_2) \\
\vdots \\
w_0\phi_0(x_N) + w_1\phi_1(x_N) + \cdots + w_p\phi_p(x_N)
\end{pmatrix}
+
\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_N \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
$$
Find weights $\{w_i\}$ that make the residual $\|r\|$ small.
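A minimal numpy sketch of this least-squares problem (the quadratic basis and the synthetic data below are illustrative, not from the course):

# Least-squares fit y ≈ A w for an illustrative polynomial design matrix.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)                  # N = 20 inputs
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.05 * rng.standard_normal(x.size)  # noisy targets

# Design matrix: columns are phi_0(x) = 1, phi_1(x) = x, phi_2(x) = x^2.
A = np.column_stack([np.ones_like(x), x, x**2])

w, *_ = np.linalg.lstsq(A, y, rcond=None)      # weights minimising ||y - A w||
r = y - A @ w                                  # residual vector
print(w)                                       # close to (1, 2, -3)
print(np.linalg.norm(r))                       # small residual norm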
3/106

Linear dependence & Linear Regression
In linear regression the model is ŷ = Aw:
$$
w_0\underbrace{\begin{pmatrix}\phi_0(x_1)\\ \phi_0(x_2)\\ \vdots \\ \phi_0(x_N)\end{pmatrix}}_{\mathrm{col}_0(A)}
+ w_1\underbrace{\begin{pmatrix}\phi_1(x_1)\\ \phi_1(x_2)\\ \vdots \\ \phi_1(x_N)\end{pmatrix}}_{\mathrm{col}_1(A)}
+ \cdots
+ w_p\underbrace{\begin{pmatrix}\phi_p(x_1)\\ \phi_p(x_2)\\ \vdots \\ \phi_p(x_N)\end{pmatrix}}_{\mathrm{col}_p(A)}
= \begin{pmatrix}\hat y_1\\ \hat y_2\\ \vdots \\ \hat y_N\end{pmatrix}
$$
Find the linear combination of the columns of the design matrix that best approximates y.
The residual r does not lie in the space spanned by the columns of A; it is orthogonal to each column:
$$\sum_n r_n \phi_i(x_n) = 0 \quad \text{for each } i,$$
which is exactly where the gradient of the squared loss vanishes.
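A quick numerical check of this orthogonality condition, assuming a generic random design matrix and targets (both made up for illustration):

# Verify that the least-squares residual is orthogonal to each column of A.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 4))               # illustrative design matrix (N = 50, p + 1 = 4)
y = rng.standard_normal(50)                    # illustrative targets

w, *_ = np.linalg.lstsq(A, y, rcond=None)
r = y - A @ w
print(A.T @ r)                                 # each entry is sum_n r_n phi_i(x_n); all ~ 0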

4/106

Design matrix has information on patterns in data
In linear regression the model is ŷ = Aw, with design matrix
$$
A = \begin{pmatrix}
1 & \phi_1(x_1) & \cdots & \phi_p(x_1) \\
1 & \phi_1(x_2) & \cdots & \phi_p(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
1 & \phi_1(x_N) & \cdots & \phi_p(x_N)
\end{pmatrix}
$$
Idea of this lecture: decompose the matrix using transformations into descriptive bases appropriate to the data.
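One possible way to assemble such a design matrix in numpy (the helper design_matrix and the particular basis functions are illustrative, not part of the course code):

# Build the N x (p+1) design matrix A with A[n, j] = phi_j(x_n).
import numpy as np

def design_matrix(x, basis_functions):
    # Column j holds basis function phi_j evaluated at every input x_n.
    return np.column_stack([phi(x) for phi in basis_functions])

x = np.linspace(-1.0, 1.0, 5)
basis = [lambda t: np.ones_like(t),            # phi_0(x) = 1
         lambda t: t,                          # phi_1(x) = x
         lambda t: np.sin(np.pi * t)]          # phi_2(x) = sin(pi x)
A = design_matrix(x, basis)
print(A.shape)                                 # (5, 3): N rows, p + 1 columns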
5/106

Reminder: Solving Linear Equations – Geometrical Picture
Solve the set of equations:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} r \\ s \end{pmatrix}$$
Geometrically this is viewed either as an intersection of lines (rows of the matrix) or as a linear combination of vectors (columns of the matrix):
$$\underbrace{\begin{matrix} ax + by = r \\ cx + dy = s \end{matrix}}_{\text{rows of matrix}}
\;\leftrightarrow\;
\underbrace{x\begin{pmatrix} a \\ c \end{pmatrix} + y\begin{pmatrix} b \\ d \end{pmatrix} = \begin{pmatrix} r \\ s \end{pmatrix}}_{\text{columns of matrix}}$$
6/106

The Geometrical Picture: An example
Solve the set of equations:
$$\begin{pmatrix} -2 & 1 \\ -1 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 4 \\ -2 \end{pmatrix}$$
[Figure: the two columns of the matrix (red vectors, Col 1 and Col 2) and the right-hand-side vector r plotted in the plane; the rows of the matrix correspond to the two lines.]
Solution of y − 2x = 4, 3y − x = −2 is (x, y) = (−2.8, −1.6).
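The same system solved numerically, confirming the quoted solution (a sketch using numpy):

# Solve the 2x2 system from this slide and confirm (x, y) = (-2.8, -1.6).
import numpy as np

A = np.array([[-2.0, 1.0],
              [-1.0, 3.0]])
b = np.array([4.0, -2.0])
xy = np.linalg.solve(A, b)
print(xy)                                      # [-2.8 -1.6]
# Column picture: x * col_1 + y * col_2 reproduces the right-hand side.
print(xy[0] * A[:, 0] + xy[1] * A[:, 1])       # [ 4. -2.]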
7/106

Fundamental operations on vectors – multiply by scalars and perform addition
Multiply the red vectors (the columns of the matrix) by numbers (elements of a field) and add the vectors together.
[Figure: scaled copies of the column vectors Col 1 and Col 2 combining to reach the vector r.]
8/106

Column space and Range of a matrix
Thus Ax = y can be solved if and only if y is a linear combination of the columns of A.
[Figure: the column vectors Col 1 and Col 2 and the vector r in the plane, as on the previous slides.]
The column space of a matrix A (denoted col A) is the subspace spanned by all linear combinations of the columns of A.
This is also the range of the linear map: range(A) = AV = {w ∈ W : w = Av for some v ∈ V}.
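A small numerical illustration of this criterion, using a made-up rank-deficient matrix: Ax = y is solvable exactly when appending y to the columns of A does not increase the rank.

# Check whether y lies in the column space of A via a rank comparison.
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])                     # rank 1: second column = 2 * first
y_in  = A @ np.array([1.0, 1.0])               # lies in col(A) by construction
y_out = np.array([1.0, 0.0, 0.0])              # not a combination of the columns

def solvable(A, y):
    # Ax = y has a solution iff appending y to A does not increase the rank.
    return np.linalg.matrix_rank(np.column_stack([A, y])) == np.linalg.matrix_rank(A)

print(solvable(A, y_in), solvable(A, y_out))   # True False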
9/106

Examples illustrating linear dependence and nullspace

Let $B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$. For a vector $v$ in the direction $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$, $Bv = 0$: $v$ is in the nullspace, or kernel, of $B$.
For $A = \begin{pmatrix} 2 & 1 & 3 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}$, col(1) + col(2) = col(3), so
$$\begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} - \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \;\Rightarrow\; \ker(A) = c\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}.$$
Exercise: show that $\ker(A^\top) = c\begin{pmatrix} -1 \\ 1 \\ 2 \end{pmatrix}$.
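These kernels can be checked numerically; one sketch uses the SVD (the subject of this lecture), where singular vectors belonging to zero singular values span the two null spaces:

# Null spaces of A and A^T for the example matrix, read off from the SVD.
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
U, s, Vt = np.linalg.svd(A)
print(s)                                       # last singular value is (numerically) zero
print(Vt[-1])                                  # spans ker(A): proportional to (1, 1, -1)
print(U[:, -1])                                # spans ker(A^T): proportional to (-1, 1, 2)
print(A @ np.array([1.0, 1.0, -1.0]))          # [0. 0. 0.]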
10/106

Kernel or Null space of a matrix

In the previous example $A = \begin{pmatrix} 2 & 1 & 3 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}$, there are 3 unknowns (the components of $v$) in $Av = y$ but only two independent equations.
If $Av = y$ and $x \in \ker(A)$, then $A(v + x) = y$: either there are no solutions or there are (infinitely) many solutions.
The kernel of a map (or matrix) is $\ker(A) = \text{nullspace}(A) = \{v \in V : Av = 0\}$.
Let $A$ be a $3 \times p$ matrix,
$$A = \begin{pmatrix} \text{---} & u & \text{---} \\ \text{---} & v & \text{---} \\ \text{---} & w & \text{---} \end{pmatrix},$$
where $u$, $v$ and $w$ are $p$-dimensional row vectors. Then $x \in \ker(A) \Leftrightarrow Ax = 0$, which means $x \perp \{u, v, w\}$.
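A short numerical illustration of this non-uniqueness, reusing the example matrix and its kernel vector (the particular v below is arbitrary; y is constructed to lie in col(A) so that a solution exists):

# If x is in ker(A), then v and v + x solve Av = y equally well.
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
v = np.array([1.0, 2.0, 0.0])
y = A @ v                                      # y is in col(A) by construction
k = np.array([1.0, 1.0, -1.0])                 # spans ker(A) (previous slide)
print(A @ (v + 5.0 * k) - y)                   # [0. 0. 0.]: v + 5k is another solution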
11/106

Rank of a matrix = number of independent equations
The rank (column rank) of A is the dimension of the column space of A.
The dimension of the domain V splits into the dimensions of the kernel and the range (rank-nullity theorem):
$$\dim V = \underbrace{\dim \ker(A)}_{\text{nullity}} + \underbrace{\dim \operatorname{range}(A)}_{\text{rank}}.$$
We can do the same for the transpose: col(A⊤) and ker(A⊤).
This gives the 4 fundamental subspaces: col(A), ker(A⊤), col(A⊤) and ker(A).
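For the 3×3 example matrix used earlier, the rank-nullity count can be verified directly (a sketch with numpy):

# Check dim V = nullity + rank for the example matrix.
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank                    # dimension of ker(A)
print(rank, nullity, rank + nullity)           # 2 1 3 = dimension of the domain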
12/106

Four fundamental subspaces of a matrix
http://en.wikipedia.org/wiki/Fundamental_theorem_of_linear_algebra
[Diagram: for A : Rⁿ → Rᵐ of rank r, the row space im(A⊤) (dimension r) and ker(A) (dimension n − r) sit in Rⁿ, while the column space im(A) (dimension r) and ker(A⊤) (dimension m − r) sit in Rᵐ; A maps im(A⊤) onto im(A) and sends ker(A) to 0, and A⊤ does the reverse.]
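The SVD produces orthonormal bases for all four subspaces at once; a sketch for the 3×3 example matrix used earlier:

# Orthonormal bases for the four fundamental subspaces of A from its SVD.
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                     # numerical rank
col_A  = U[:, :r]                              # basis of col(A) = im(A)
ker_At = U[:, r:]                              # basis of ker(A^T)
row_A  = Vt[:r, :].T                           # basis of col(A^T) = im(A^T)
ker_A  = Vt[r:, :].T                           # basis of ker(A)
print(col_A.shape, ker_At.shape, row_A.shape, ker_A.shape)   # (3, 2) (3, 1) (3, 2) (3, 1)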
13/106

Linear regression with fixed functions of data
Linear combinations of fixed functions $\phi_j(x_n)$:
$$
w_0\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
+ w_1\begin{pmatrix} \phi_1(x_1) \\ \phi_1(x_2) \\ \vdots \\ \phi_1(x_N) \end{pmatrix}
+ \cdots
+ w_p\begin{pmatrix} \phi_p(x_1) \\ \phi_p(x_2) \\ \vdots \\ \phi_p(x_N) \end{pmatrix}
= \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
$$
Sets of functions constitute vector spaces.
Approximate the outputs/targets y by an element of the column space of the design matrix.
14/106

Functions constitute vector spaces
$\mathbb{R}[x]$, the space of polynomials $\sum_m a_m x^m$ with $a_m \in \mathbb{R}$, forms a vector space:
$$(a_0 + a_1 x + a_2 x^2) + (b_0 + b_1 x) = \underbrace{(a_0 + b_0)}_{c_0} + \underbrace{(a_1 + b_1)}_{c_1}\,x + \underbrace{a_2}_{c_2}\,x^2$$
With monomials as basis elements, a + b = c reads
$$(a_0, a_1, a_2) + (b_0, b_1, 0) = (a_0 + b_0,\; a_1 + b_1,\; a_2) = (c_0, c_1, c_2).$$
Similarly, the set $\mathbb{R}[x_1, \ldots, x_k]$ of polynomials in k variables forms a vector space.
So does the set of functions of the form $\sum_n a_n e^{in\theta}$ (Fourier series).
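In coordinates, this addition of coefficient vectors can be carried out with numpy.polynomial (the particular coefficients are illustrative):

# Polynomials added as coefficient vectors: (a0 + a1 x + a2 x^2) + (b0 + b1 x).
import numpy as np
from numpy.polynomial import polynomial as P

a = np.array([1.0, 2.0, 3.0])                  # a0 + a1 x + a2 x^2  ->  (a0, a1, a2)
b = np.array([4.0, 5.0])                       # b0 + b1 x           ->  (b0, b1)
c = P.polyadd(a, b)                            # pads with zeros, then adds componentwise
print(c)                                       # [5. 7. 3.]  i.e. (a0 + b0, a1 + b1, a2)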