
Machine Learning: Mathematical Background Linear Algebra

Department of Computer Science, University College London


Lecture Overview
1 Lecture Overview
2 Linear Spaces & Linear Mappings
3 Geometrical Structure
4 Linear Equations
5 Matrix Decompositions
6 Quadratic Forms
7 Useful Results
9 Appendix: Further Proofs

Lecture Overview
Maths & Machine Learning
Much of machine learning is concerned with:
Solving systems of linear equations −→ Linear Algebra
Minimising cost functions (scalar functions of several variables that typically measure how poorly our model fits the data).
To this end we are often interested in studying the continuous change of such functions −→ (Differential) Calculus
Characterising uncertainty in our learning environments stochastically −→ Probability
Drawing conclusions based on the analysis of data −→ Statistics

Lecture Overview
Learning Outcomes for Today’s Lecture
By the end of this lecture you should be familiar with some fundamental objects and results of Linear Algebra
For the most part we will concentrate on stating results that will be of use in the main body of this module
We will be less concerned with the proofs of these results


Linear Spaces & Linear Mappings
Vector Spaces
The setting in which linear algebra takes place
A vector space, V, is a set, the elements of which are called vectors, denoted by bold lower case letters, e.g. x, y etc.
Two operations are defined on a vector space:
Vector Addition
Scalar Multiplication
For our purposes a scalar is a real number, usually denoted by a lower case letter

Linear Spaces & Linear Mappings
Vector Spaces
V must satisfy:
1 Additive Closure: if x, y ∈ V then x + y ∈ V
2 Scalar Closure: if α ∈ R and x ∈ V then αx ∈ V
3 Identity Element of Addition: ∃ a zero vector, 0, such that: x + 0 = x, ∀ x ∈ V
4 Inverse Element of Addition: ∃ an additive inverse, −x, for each x ∈ V, such that: x + (−x) = 0
5 Identity Element of Scalar Multiplication: ∃ a multiplicative identity, 1, such that: 1x = x, ∀ x ∈ V
6 Commutativity: x + y = y + x, ∀ x, y ∈ V
7 Associativity: (x + y) + z = x + (y + z); and α(βx) = (αβ)x, ∀ x, y, z ∈ V and α, β ∈ R
8 Distributivity: α(x + y) = αx + αy; and (α + β)x = αx + βx, ∀ x, y ∈ V and α, β ∈ R

Linear Spaces & Linear Mappings
Linear Independence, Span, Basis & Dimension
A set of vectors v1, …, vn ∈ V is linearly independent if:
α1v1 + … + αnvn = 0 =⇒ α1 = … = αn = 0
The span of v1, …, vn ∈ V is the set of all vectors that can be expressed as a linear combination of them:
span(v1, …, vn) = {v | v = β1v1 + … + βnvn for some β1, …, βn ∈ R}
A basis for V is a set of vectors which are linearly independent and which span the whole of V
So every linearly independent set of vectors forms a basis for its span
The dimension of V, dim(V), is the number of vectors in a basis
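
As a quick numerical aside (not from the original slides): linear independence can be checked by stacking the vectors as the columns of a matrix and comparing its rank with the number of vectors. A minimal numpy sketch, with illustrative vectors:

```python
import numpy as np

# Candidate vectors (illustrative choices, not from the slides)
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([1.0, 1.0, 2.0])   # v3 = v1 + v2, so the set is dependent

# Stack as columns; independent iff rank equals the number of vectors
V = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(V) == V.shape[1])   # False
```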

Linear Spaces & Linear Mappings
Euclidean Space
So far our language has been abstract, but we will be interested in a particular vector space: the Euclidean space, Rn
Here the vectors are n-tuples of real numbers, written as column vectors, e.g.:
x = [x1, x2, …, xn]T
Similarly a row vector is written as:
xT = [x1, x2, …, xn]
Note: On occasion we use [x]i as an alternative to xi

Linear Spaces & Linear Mappings
Euclidean Space: Addition & Multiplication
Vector Addition:
x + y = [x1 + y1, x2 + y2, …, xn + yn]T
Scalar Multiplication:
αx = [αx1, αx2, …, αxn]T

Linear Spaces & Linear Mappings
Euclidean Space: The Standard Basis
One particular basis in Rn is the standard basis:
e1 = [1, 0, …, 0]T, e2 = [0, 1, …, 0]T, …, en = [0, 0, …, 1]T
Any vector x can be expressed in the standard basis:
x = [x1, x2, …, xn]T = x1e1 + x2e2 + … + xnen

Linear Spaces & Linear Mappings
Euclidean Space: Vector Addition in R2

Linear Spaces & Linear Mappings
Euclidean Space: Scalar Multiplication in R2

Linear Spaces & Linear Mappings
Euclidean Space: The Standard Basis in R2

Linear Spaces & Linear Mappings
Subspaces
If V is a vector space then S ⊆ V is a subspace of V if:
1 S contains the zero vector: 0 ∈ S
2 S is closed under addition: x, y ∈ S =⇒ (x + y) ∈ S
3 S is closed under scalar multiplication: x ∈ S, α ∈ R =⇒ αx ∈ S
So if U and W are subspaces of V then so is their sum:
U + W = {u + w | u ∈ U, w ∈ W}
If U ∩ W = {0} then U + W is a direct sum, written as U ⊕ W
It can be shown that:
dim(U + W) = dim(U) + dim(W) − dim(U ∩ W)
And in particular:
dim(U ⊕ W) = dim(U) + dim(W)

Linear Spaces & Linear Mappings

Affine Subspaces
If V is a vector space, x0 ∈ V is a vector within V, and U ⊆ V is a subspace of V, then the subset L is an affine subspace or linear manifold of V if:
L = x0 + U = {v ∈ V | ∃ u ∈ U : v = x0 + u} ⊆ V
U is called the direction space
x0 is called the support point

Linear Spaces & Linear Mappings

Affine Subspaces
Note that an affine subspace for which x0 ∉ U will exclude the null vector, 0, and hence is not a vector space
Such an affine subspace does not go through the origin

Linear Spaces & Linear Mappings
Affine Subspaces: Parametric Spaces
Affine subspaces can be described by parameters:
If L is a k-dimensional affine space and {x1, x2, …, xk} is a basis of U, then every element x ∈ L can be described by:
x = x0 + λ1x1 + … + λkxk
where λ1, …, λk ∈ R are a set of parameters.

Linear Spaces & Linear Mappings
Affine Subspaces: Lines, Planes & Hyperplanes
Lines are 1-dimensional affine subspaces, with elements y ∈ Rn, described as:
y = x0 + λx1
where λ ∈ R, U = span(x1) ⊆ Rn
Planes are 2-dimensional affine subspaces, with elements y ∈ Rn, described as:
y = x0 + λ1x1 + λ2x2
where λ1, λ2 ∈ R, U = span(x1, x2) ⊆ Rn
Hyperplanes are (n − 1)-dimensional affine subspaces in Rn, with elements y ∈ Rn, described as:
y = x0 + λ1x1 + … + λn−1xn−1
where λ1, …, λn−1 ∈ R, U = span(x1, x2, …, xn−1) ⊆ Rn

Linear Spaces & Linear Mappings
Matrices
Matrices are denoted by a bold upper case letter, e.g. A
A matrix, A ∈ Rn×m, is an n × m array of numbers with elements Aij, for i = 1, …, n and j = 1, …, m, i.e.:
    [ A11 A12 … A1m ]
A = [ A21 A22 … A2m ]
    [  ⋮   ⋮   ⋱   ⋮ ]
    [ An1 An2 … Anm ]
Note: On occasion we use [A]ij as an alternative to Aij

Linear Spaces & Linear Mappings
Linear Maps
A linear map is a function, f : V → W, where V and W are vector spaces, that satisfies:
1 f(x + y) = f(x) + f(y), ∀ x, y ∈ V
2 f(αx) = αf(x), ∀ x ∈ V, ∀ α ∈ R

Linear Spaces & Linear Mappings
Matrices as Linear Maps
In particular, suppose V and W are finite-dimensional vector spaces with bases {vi}mi=1 and {wi}ni=1 respectively; then every matrix A ∈ Rn×m induces a linear map, f : Rm → Rn, given by:
f(x) = Ax
where x ∈ V and f(x) ∈ W, and where:
[Ax]i = Σ_{j=1}^m Aij xj, for i = 1, …, n
For n = 3, m = 2:
f(x) = Ax = [ A11 A12 ] [ x1 ]   [ A11x1 + A12x2 ]
            [ A21 A22 ] [ x2 ] = [ A21x1 + A22x2 ]
            [ A31 A32 ]          [ A31x1 + A32x2 ]
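
As a numerical illustration (not part of the original slides), the n = 3, m = 2 map above is just a matrix-vector product; a sketch with arbitrary entries:

```python
import numpy as np

# A in R^{3x2} induces the linear map f(x) = Ax : R^2 -> R^3
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, -1.0])

print(A @ x)   # [ 8. 26. 44.], i.e. [A11*x1 + A12*x2, ...]
```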

Linear Spaces & Linear Mappings
Matrix Addition
We can define a matrix addition operation for A, B, C ∈ Rn×m such that:
C = A + B,  Cij = Aij + Bij
For n = 3, m = 2:
[ C11 C12 ]   [ A11 A12 ]   [ B11 B12 ]   [ A11 + B11  A12 + B12 ]
[ C21 C22 ] = [ A21 A22 ] + [ B21 B22 ] = [ A21 + B21  A22 + B22 ]
[ C31 C32 ]   [ A31 A32 ]   [ B31 B32 ]   [ A31 + B31  A32 + B32 ]

Linear Spaces & Linear Mappings
Matrix Addition
This definition implies that for x ∈ Rm:
[Cx]i = Σ_{j=1}^m Cij xj = Σ_{j=1}^m (Aij + Bij) xj = [Ax]i + [Bx]i
Hence the definition implies that:
Cx = (A + B)x = Ax + Bx
Thus matrix addition offers a more efficient mechanism for adding Ax and Bx

Linear Spaces & Linear Mappings
Matrix Multiplication
We can define a matrix multiplication operation for A ∈ Rn×l, B ∈ Rl×m, C ∈ Rn×m such that:
C = AB,  Cik = Σ_{j=1}^l Aij Bjk
For n = 3, m = 2, l = 3:
[ C11 C12 ]   [ A11 A12 A13 ]   [ B11 B12 ]
[ C21 C22 ] = [ A21 A22 A23 ] × [ B21 B22 ]
[ C31 C32 ]   [ A31 A32 A33 ]   [ B31 B32 ]

              [ A11B11 + A12B21 + A13B31   A11B12 + A12B22 + A13B32 ]
            = [ A21B11 + A22B21 + A23B31   A21B12 + A22B22 + A23B32 ]
              [ A31B11 + A32B21 + A33B31   A31B12 + A32B22 + A33B32 ]
And in general BA ≠ AB
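
A two-line numpy check of non-commutativity (illustrative matrices, not from the slides):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])

print(A @ B)   # [[0. 2.] [0. 0.]]
print(B @ A)   # [[0. 1.] [0. 0.]]  -- BA != AB here
```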

Linear Spaces & Linear Mappings
Matrix Multiplication
This definition implies that for x ∈ Rm:
[Cx]i = Σ_{j=1}^m Cij xj = Σ_{j=1}^m Σ_{k=1}^l Aik Bkj xj = Σ_{k=1}^l Aik Σ_{j=1}^m Bkj xj = Σ_{k=1}^l Aik [Bx]k = [Ay]i = [z]i
where: y = Bx and z = Ay
Hence the definition implies that:
Cx = (AB)x = A(Bx)
Thus matrix multiplication offers a mechanism for performing the operations B : Rm → Rl followed by A : Rl → Rn

Linear Spaces & Linear Mappings
Scalar Multiplication
We can define a scalar multiplication operation for A, C ∈ Rn×m and α ∈ R such that:
C = αA,  Cij = αAij
For n = 3, m = 3:
     [ αA11 αA12 αA13 ]
αA = [ αA21 αA22 αA23 ]
     [ αA31 αA32 αA33 ]

Linear Spaces & Linear Mappings
Scalar Multiplication
This definition implies that for x ∈ Rm:
[Cx]i = Σ_{j=1}^m Cij xj = Σ_{j=1}^m αAij xj = α Σ_{j=1}^m Aij xj = α[Ax]i
Hence the definition implies that:
(αA)x = α(Ax)

Linear Spaces & Linear Mappings
Matrix Transpose
If A ∈ Rn×m then its transpose, AT ∈ Rm×n, is defined by:
[AT]ij = Aji, ∀ i, j
Therefore:
1 (AT)T = A
2 (A + B)T = AT + BT
3 (αA)T = αAT
4 (AB)T = BT AT
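
A quick numerical spot-check of property 4 (a sketch with random matrices; not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 4))

# (AB)^T should equal B^T A^T
print(np.allclose((A @ B).T, B.T @ A.T))   # True
```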

Linear Spaces & Linear Mappings
Matrix Identity
The identity matrix, I, is a square matrix with 1's on the diagonal and zeroes elsewhere
In is the n × n identity matrix:
    [ 1 0 … 0 ]
I = [ 0 1 … 0 ]
    [ ⋮ ⋮ ⋱ ⋮ ]
    [ 0 0 … 1 ]
For A ∈ Rn×m:
In A = A Im = A

Linear Spaces & Linear Mappings
Matrix Inverse
For a square matrix, A, its inverse (if it exists) satisfies:
A−1A = I = AA−1
If A−1 does not exist then we say that A is singular
For square matrices A, B with inverses A−1, B−1 respectively:
(AB)−1 = B−1 A−1
In particular, for A ∈ R2×2:
A = [ a b ]   =⇒   A−1 = 1/(ad − bc) [  d  −b ]
    [ c d ]                          [ −c   a ]
where a, b, c, d ∈ R
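
The 2 × 2 formula can be checked against numpy's general-purpose inverse; a sketch with illustrative entries (assuming ad − bc ≠ 0):

```python
import numpy as np

a, b, c, d = 2.0, 1.0, 5.0, 3.0          # illustrative, ad - bc = 1 != 0
A = np.array([[a, b],
              [c, d]])

# Closed-form 2x2 inverse from the slide
A_inv = (1.0 / (a * d - b * c)) * np.array([[ d, -b],
                                            [-c,  a]])

print(np.allclose(A_inv, np.linalg.inv(A)))   # True
print(np.allclose(A @ A_inv, np.eye(2)))      # A A^{-1} = I
```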

Linear Spaces & Linear Mappings
Symmetric Matrices
A square matrix, A ∈ Rn×n, is symmetric if AT = A

Linear Spaces & Linear Mappings
Nullspace & Range
If A is a linear map, A : V → W, then the nullspace, or kernel, of A is:
null(A) = {v ∈ V | Av = 0}
and the range, or image, of A is:
range(A) = {w ∈ W | ∃ v ∈ V such that Av = w}

Linear Spaces & Linear Mappings
Nullspace & Range
[Figure: A maps null(A) ⊆ V to the zero vector 0 ∈ W; range(A) is a subspace of W]

Linear Spaces & Linear Mappings
Columnspace & Rowspace
The columnspace of a matrix A ∈ Rn×m is the span of its m columns (which are all vectors in Rn)
The rowspace of a matrix A ∈ Rn×m is the span of its n rows (which are all row vectors in Rm)
So:
columnspace(A) = range(A)
rowspace(A) = range(AT)
Recall from our definitions of Span, Basis and Dimension that the dimension of the columnspace (rowspace) is equal to the number of linearly independent vectors amongst the columns (rows) of A

Linear Spaces & Linear Mappings
Rank
It can be shown (see Appendix for proof) that the dimension of the columnspace is equal to the dimension of the rowspace, and this common dimension is known as the rank of A:
rank(A) = dim(range(A))
So the rank of A is equal to the number of linearly independent vectors amongst the columns (rows) of A
Of course, dim(range(A)) ≤ m and dim(range(AT)) ≤ n, so:
rank(A) ≤ min(n, m)

Linear Spaces & Linear Mappings
For a matrix A ∈ Rn×l and a matrix B ∈ Rl×m:
rank(AB) ≤ rank(A)
rank(AB) = dim(range(AB)) and rank(A) = dim(range(A))
Now, consider any vector y ∈ range(AB); then by the definition of the range there exists some x such that:
y = (AB)x = A(Bx) = Az, where z = Bx
=⇒ y ∈ range(A)
=⇒ range(AB) ⊆ range(A)
=⇒ dim(range(AB)) ≤ dim(range(A))
=⇒ rank(AB) ≤ rank(A)

Linear Spaces & Linear Mappings
For a matrix A ∈ Rn×l and a matrix B ∈ Rl×m:
rank(AB) ≤ min(rank(A), rank(B))
We know that:
rank(AB) ≤ rank(A)
rank(AB) ≤ rank(B)
Since these hold simultaneously:
=⇒ rank(AB) ≤ min(rank(A), rank(B))

Linear Spaces & Linear Mappings
The dimension of the nullspace of A ∈ Rn×m is known as the nullity of A:
nullity(A) = dim(null(A))
Rank-Nullity Theorem:
rank(A) + nullity(A) = m
(See Appendix for proof)
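
The theorem can be verified numerically; a sketch assuming scipy is available for the nullspace computation (the matrix below is an illustrative rank-deficient example):

```python
import numpy as np
from scipy.linalg import null_space   # assumes scipy is installed

# A : R^4 -> R^3; the third row is the sum of the first two
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 1.0, 2.0, 0.0]])

rank = np.linalg.matrix_rank(A)        # dim(range(A)) = 2
nullity = null_space(A).shape[1]       # dim(null(A))  = 2
print(rank + nullity == A.shape[1])    # True: rank + nullity = m = 4
```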


Geometrical Structure
Norms
A norm equips a vector space with a notion of length
More formally a norm on a real vector space, V, is a function, ∥·∥ : V → R, that satisfies the following ∀ x, y ∈ V and ∀ α ∈ R:
1 ∥x∥ ≥ 0, with equality iff x = 0
2 ∥αx∥ = |α| ∥x∥
3 ∥x + y∥ ≤ ∥x∥ + ∥y∥

Geometrical Structure
Examples of norms include:
the 1-norm: ∥x∥1 = Σ_{i=1}^n |xi|
the 2-norm: ∥x∥2 = √(Σ_{i=1}^n xi²)
the p-norm: ∥x∥p = (Σ_{i=1}^n |xi|^p)^{1/p}, where p ≥ 1
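
These norms are available directly in numpy; a small sketch (illustrative vector):

```python
import numpy as np

x = np.array([3.0, -4.0])

print(np.linalg.norm(x, 1))   # 7.0 : sum of |x_i|
print(np.linalg.norm(x, 2))   # 5.0 : sqrt(sum of x_i^2)
print(np.linalg.norm(x, 3))   # p-norm for p = 3; any p >= 1 gives a norm
```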

Geometrical Structure
Inner Product
An inner product equips a vector space with a notion of similarity
More formally an inner product on a real vector space, V, is a function, ⟨·,·⟩ : V × V → R, that satisfies the following ∀ x, y, z ∈ V and ∀ α ∈ R:
1 ⟨x, x⟩ ≥ 0, with equality iff x = 0
2 ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩; and ⟨αx, y⟩ = α⟨x, y⟩
3 ⟨x, y⟩ = ⟨y, x⟩
An inner product on V induces a norm on V: ∥x∥ = √⟨x, x⟩

Geometrical Structure
Orthonormality
Two vectors x, y are said to be orthogonal if ⟨x, y⟩ = 0
If two orthogonal vectors x, y have unit length, i.e. ∥x∥ = ∥y∥ = 1, then they are described as orthonormal

Geometrical Structure
Scalar Product
For Euclidean space (i.e. V = Rn) the standard inner product is the dot product or scalar product:
⟨x, y⟩ = x · y = Σ_{i=1}^n xi yi
And the norm induced by this inner product is the two-norm, ∥·∥2
In addition, for Euclidean space we can define the notion of angle, θ, between two vectors x, y via the dot product:
x · y = ∥x∥2 ∥y∥2 cos θ
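
Rearranging for θ gives a direct way to compute angles; a minimal sketch (illustrative vectors):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))   # 45.0
```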

Geometrical Structure
Orthonormality in Rn
Members of the standard basis in Rn, {ei}ni=1, are orthonormal
For example (for n = 2):
x = [8, 4]T = 8e1 + 4e2
But other orthonormal bases exist, for example (for n = 2):
b1 = [1/√2, 1/√2]T,  b2 = [1/√2, −1/√2]T
And this basis yields the following expression:
x = [8, 4]T = 6√2 b1 + 2√2 b2
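
Because b1, b2 are orthonormal, the coefficients of x in this basis are just dot products; a sketch reproducing the numbers above:

```python
import numpy as np

x  = np.array([8.0, 4.0])
b1 = np.array([1.0,  1.0]) / np.sqrt(2)
b2 = np.array([1.0, -1.0]) / np.sqrt(2)

# Coefficients in an orthonormal basis are inner products with x
c1, c2 = x @ b1, x @ b2                    # 6*sqrt(2), 2*sqrt(2)
print(np.allclose(c1 * b1 + c2 * b2, x))   # True: x = 6*sqrt(2) b1 + 2*sqrt(2) b2
```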

Geometrical Structure
Orthonormality in R2
[Figure: the vector x = [8, 4]T drawn in R2, expressed first in the standard basis {e1, e2} and then in the orthonormal basis {b1, b2}]

Geometrical Structure
Orthogonal Matrices
A square matrix, Q ∈ Rn×n, is orthogonal if its columns are orthonormal:
QT Q = Q QT = I
Therefore:
QT = Q−1
Orthogonal matrices preserve inner products:
(Qx) · (Qy) = (Qx)T (Qy) = xT QT Q y = xT y = x · y
…consequently they preserve 2-norms
So multiplication by an orthogonal matrix can be considered to be a mapping that preserves vector length, but rotates or reflects the vector about the origin
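
A rotation matrix makes this concrete; a sketch (the angle is an arbitrary illustrative choice):

```python
import numpy as np

theta = np.pi / 6                       # rotate by 30 degrees
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, 4.0])
print(np.allclose(Q.T @ Q, np.eye(2)))            # Q^T Q = I
print(np.linalg.norm(Q @ x), np.linalg.norm(x))   # both 5.0: length preserved
```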

Geometrical Structure
Projections
In general a projection, PU : V → U, is a linear mapping from a vector space, V, to a subspace of V, U ⊆ V, such that:
PU PU = PU
We will have particular interest in orthogonal projections in Euclidean space, which map a vector x ∈ Rn to the vector PU x ∈ U ⊆ Rn that is 'closest' to x, i.e. such that ∥x − PU x∥2 is minimal.

Geometrical Structure
Orthogonal Projections onto Lines
Consider a 1-dimensional subspace passing through the origin (i.e. a line), described by the basis vector b
An orthogonal projection of a vector x ∈ Rn onto this line must satisfy:
1 PU x = λb, for some λ ∈ R
2 ∥x − PU x∥2 must be minimal

Geometrical Structure
Orthogonal Projections onto Lines

Geometrical Structure
Orthogonal Projections onto Lines
Thus we seek λ that satisfies:
argmin_λ ∥x − λb∥2
Differentiating with respect to λ and seeking stationary points implies:
(x − λb) · b = 0
=⇒ λ = (x · b) / ∥b∥2²

Geometrical Structure
Orthogonal Projections onto Lines
This tells us that (x − PU x) is orthogonal to b, since:
(x − PU x) · b = x · b − λ∥b∥2² = x · b − ((x · b)/∥b∥2²) ∥b∥2² = 0
Furthermore, we can derive an explicit form for the matrix PU:
PU x = λb = bλ = b (x · b)/∥b∥2² = b (bT x)/∥b∥2² = (b bT / ∥b∥2²) x
=⇒ PU = b bT / ∥b∥2²
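
The closed form PU = b bT / ∥b∥2² is easy to verify numerically; a sketch with an illustrative b and x:

```python
import numpy as np

b = np.array([2.0, 1.0])            # basis vector for the line (illustrative)
P = np.outer(b, b) / (b @ b)        # P_U = b b^T / ||b||_2^2

x = np.array([1.0, 3.0])
print(P @ x)                        # [2. 1.]: projection of x onto span(b)
print((x - P @ x) @ b)              # 0.0: residual is orthogonal to b
print(np.allclose(P @ P, P))        # True: P_U P_U = P_U
```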

Geometrical Structure
Orthogonal Projections onto Subspaces
Consider an m-dimensional subspace, U ⊆ Rn, passing through the origin, described by the basis vectors {bi}mi=1
An orthogonal projection of a vector x ∈ Rn onto this subspace must satisfy:
1 PU x = Σ_{i=1}^m λi bi = Bλ
where: B = [b1, …, bm] ∈ Rn×m, λ = [λ1, …, λm]T ∈ Rm
2 ∥x − PU x∥2 must be minimal

Geometrical Structure
Orthogonal Projections onto Subspaces: 2-D Example

Geometrical Structure
Orthogonal Projections onto Subspaces
Thus we seek λ that satisfies:
argmin_λ ∥x − Bλ∥2
Differentiating with respect to λ and seeking stationary points implies:
BT (x − Bλ) = 0
=⇒ λ = (BT B)−1 BT x
(These are known as the normal equations; they will be discussed in more detail in the context of Linear Regression)

Geometrical Structure
Orthogonal Projections onto Subspaces
This tells us that (x − PU x) is orthogonal to bi for all i, since, by the normal equations BT(x − Bλ) = 0:
(x − PU x) · bi = x · bi − (Bλ) · bi = x · bi − (B(BT B)−1 BT x) · bi = 0
Furthermore, we can derive an explicit form for the matrix PU:
PU x = Bλ = B(BT B)−1 BT x
=⇒ PU = B(BT B)−1 BT
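
The same verification for a 2-dimensional subspace of R3 (illustrative basis; in practice one would solve the normal equations with a least-squares routine rather than forming the inverse explicitly):

```python
import numpy as np

# Columns of B span an illustrative 2-D subspace of R^3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

P = B @ np.linalg.inv(B.T @ B) @ B.T     # P_U = B (B^T B)^{-1} B^T
x = np.array([1.0, 2.0, 0.0])

print(P @ x)                                  # closest point to x in the subspace
print(np.allclose(B.T @ (x - P @ x), 0.0))    # True: residual orthogonal to all b_i
```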


Linear Equations
Solving Linear Equations
Let's say we want to solve the following system of linear equations:
Ax = b
where A ∈ Rn×m and b ∈ Rn are both known, and x ∈ Rm is unknown
This is equivalent to a set of n equations:
A11x1 + A12x2 + … + A1mxm = b1
A21x1 + A22x2 + … + A2mxm = b2
⋮
An1x1 + An2x2 + … + Anmxm = bn

Linear Equations
Solving Linear Equations
This system of equations can have:
a unique solution, no solutions, or infinitely many solutions
But it cannot have more than one and fewer than an infinite number of solutions. Why?
If xA and xB are solutions, then xC = αxA + (1 − α)xB must also be a solution, for all α ∈ R
In order to solve this system we need to:
Check how many distinct simultaneous equations we have in our system
Check the consistency of these equations
Compare the number of these equations to the number of unknowns in our system

Linear Equations
Solving Linear Equations
To check how many distinct equations we have:
Form the augmented matrix, A|b = [A, b] ∈ Rn×(m+1)
Recall that the number of linearly independent rows in a matrix is equal to the matrix rank, so:
# distinct equations = # linearly independent rows in A|b = rank(A|b) ≤ min(n, m + 1)
To check the consistency of these equations:
Note that the number of distinct 'left-hand sides' of these equations should equal the number of these equations for consistency:
# distinct 'LHS' of equations = # linearly independent rows in A = rank(A)
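
These rank checks translate directly into numpy; a sketch on an illustrative consistent system with a unique solution:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [1.0, 0.0]])
b = np.array([3.0, 6.0, 1.0])

rank_A  = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))

print(rank_A == rank_Ab)       # True : consistent (no contradictory equations)
print(rank_A == A.shape[1])    # True : rank equals m, so the solution is unique
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)                       # [1. 1.]
```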