Linear Algebra
Maths Preliminaries
Wei Wang @ CSE, UNSW
March 18, 2018
Wei Wang @ CSE, UNSW Maths Preliminaries
Introduction
This review serves two purposes:
Recap relevant maths content that you may have learned a
long time ago (probably not in a CS course and rarely used in
any CS course).
More importantly, present it in a way that is useful (i.e., giving
semantics/motivations) for understanding maths behind
Machine Learning.
Contents
Linear Algebra
Note
You’ve probably learned Linear Algebra via matrices/systems
of linear equations, etc. We will review key concepts in LA
from the perspective of linear transformations (think of it as
functions for now). This perspective provides semantics and
intuition into most of the ML models and operations.
Here we emphasize intuition; we deliberately skip
many concepts and present some content informally.
It is a great exercise for you to view related maths and ML
models/operations from this perspective throughout this course!
A Common Trick in Maths I
Question
Calculate 2^10, 2^{-1}, 2^{ln 5} and 2^{4−3i}?
Properties:
f_a(n) = f_a(n − 1) · a, for n ≥ 1; f_a(0) = 1.
f(u) · f(v) = f(u + v).
f(x) = y ⇔ ln y = x ln a ⇔ f(x) = exp{x ln a}.
e^{ix} = cos(x) + i · sin(x).
The trick: extend the definition to a larger domain while preserving the characteristic properties above.
The same trick is used in Linear Algebra.
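A quick numeric check of the trick (Python here is purely illustrative and not part of the slides): 2^x for a complex exponent is defined via exp(x ln 2), and Euler's formula ties the complex exponential to cos and sin.

```python
import cmath
import math

# Extend 2^x beyond natural-number exponents by preserving 2^x = exp(x ln 2).
x = 4 - 3j
via_exp = cmath.exp(x * math.log(2))
via_pow = 2 ** x  # Python's built-in complex power agrees

# Euler's formula: e^{ix} = cos(x) + i*sin(x)
t = 0.7
lhs = cmath.exp(1j * t)
rhs = complex(math.cos(t), math.sin(t))
```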
Objects and Their Representations
Goal
We need to study the objects
On one side:
A good representation helps (a lot)!
On the other side:
Properties of the objects should be independent of the
representation!
Basic Concepts I
Algebra
a set of objects
two operations and their identity objects (aka. identity
elements):
addition (+); its identity is 0.
scalar multiplication (·); its identity is 1.
constraints:
Closed for both operations
Some nice properties of these operations:
Commutative: a + b = b + a.
Associative: (a + b) + c = a + (b + c).
Distributive: λ(a + b) = λa + λb.
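A minimal NumPy sketch (illustrative, with arbitrary concrete vectors) checking that R^3 vectors satisfy the axioms above:

```python
import numpy as np

# Check the algebra axioms on R^3 vectors with concrete numbers.
a = np.array([1., 2., 3.])
b = np.array([4., 5., 6.])
c = np.array([7., 8., 9.])
lam = 2.5

assert np.allclose(a + b, b + a)                       # commutative
assert np.allclose((a + b) + c, a + (b + c))           # associative
assert np.allclose(lam * (a + b), lam * a + lam * b)   # distributive
assert np.allclose(a + np.zeros(3), a)                 # additive identity 0
assert np.allclose(1 * a, a)                           # multiplicative identity 1
```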
Basic Concepts II
Think: What about subtraction and division?
Tips
Always use analogy from algebra on integers (Z) and algebra on
Polynomials (P).
Why are these constraints natural and useful?
Basic Concepts III
Representation matters?
Consider even geometric vectors: c = a + b
What if we represent vectors by a column of their coordinates?
What if by their polar coordinates?
Notes
Informally, the objects we are concerned with in this course
are (column) vectors.
The set of all n-dimensional real vectors is denoted R^n.
(Column) Vector
Vector
An n-dimensional vector, v, is an n × 1 matrix. We can
emphasize its shape by calling it a column vector.
A row vector is a transposed column vector: v^T.
Operations
Addition (component-wise): (v_1 + v_2)_i = (v_1)_i + (v_2)_i.
(Scalar) Multiplication (component-wise): (λv_1)_i = λ(v_1)_i.
Linearity I
Linear Combination: Generalization of Univariate Linear Functions
Let λ_i ∈ R. Given a set of k vectors v_i (i ∈ [k]), a linear
combination of them is
λ_1 v_1 + λ_2 v_2 + … + λ_k v_k = Σ_{i ∈ [k]} λ_i v_i
Later, this is just Vλ, where
V = [v_1 v_2 … v_k],  λ = (λ_1, λ_2, …, λ_k)^T
Span: the set of all linear combinations of a set of vectors is
their span.
Basis: a minimal set of vectors whose span is exactly the
whole R^n.
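A NumPy sketch (with illustrative vectors) showing that a linear combination is exactly the matrix–vector product Vλ:

```python
import numpy as np

# sum_i lambda_i v_i equals V @ lam, where the v_i are the columns of V.
v1 = np.array([1., 0.])
v2 = np.array([0., 1.])
v3 = np.array([2., 3.])
V = np.column_stack([v1, v2, v3])      # 2 x 3
lam = np.array([1., 2., 0.5])

combo = lam[0] * v1 + lam[1] * v2 + lam[2] * v3
assert np.allclose(V @ lam, combo)
```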
Linearity II
Benefit: every vector has a unique decomposition wrt a basis.
Think: Why is uniqueness desirable?
Examples
Span of [1, 0]^T and [0, 1]^T is R^2. They also form a basis.
Span of [1, 0]^T, [0, 1]^T, and [2, 3]^T is R^2, but one of
them is redundant. Think: which one?
Decompose [4, 6]^T.
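A NumPy sketch of the decomposition exercise: [4, 6]^T in the standard basis, and (as an illustrative extra) in the basis {[1, 0]^T, [2, 3]^T} by solving a small linear system:

```python
import numpy as np

# Decompose y = [4, 6]^T in the standard basis e1, e2 ...
y = np.array([4., 6.])
assert np.allclose(4 * np.array([1., 0.]) + 6 * np.array([0., 1.]), y)

# ... and in the basis {[1,0]^T, [2,3]^T} by solving B @ c = y.
B = np.column_stack([[1., 0.], [2., 3.]])
c = np.linalg.solve(B, y)   # unique coefficients since B is invertible
assert np.allclose(B @ c, y)
```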
Linearity III
Exercises
What is the (natural) basis of all (univariate) polynomials of
degree up to d?
Decompose 3x^2 + 4x − 7 into a linear combination of 2,
2x − 3, and x^2 + 1.
3x^2 + 4x − 7 = 3(x^2 + 1) + 2(2x − 3) + (−2)(2).
The “same” polynomial is mapped to two different vectors
under two different bases. Think: any analogy?
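A NumPy check of the decomposition above, evaluating both sides at several points:

```python
import numpy as np

# Check 3x^2 + 4x - 7 = 3*(x^2 + 1) + 2*(2x - 3) + (-2)*2 pointwise.
xs = np.linspace(-5, 5, 11)
lhs = 3 * xs**2 + 4 * xs - 7
rhs = 3 * (xs**2 + 1) + 2 * (2 * xs - 3) + (-2) * 2
assert np.allclose(lhs, rhs)

# In the basis {2, 2x-3, x^2+1} the polynomial is the vector (-2, 2, 3);
# in the standard basis {1, x, x^2} it is (-7, 4, 3).
```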
Matrix I
Linear Transformation
is a “nice” linear function that maps a vector in Rn to another
vector in Rm.
(x_1, x_2)^T —f→ (y_1, y_2, y_3)^T
The general form:
y_1 = M_11 x_1 + M_12 x_2
y_2 = M_21 x_1 + M_22 x_2
y_3 = M_31 x_1 + M_32 x_2
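The general form above, written out in NumPy (the entries of M here are illustrative numbers, not from the slides):

```python
import numpy as np

# A linear map R^2 -> R^3, element-wise vs as M @ x.
M = np.array([[1., 2.],
              [-4., 9.],
              [25., 1.]])
x = np.array([3., 5.])

y_elementwise = np.array([M[i, 0] * x[0] + M[i, 1] * x[1] for i in range(3)])
assert np.allclose(M @ x, y_elementwise)
```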
Matrix II
Nonexample
(x_1, x_2)^T —f→ (y_1, y_2, y_3)^T, where
y_1 = α x_1^2 + β x_2
y_2 = γ x_1^2 + θ x_1 + τ x_2
y_3 = cos(x_1) + e^{x_2}
Why Only Linear Transformation?
Simple and nice properties:
(f1 + f2)(x) = f1(x) + f2(x)
(λf )(x) = λ · f (x)
What about f (g(x))?
Useful
Matrix I
Definition
An m × n matrix corresponds to a linear transformation from
R^n to R^m:
f(x) = y ⟹ Mx = y, where matrix–vector multiplication is
defined as y_i = Σ_k M_ik x_k.
Shape convention: M is outDim × inDim.
Transformation or Mapping emphasizes the mapping between
two sets rather than the detailed specification of the mapping;
the latter is more or less the elementary understanding of a
function. These are all specific instances of morphisms in
category theory.
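A sketch of matrix–vector multiplication straight from the definition y_i = Σ_k M_ik x_k, checked against NumPy's built-in product (the matrix is the 3 × 2 example used later in the slides):

```python
import numpy as np

# Matrix-vector multiplication from the definition.
def matvec(M, x):
    m, n = M.shape
    y = np.zeros(m)
    for i in range(m):
        for k in range(n):
            y[i] += M[i, k] * x[k]
    return y

M = np.array([[1., 2.], [-4., 9.], [25., 1.]])
x = np.array([1., 10.])
assert np.allclose(matvec(M, x), M @ x)   # agrees with numpy's product
```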
Semantic Interpretation
Matrix II
Linear combination of columns of M:
Mx = [M_•1 M_•2 … M_•n] (x_1, x_2, …, x_n)^T = (y_1, …, y_m)^T
y = x_1 M_•1 + … + x_n M_•n
Example:
[[1, 2], [−4, 9], [25, 1]] · [1, 10]^T = 1 · [1, −4, 25]^T + 10 · [2, 9, 1]^T = [21, 86, 35]^T
[[1, 2], [−4, 9]] · [1, 10]^T = 1 · [1, −4]^T + 10 · [2, 9]^T = [21, 86]^T
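The column-combination view of the slide's example, verified in NumPy:

```python
import numpy as np

# M @ x as a linear combination of M's columns.
M = np.array([[1., 2.], [-4., 9.], [25., 1.]])
x = np.array([1., 10.])

by_columns = x[0] * M[:, 0] + x[1] * M[:, 1]
assert np.allclose(by_columns, np.array([21., 86., 35.]))
assert np.allclose(by_columns, M @ x)
```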
Matrix III
Think: What does M do for the last example?
Rotation and scaling
When x is also a matrix,
[[1, 2], [−4, 9], [25, 1]] · [[1, 2], [10, 20]] = [[21, 42], [86, 172], [35, 70]]
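The slide's matrix–matrix product in NumPy; each column of the product is M applied to the corresponding column of X:

```python
import numpy as np

# A matrix-matrix product is the matrix applied to each column of X.
M = np.array([[1., 2.], [-4., 9.], [25., 1.]])
X = np.array([[1., 2.], [10., 20.]])

product = M @ X
assert np.allclose(product, [[21., 42.], [86., 172.], [35., 70.]])
assert np.allclose(product[:, 0], M @ X[:, 0])   # column j = M @ X[:, j]
```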
System of Linear Equations I
y_1 = M_11 x_1 + M_12 x_2
y_2 = M_21 x_1 + M_22 x_2
y_3 = M_31 x_1 + M_32 x_2
⟹ (y_1, y_2, y_3)^T = [[M_11, M_12], [M_21, M_22], [M_31, M_32]] (x_1, x_2)^T
y = Mx
Interpretation: find a vector in R2 such that its image (under
M) is exactly the given vector y ∈ R3.
How to solve it?
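One way to solve such an overdetermined system (a sketch, not necessarily the method the course will use): `np.linalg.lstsq` finds the x whose image is closest to y; when y lies in the column space of M, it recovers x exactly. The matrix below is the 3 × 2 example from the slides.

```python
import numpy as np

# y = M x with M 3x2: a solution exists only when y is in the span of
# M's columns; lstsq finds the best x either way.
M = np.array([[1., 2.], [-4., 9.], [25., 1.]])
x_true = np.array([1., 10.])
y = M @ x_true                       # y is in the column space by construction

x_hat, residual, rank, _ = np.linalg.lstsq(M, y, rcond=None)
assert np.allclose(x_hat, x_true)
```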
System of Linear Equations II
[Figure: M maps x_1, x_2, x_3 in the domain to y_1, y_2, y_3 in the range.]
The above transformation is injective, but not surjective.
A Matrix Also Specifies a (Generalized) Coordinate System I
Yet another interpretation
y = Mx =⇒ Iy = Mx.
The vector y wrt standard coordinate system, I, is the same
as x wrt the coordinate system defined by column vectors of
M. Think: why columns of M?
A Matrix Also Specifies a (Generalized) Coordinate System II
Example for polynomials (standard basis 1, x, x^2):
I: columns represent 1, x, x^2: I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
M: columns represent 3, x − 1, 2x^2 + 5x − 4: M = [[3, −1, −4], [0, 1, 5], [0, 0, 2]]
Let x = (1, −2, 3)^T ⟹ Mx = I (−7, 13, 6)^T
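A NumPy check of the slide's coordinate-change example:

```python
import numpy as np

# Columns of M are the polynomials 3, x-1, 2x^2+5x-4 in the standard
# basis (1, x, x^2). Coordinates x wrt M equal Mx wrt I.
M = np.array([[3., -1., -4.],
              [0., 1., 5.],
              [0., 0., 2.]])
x = np.array([1., -2., 3.])
assert np.allclose(M @ x, [-7., 13., 6.])
# i.e. 1*(3) - 2*(x-1) + 3*(2x^2+5x-4) = 6x^2 + 13x - 7
```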
Exercise I
What if y is given in the above example?
What does the following mean?
[[3, −1, −4], [0, 1, 5], [0, 0, 2]] · [[1, 0, 0], [0, 1, 0], [0, 0, 1]] = [[3, −1, −4], [0, 1, 5], [0, 0, 2]]
Think about representing polynomials using the basis:
(x − 1)2, x2 − 1, x2 + 1.
Inner Product
THE binary operator – some kind of “similarity”
Type signature: vector × vector → scalar: 〈x, y〉.
In R^n, usually called the dot product: x · y := x^T y = Σ_i x_i y_i.
For certain functions, 〈f, g〉 = ∫_a^b f(t) g(t) dt ⟹ leads to
the Hilbert space.
Properties / definitions over R:
conjugate symmetry: 〈x, y〉 = 〈y, x〉
linearity in the first argument: 〈ax + y, z〉 = a〈x, z〉 + 〈y, z〉
positive definiteness: 〈x, x〉 ≥ 0; 〈x, x〉 = 0 ⇔ x = 0.
Generalizes many geometric concepts to vector spaces: angle
(orthogonal), projection, norm
〈sin nt, sin mt〉 = 0 on [−π, π] (m ≠ n) ⟹ they are
orthogonal to each other.
C = A^T B: C_ij = 〈A_i, B_j〉 (inner product of column i of A and column j of B).
Special case: A^T A.
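A NumPy sketch of both inner products on this slide: the dot product in R^n, and a numerical integration (a simple Riemann sum, for illustration) showing sin(2t) and sin(3t) are orthogonal on [−π, π]:

```python
import numpy as np

# Dot product in R^n.
x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
assert np.isclose(x @ y, np.sum(x * y))        # x . y = sum_i x_i y_i

# <sin 2t, sin 3t> = integral over [-pi, pi] of sin(2t) sin(3t) dt ~ 0.
t = np.linspace(-np.pi, np.pi, 100001)
dt = t[1] - t[0]
inner = np.sum(np.sin(2 * t) * np.sin(3 * t)) * dt
assert abs(inner) < 1e-4                       # orthogonal
```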
Eigenvalues/vectors and Eigen Decomposition
“Eigen” means “characteristic of” (German)
A (right) eigenvector of a square matrix A is u such that
Au = λu.
Not all matrices have eigenvalues. Here, we only consider
“good” matrices. Not all eigenvalues need to be distinct.
Traditionally, we normalize u (such that u^T u = 1).
We can use all eigenvectors of A to construct a matrix U (as
columns). Then AU = UΛ, or equivalently, A = UΛU^{-1}.
This is the Eigen Decomposition.
We can interpret U as a transformation between two
coordinate systems. Note that vectors in U are not necessarily
orthogonal.
Λ as the scaling on each of the directions in the “new”
coordinate system.
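A NumPy sketch of the eigen decomposition (the matrix is an illustrative symmetric 2 × 2, not from the slides):

```python
import numpy as np

# Eigen decomposition A = U Lambda U^{-1} for a diagonalizable matrix.
A = np.array([[2., 1.],
              [1., 2.]])
vals, U = np.linalg.eig(A)

# Each column of U is an eigenvector: A u = lambda u.
for i in range(2):
    assert np.allclose(A @ U[:, i], vals[i] * U[:, i])

# Reassemble A from the decomposition.
assert np.allclose(U @ np.diag(vals) @ np.linalg.inv(U), A)
```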
Similar Matrices
Similar Matrix
Let A and B be two n × n matrices. A is similar to B (denoted
A ∼ B) if there exists an invertible n × n matrix P such
that P^{-1}AP = B.
Think: What does this mean?
Think of P as a change of basis transformation.
Relationship with the Eigen decomposition.
Similar matrices have the same value wrt many properties
(e.g., rank, trace, eigenvalues, determinant, etc.)
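A NumPy sketch (with random illustrative matrices) of the invariants shared by similar matrices:

```python
import numpy as np

# Similar matrices B = P^{-1} A P share eigenvalues, trace, and determinant.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
P = rng.standard_normal((3, 3))          # almost surely invertible
B = np.linalg.inv(P) @ A @ P

assert np.allclose(np.trace(A), np.trace(B))
assert np.allclose(np.linalg.det(A), np.linalg.det(B))
assert np.allclose(np.sort(np.linalg.eigvals(A)),
                   np.sort(np.linalg.eigvals(B)))
```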
SVD
Singular Value Decomposition
Let M be n × d (n ≥ d).
Reduced SVD: M = Û Σ̂ V^T exists for any M, such that
Σ̂ is a d × d diagonal matrix with diagonal elements σ_i (called
singular values) in decreasing order;
Û consists of an (incomplete) set of basis vectors u_i (left
singular vectors in R^n) (n × d: original space, as M);
V consists of a set of basis vectors v_i (right singular vectors
in R^d) (d × d: reduced space).
Full SVD: M = UΣV>:
Add the remaining (n − d) basis vectors to Û (it thus becomes
n × n).
Add n − d rows of 0 to Σ̂ (it thus becomes n × d).
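A NumPy sketch (with a random illustrative 5 × 3 matrix) of the reduced vs full SVD, including the relation M v_i = σ_i u_i from the next slide:

```python
import numpy as np

# Reduced vs full SVD for an n x d matrix (n >= d).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))          # n=5, d=3

U_hat, s, Vt = np.linalg.svd(M, full_matrices=False)  # reduced: 5x3, 3, 3x3
U, _, _ = np.linalg.svd(M, full_matrices=True)        # full U: 5x5

assert U_hat.shape == (5, 3) and U.shape == (5, 5)
assert np.allclose(U_hat @ np.diag(s) @ Vt, M)        # reconstruction
assert np.all(np.diff(s) <= 0)                        # sigma_i decreasing

V = Vt.T
for i in range(3):
    assert np.allclose(M @ V[:, i], s[i] * U_hat[:, i])  # M v_i = sigma_i u_i
```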
Geometric Illustration of SVD
Geometric Meaning
Mvi = σiui
Graphical Illustration of SVD I
Figure: Reduced SVD vs Full SVD
M (n×d) = Û (n×d) · Σ̂ (d×d) · V^T (d×d)
M (n×d) = U (n×n) · Σ (n×d) · V^T (d×d)
Graphical Illustration of SVD II
Meaning:
Columns of U form a basis of R^n.
Rows of V^T (i.e., columns of V) form a basis of R^d.
SVD Applications I
Relationship between Singular Values and Eigenvalues
What are the eigenvalues of M>M?
Hint: M = UΣV^T, and U and V are unitary (i.e., rotations).
Related to PCA (Principal Component Analysis).
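A NumPy check of the hinted relationship (with a random illustrative matrix): the eigenvalues of M^T M are the squared singular values of M.

```python
import numpy as np

# Eigenvalues of M^T M equal the squared singular values of M.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))

s = np.linalg.svd(M, compute_uv=False)
eigvals = np.linalg.eigvalsh(M.T @ M)    # M^T M is symmetric PSD

assert np.allclose(np.sort(eigvals)[::-1], s**2)
```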
References and Further Reading I
Linear Algebra Review and Reference.
http://cs229.stanford.edu/section/cs229-linalg.pdf
Scipy LA tutorial. https://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html
We Recommend a Singular Value Decomposition.
http://www.ams.org/samplings/feature-column/fcarc-svd