
Linear Algebra

Maths Preliminaries

Wei Wang @ CSE, UNSW

March 18, 2018

Wei Wang @ CSE, UNSW Maths Preliminaries


Introduction

This review serves two purposes:

Recap relevant maths content that you may have learned a
long time ago (probably not in a CS course, and rarely used in
any CS course).
More importantly, present it in a way that is useful for
understanding the maths behind Machine Learning (i.e., giving
semantics/motivations).

Contents

Linear Algebra


Note

You’ve probably learned Linear Algebra via matrices, systems
of linear equations, etc. We will review key concepts in LA
from the perspective of linear transformations (think of them
as functions for now). This perspective provides semantics and
intuition for most of the ML models and operations.

Here we emphasize intuition; we deliberately skip many
concepts and present some contents in an informal way.

It is a great exercise for you to view related maths and ML
models/operations from this perspective throughout this course!


A Common Trick in Maths I

Question

Calculate 2^10, 2^(−1), 2^(ln 5) and 2^(4−3i).

Properties:

f_a(n) = f_a(n − 1) · a, for n ≥ 1; f_a(0) = 1.
f(u) · f(v) = f(u + v).
f(x) = y ⇔ ln(y) = x ln(a) ⇔ f(x) = exp{x ln a}.
e^(ix) = cos(x) + i · sin(x).

The trick: move to a representation where the computation is
easy (here via ln and exp), compute there, then map the result back.

The same trick works in Linear Algebra.
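The trick can be sketched in a few lines of Python (the function name `power` is ours, not from the slides):

```python
import cmath
import math

# The trick: map the base-a power into exp/log world, compute there, map back.
# exp(x * ln a) handles integer, negative, real, and complex exponents uniformly.
def power(a, x):
    return cmath.exp(x * cmath.log(a))

print(power(2, 10))             # ~ 1024
print(power(2, -1))             # ~ 0.5
print(power(2, math.log(5)))    # 2^(ln 5)
print(power(2, 4 - 3j))         # complex power, via e^(ix) = cos x + i sin x
```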


Objects and Their Representations

Goal

We need to study the objects

On one side:

A good representation helps (a lot)!

On the other side:

Properties of the objects should be independent of the
representation!


Basic Concepts I

Algebra

a set of objects

two operations and their identity objects (aka identity
elements):

addition (+); its identity is 0.
scalar multiplication (·); its identity is 1.

constraints:

Closed for both operations
Some nice properties of these operations:

Commutative: a + b = b + a.
Associative: (a + b) + c = a + (b + c).
Distributive: λ(a + b) = λa + λb.


Basic Concepts II

Think: What about subtraction and division?

Tips

Always use analogy from algebra on integers (Z) and algebra on
Polynomials (P).

Why are these constraints natural and useful?


Basic Concepts III

Representation matters?

Consider even simple geometric vectors: c = a + b
What if we represent vectors by a column of their coordinates?
What if by their polar coordinates?

Notes

Informally, the objects we are concerned with in this course
are (column) vectors.

The set of all n-dimensional real vectors is denoted R^n.


(Column) Vector

Vector

An n-dimensional vector, v, is an n × 1 matrix. We can
emphasize its shape by calling it a column vector.

A row vector is a transposed column vector: v^T.

Operations

Addition: v1 + v2 adds component-wise: (v1 + v2)_i = (v1)_i + (v2)_i.

(Scalar) Multiplication: λv1 scales every component: (λv1)_i = λ(v1)_i.
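A minimal numpy sketch of both operations (numpy is an assumption here; the slides do not prescribe a library):

```python
import numpy as np

# Vectors as 1-D numpy arrays; both operations act component-wise.
v1 = np.array([1.0, 2.0])
v2 = np.array([3.0, -1.0])

print(v1 + v2)   # addition -> [4. 1.]
print(3 * v1)    # scalar multiplication -> [3. 6.]
```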


Linearity I

Linear Combination: Generalization of Univariate Linear Functions

Let λ_i ∈ R. Given a set of k vectors v_i (i ∈ [k]), a linear
combination of them is

λ_1 v_1 + λ_2 v_2 + … + λ_k v_k = ∑_{i∈[k]} λ_i v_i

Later, this is just Vλ, where

V = [v_1 v_2 … v_k]   λ = (λ_1, λ_2, …, λ_k)^T

Span: the set of all linear combinations of a set of vectors is
called their span.

Basis: a minimal set of vectors whose span is exactly the
whole R^n.
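The identity "linear combination = Vλ" can be checked directly (a numpy sketch; the specific vectors are ours):

```python
import numpy as np

# Stack the v_i as columns of V; then sum_i lambda_i * v_i equals V @ lam.
v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])
v3 = np.array([2.0, 3.0])
V = np.column_stack([v1, v2, v3])      # 2 x 3 matrix whose columns are the v_i
lam = np.array([4.0, 6.0, 0.0])

combo = lam[0] * v1 + lam[1] * v2 + lam[2] * v3
print(np.allclose(V @ lam, combo))     # True
```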


Linearity II

Benefit: every vector has a unique decomposition in terms of the basis.

Think: Why is uniqueness desirable?

Examples

Span of (1, 0)^T and (0, 1)^T is R^2. They also form a basis.

Span of (1, 0)^T, (0, 1)^T, (2, 3)^T is R^2, but one of them is
redundant. Think: which one?

Decompose (4, 6)^T.


Linearity III

Exercises

What is the (natural) basis of all (univariate) polynomials of
degrees up to d?

Decompose 3x^2 + 4x − 7 into the linear combination of 2,
2x − 3, x^2 + 1.

3x^2 + 4x − 7 = 3(x^2 + 1) + 2(2x − 3) + (−2)(2).

The “same” polynomial is mapped to two different vectors
under two different bases. Think: Any analogy?
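Decomposing a polynomial over a non-standard basis is just solving a linear system in coefficient space (a numpy sketch; the matrix B below encodes the basis from the exercise):

```python
import numpy as np

# Columns of B are the basis polynomials 2, 2x - 3, x^2 + 1 written in the
# standard basis {1, x, x^2} (rows: constant, x, and x^2 coefficients).
B = np.array([[2.0, -3.0, 1.0],
              [0.0,  2.0, 0.0],
              [0.0,  0.0, 1.0]])
p = np.array([-7.0, 4.0, 3.0])   # 3x^2 + 4x - 7 in the standard basis

c = np.linalg.solve(B, p)
print(c)   # coefficients of (2, 2x - 3, x^2 + 1): [-2, 2, 3]
```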


Matrix I

Linear Transformation

is a “nice” linear function that maps a vector in Rn to another
vector in Rm.

(x1, x2)^T --f--> (y1, y2, y3)^T

The general form:

(x1, x2)^T --f--> (y1, y2, y3)^T  ⟹  y1 = M11 x1 + M12 x2
                                      y2 = M21 x1 + M22 x2
                                      y3 = M31 x1 + M32 x2


Matrix II

Nonexample

(x1, x2)^T --f--> (y1, y2, y3)^T  ⟹  y1 = α x1^2 + β x2
                                      y2 = γ x1^2 + θ x1 + τ x2
                                      y3 = cos(x1) + e^(x2)

Why Only Linear Transformation?

Simple and nice properties:

(f1 + f2)(x) = f1(x) + f2(x)
(λf )(x) = λ · f (x)
What about f (g(x))?

Useful


Matrix I

Definition

An m × n matrix corresponds to a linear transformation from
R^n to R^m:

f(x) = y  ⟹  Mx = y, where matrix-vector multiplication is
defined as: y_i = ∑_k M_ik · x_k

M has shape outDim × inDim.

Transformation or Mapping places more emphasis on the
mapping between two sets, rather than the detailed
specification of the mapping; the latter is more or less the
elementary understanding of a function. These are all specific
instances of morphisms in category theory.

Semantic Interpretation


Matrix II

Linear combination of columns of M:

[M•1 M•2 … M•n] x = [M•1 M•2 … M•n] (x1, x2, …, xn)^T = (y1, …, ym)^T

y = x1 M•1 + … + xn M•n

Example:

[ 1  2] [ 1]      [ 1]      [2]   [21]
[-4  9] [10] = 1  [-4] + 10 [9] = [86]
[25  1]           [25]      [1]   [35]

[ 1  2] [ 1]      [ 1]      [2]   [21]
[-4  9] [10] = 1  [-4] + 10 [9] = [86]
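The first example, verified in numpy (column view vs the built-in product):

```python
import numpy as np

# M @ x is the linear combination x1 * (column 1) + x2 * (column 2).
M = np.array([[ 1.0, 2.0],
              [-4.0, 9.0],
              [25.0, 1.0]])
x = np.array([1.0, 10.0])

print(M @ x)                        # [21. 86. 35.]
print(1 * M[:, 0] + 10 * M[:, 1])   # the same vector, via the column view
```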

Matrix III

Think: What does M do for the last example?

Rotation and scaling

When x is also a matrix,

[ 1  2] [ 1  2]   [21  42]
[-4  9] [10 20] = [86 172]
[25  1]           [35  70]
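In the matrix-matrix case, each column of the product is M applied to the corresponding column of x (a numpy check of the numbers above):

```python
import numpy as np

M = np.array([[ 1.0, 2.0],
              [-4.0, 9.0],
              [25.0, 1.0]])
X = np.array([[ 1.0,  2.0],
              [10.0, 20.0]])

# Column j of M @ X equals M @ X[:, j].
print(M @ X)         # columns (21, 86, 35)^T and (42, 172, 70)^T
print(M @ X[:, 0])   # first column on its own
```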



System of Linear Equations I

y1 = M11 x1 + M12 x2
y2 = M21 x1 + M22 x2
y3 = M31 x1 + M32 x2

     [y1]   [M11 M12]
⟹   [y2] = [M21 M22] [x1]
     [y3]   [M31 M32] [x2]

y = Mx

Interpretation: find a vector in R2 such that its image (under
M) is exactly the given vector y ∈ R3.
How to solve it?
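One practical answer, sketched with numpy: when M is 3 × 2 the system is overdetermined, so look for the x whose image is closest to y (least squares); if y happens to lie in the span of M's columns, the solution is exact:

```python
import numpy as np

M = np.array([[ 1.0, 2.0],
              [-4.0, 9.0],
              [25.0, 1.0]])
x_true = np.array([1.0, 10.0])
y = M @ x_true                 # a y that lies in the column space of M

# lstsq minimises ||Mx - y||; here it recovers x exactly.
x, res, rank, svals = np.linalg.lstsq(M, y, rcond=None)
print(x)    # [ 1. 10.]
```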


System of Linear Equations II

[Figure: a diagram of the map M from the Domain {x1, x2, x3}
to the Range {y1, y2, y3}]

The above transformation is injective, but not surjective.


A Matrix Also Specifies a (Generalized) Coordinate System
I

Yet another interpretation

y = Mx  ⟹  Iy = Mx.
The vector y wrt the standard coordinate system, I, is the
same as x wrt the coordinate system defined by the column
vectors of M. Think: why columns of M?


A Matrix Also Specifies a (Generalized) Coordinate System
II

Example for polynomials

I (columns are the standard basis polynomials 1, x, x^2;
rows hold the coefficients of 1, x, x^2):

      [1 0 0]
  I = [0 1 0]
      [0 0 1]

⟹ M (columns represent the basis polynomials 3, x − 1, and
2x^2 + 5x − 4 in the standard basis):

      [3 −1 −4]
  M = [0  1  5]
      [0  0  2]

Let x = (1, −2, 3)^T ⟹ Mx = I (−7, 13, 6)^T
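The change of coordinates can be verified numerically (a numpy sketch of the example above):

```python
import numpy as np

# Columns of M: the polynomials 3, x - 1, 2x^2 + 5x - 4 in the basis {1, x, x^2}.
M = np.array([[3.0, -1.0, -4.0],
              [0.0,  1.0,  5.0],
              [0.0,  0.0,  2.0]])
x = np.array([1.0, -2.0, 3.0])   # coordinates wrt the columns of M

print(M @ x)   # [-7. 13.  6.]: the same polynomial wrt the standard basis
```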


Exercise I

What if y is given in the above example?

What does the following mean?

[3 −1 −4] [1 0 0]   [3 −1 −4]
[0  1  5] [0 1 0] = [0  1  5]
[0  0  2] [0 0 1]   [0  0  2]

Think about representing polynomials using the basis:
(x − 1)^2, x^2 − 1, x^2 + 1.


Inner Product

THE binary operator – some kind of “similarity”

Type signature: vector × vector → scalar: ⟨x, y⟩.
In R^n, usually called the dot product: x · y := x^T y = ∑_i x_i y_i.
For certain functions, ⟨f, g⟩ = ∫_a^b f(t) g(t) dt ⇒ leads to the
Hilbert space.

Properties / definitions for R:
  conjugate symmetry: ⟨x, y⟩ = ⟨y, x⟩
  linearity in the first argument: ⟨ax + y, z⟩ = a⟨x, z⟩ + ⟨y, z⟩
  positive definiteness: ⟨x, x⟩ ≥ 0; ⟨x, x⟩ = 0 ⇔ x = 0

Generalizes many geometric concepts to vector spaces: angle
(orthogonality), projection, norm.

⟨sin nt, sin mt⟩ = 0 within [−π, π] (m ≠ n) ⇒ they are
orthogonal to each other.

C = A^T B: C_ij = ⟨A_i, B_j⟩ (inner product of the i-th column
of A and the j-th column of B)
Special case: A^T A.
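The orthogonality claim can be checked numerically by approximating the integral with a Riemann sum (a numpy sketch; the grid size is an arbitrary choice):

```python
import numpy as np

# <f, g> = integral of f(t) g(t) over [-pi, pi], approximated on a uniform grid.
t = np.linspace(-np.pi, np.pi, 200000, endpoint=False)
dt = t[1] - t[0]

def inner(f, g):
    return float(np.sum(f(t) * g(t)) * dt)

print(inner(lambda s: np.sin(2 * s), lambda s: np.sin(3 * s)))  # ~ 0 (orthogonal)
print(inner(lambda s: np.sin(2 * s), lambda s: np.sin(2 * s)))  # ~ pi (squared norm)
```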


Eigenvalues/vectors and Eigen Decomposition

“Eigen” means “characteristic of” (German)

A (right) eigenvector of a square matrix A is u such that
Au = λu.

Not all matrices have real eigenvalues or a full set of
eigenvectors; here, we only consider “good” (diagonalisable)
matrices. Not all eigenvalues need to be distinct.

Traditionally, we normalize u (such that u>u = 1).

We can use all eigenvectors of A to construct a matrix U (as
columns). Then AU = UΛ, or equivalently, A = UΛU−1.
This is the Eigen Decomposition.

We can interpret U as a transformation between two
coordinate systems. Note that vectors in U are not necessarily
orthogonal.
Λ as the scaling on each of the directions in the “new”
coordinate system.
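A numpy sketch of the eigen decomposition (the matrix A is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, U = np.linalg.eig(A)   # eigenvalues, and normalised eigenvectors as columns
Lam = np.diag(lam)

print(np.allclose(A @ U, U @ Lam))                    # A U = U Lambda: True
print(np.allclose(A, U @ Lam @ np.linalg.inv(U)))     # A = U Lambda U^{-1}: True
```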


Similar Matrices

Similar Matrix

Let A and B be two n × n matrices. A is similar to B (denoted
A ∼ B) if there exists an invertible n × n matrix P such
that P^(−1)AP = B.

Think: What does this mean?
Think of P as a change of basis transformation.
Relationship with the Eigen decomposition.

Similar matrices have the same value wrt many properties
(e.g., rank, trace, eigenvalues, determinant, etc.)
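A numpy check that a similarity transform preserves trace, determinant, and eigenvalues (A and P below are arbitrary examples):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # any invertible matrix works
B = np.linalg.inv(P) @ A @ P        # B is similar to A

print(np.isclose(np.trace(A), np.trace(B)))                # True
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))      # True
print(np.allclose(np.sort(np.linalg.eigvals(A)),
                  np.sort(np.linalg.eigvals(B))))          # True
```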


SVD

Singular Value Decomposition

Let M be n × d (n ≥ d).
Reduced SVD: M = Û Σ̂ V^T exists for any M, such that

  Σ̂ is a d × d diagonal matrix with diagonal elements σ_i
  (called singular values) in decreasing order
  Û consists of an (incomplete) set of basis vectors u_i (left
  singular vectors in R^n) (n × d: original space, as M)
  V consists of a set of basis vectors v_i (right singular
  vectors in R^d) (d × d: reduced space)

Full SVD: M = U Σ V^T:
  Add the remaining (n − d) basis vectors to Û (which thus
  becomes n × n).
  Add n − d rows of 0 to Σ̂ (which thus becomes n × d).
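The shapes of the reduced vs full SVD in numpy (the random matrix is just a stand-in):

```python
import numpy as np

n, d = 5, 3
rng = np.random.default_rng(0)
M = rng.standard_normal((n, d))

# Reduced SVD: U_hat is n x d; s holds the d singular values in decreasing order.
U_hat, s, Vt = np.linalg.svd(M, full_matrices=False)
print(U_hat.shape, s.shape, Vt.shape)            # (5, 3) (3,) (3, 3)
print(np.allclose(M, U_hat @ np.diag(s) @ Vt))   # True

# Full SVD: U is n x n; Sigma gets n - d extra zero rows to become n x d.
U, s2, Vt2 = np.linalg.svd(M, full_matrices=True)
Sigma = np.zeros((n, d))
Sigma[:d, :d] = np.diag(s2)
print(np.allclose(M, U @ Sigma @ Vt2))           # True
```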


Geometric Illustration of SVD

Geometric Meaning

M v_i = σ_i u_i
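This relation follows from M = UΣV^T and V^T V = I, and can be checked per singular triple (numpy sketch with an arbitrary matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Row i of Vt is v_i^T, so M v_i should equal sigma_i * u_i for each i.
for i in range(len(s)):
    print(np.allclose(M @ Vt[i], s[i] * U[:, i]))   # True for every i
```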


Graphical Illustration of SVD I

Figure: Reduced SVD vs Full SVD

Reduced:  M (n×d) = Û (n×d) Σ̂ (d×d) V^T (d×d)

Full:     M (n×d) = U (n×n) Σ (n×d) V^T (d×d)


Graphical Illustration of SVD II

Meaning:

Columns of U form a basis of R^n.

Rows of V^T form a basis of R^d.


SVD Applications I

Relationship between Singular Values and Eigenvalues

What are the eigenvalues of M>M?

Hint: M = UΣV> and U and V are unitary (i.e., rotations)

Related to PCA (Principal Component Analysis)
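The hinted answer (the eigenvalues of M^T M are the squared singular values σ_i^2) can be confirmed numerically (a numpy sketch with an arbitrary M):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 3))

s = np.linalg.svd(M, compute_uv=False)             # singular values, decreasing
eig = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]   # M^T M is symmetric PSD

print(np.allclose(eig, s ** 2))   # True: eigenvalues of M^T M are sigma_i^2
```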


References and Further Reading I

Linear Algebra Review and Reference.
http://cs229.stanford.edu/section/cs229-linalg.pdf

Scipy LA tutorial.
https://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html

We Recommend a Singular Value Decomposition.
http://www.ams.org/samplings/feature-column/fcarc-svd
