Slide 1
BUSINESS SCHOOL
Discipline of Business Analytics
QBUS6850 Team
Topics covered
Why study machine learning
Types of learning
Linear algebra and matrix computation review
References
Chapter 1 (Alpaydin, 2014)
Learning Objectives
Be able to distinguish the two major types of learning (supervised
and unsupervised)
Understand the notation of linear algebra
Understand the basic operations on vectors/matrices, such as
the inner product of two vectors, the norm of a vector, the
transpose of a matrix, the matrix product, matrix rank and
determinant
Understand concepts such as the vector norm or length,
orthogonality of vectors and projecting a vector
Be familiar with linear equation systems and the matrix inverse
All is about Data and Analytics
• Every minute, Americans use 2,657,700GB of data.
• Every minute, Instagram users post 46,750 photos.
• Every minute, 15,220,700 texts are sent.
• Every minute, Google conducts 3,607,080 searches.
Guess please!
https://blog.linkedin.com/2017/december/7/the-fastest-growing-jobs-in-the-u-s-based-on-linkedin-data
Data-driven decisions are more profitable
Tradition: decision-making relying on CEOs' intuition and experience
New trend: data-driven, allowing for impersonal, evidence-based decision-making
Machine Learning is changing how we do business
Advanced algorithms save time and resources
Mitigating risks with better decisions
Machine Learning provides better forecasting
ML makes it possible to find hidden insights in the data
ML makes it possible to extract patterns from vast amounts of data
Machine Learning for Business
https://www.infoworld.com/article/3259891/data-science/why-data-science-and-machine-learning-are-the-fastest-growing-jobs-in-the-us.html
Machine Learning for Business
Raw data
Business
requirements
Machine learning
algorithm
Data validation and cleansing
Which algorithm is
the best?
Implementation
Model Validation
Decision making
Implementation tool:
Python
Can I rely on the results
from the model?
Facilitate the business
decision making
https://www.bloomberg.com/news/articles/2016-02-03/google-search-chief-singhal-to-retire-replaced-by-ai-manager
Prediction: market demand prediction, price prediction
Pattern Recognition
Data Compression
Outlier detection
Recommendations
Applications
Arthur Samuel described machine learning as: “the field of study that gives
computers the ability to learn without being explicitly programmed.”
Machine Learning aims to build algorithms that can learn from and make
predictions on data, and evolved from the study of pattern recognition and
computational learning theory in artificial intelligence.
What is Machine Learning?
Machine learning is using computers to analyse data.
We want computers to “think” like humans.
What is “learning”? Often, we do not want just to describe the data we
have, but to be able to predict (as yet) unseen data.
With new data, the “learning” can be refreshed/updated.
What is Machine Learning?
Traditional Statistics: You have a specific question about a population.
E.g. What’s the average height of Australians?
It is expensive or impossible to collect data for the entire population.
Collect a sample and use inference to say things about the feature of the
population you are interested in.
Parameter = unknown feature of the population of interest
Estimator = sample-based estimate of a parameter
Current situation
Lots of data collected with no specific questions in mind.
Often, it would be quite easy to make a model that would describe already
known data. It is more difficult to predict unseen data (generalization).
Supervised Learning:
o In supervised learning, an imaginary “supervisor” tells us in the training
phase the correct response/target variable (t), given the features (x).
o The dependent or outcome variable is given.
o t is distinguished from the inputs x.
o Two major techniques here:
o Regression: t is a quantitative, or continuous, variable
o Classification: t is a discrete, or categorical, variable
o Goal: prediction of t
Types of Learning
Supervised Learning &
Unsupervised Learning
Regression
Response variable t is a continuous variable.
Example: price of second-hand cars.
t: car price; x: odometer reading. t = f(x|β); f(·): a model; β: model parameters.
[Two scatter plots of Price against Odometer reading]
Is this a supervised or unsupervised learning example?
Example:
Data on credit card applicants
Question:
Should a new application be
granted a credit card?
Data:
Low-risk (+) and high-risk (−)
customers are labelled based on
their income and savings.
Classification
Is this a supervised or unsupervised learning example?
[Two scatter plots of customers in the (x1: income, x2: savings) plane]
Question:
Segment the customers into
risk levels based on
income and savings
Data:
Labelled or not?
Is this a supervised or unsupervised learning example?
Clustering
[Scatter plot of customers in the (x1: income, x2: savings) plane]
Semi-supervised learning
Often, finding a good data set is one of the most difficult tasks in developing
machine learning methods.
Useful Links:
UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
Delve: http://www.cs.utoronto.ca/~delve/
Datasets Sources
Linear Algebra
(please read it through at home)
Vector
A vector is a collection of numbers (scalars) ordered by column (or row).
We assume vectors are column vectors.
The symbol a^T (a’s transpose) is then a row vector:
b = [2, 5]^T;  b^T = [2, 5]
Vector a is one point
in an n-dimensional
space ℝ^n, with
coordinates provided
by the elements a_i
Geometric representation of b:
vector b is one point in a 2-
dimensional space ℝ^2, and the
vector indicates the direction from
the origin to this point.
a = [a_1, a_2, …, a_{n−1}, a_n]^T
a^T = [a_1, a_2, …, a_{n−1}, a_n]
Understanding Vector
Special cases
Zeros vector: 0_n = [0, 0, 0, …, 0]^T
• All n components are 0’s
Unit vector: e_i = [0, 0, 1, …, 0]^T
• All components are zero except for the one at the ith position (= 1)
Ones vector: 1_n = [1, 1, 1, …, 1]^T
• All n components are 1’s
Why is it called a unit vector?
Basic Operations of Vector
Equality of vectors:
a = b ⟺ a_i = b_i for all i = 1, 2, …, n
Multiplication by scalars:
let ρ denote a scalar; ρa is the vector with elements {ρa_i}. E.g., let a =
[5, 2, 3]^T; then 0.5a = [0.5 × 5, 0.5 × 2, 0.5 × 3]^T = [2.5, 1, 1.5]^T
Sum of two vectors:
Let a and b be two vectors with the same size n; their sum c = a + b is the
vector with elements c_i = a_i + b_i. E.g., let a = [5, 2, 3]^T and b = [1, −11, 2]^T;
then c = a + b = [5, 2, 3]^T + [1, −11, 2]^T = [6, −9, 5]^T
Linear combination:
Let a = [1, 2]^T and b = [3, 1]^T, and let ρ_1 and ρ_2 denote
two coefficients (scalars); then
ρ_1 a + ρ_2 b
is their linear combination.
If ρ_1 = 1 and ρ_2 = 1, what is ρ_1 a + ρ_2 b?
If ρ_1 = 3 and ρ_2 = 7, what is ρ_1 a + ρ_2 b?
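The operations above translate directly into NumPy (the deck's implementation tool is Python). A minimal sketch using the slide's example vectors, including the two linear-combination questions:

```python
import numpy as np

# Scalar multiplication and vector sum (slide examples)
half_a = 0.5 * np.array([5, 2, 3])                 # [2.5, 1.0, 1.5]
c = np.array([5, 2, 3]) + np.array([1, -11, 2])    # [6, -9, 5]

# Linear combinations rho1*a + rho2*b with a = [1, 2]^T, b = [3, 1]^T
a = np.array([1, 2])
b = np.array([3, 1])
comb1 = 1 * a + 1 * b    # rho1 = rho2 = 1 -> [4, 3]
comb2 = 3 * a + 7 * b    # rho1 = 3, rho2 = 7 -> [24, 13]
print(comb1, comb2)
```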
Basic Operations of Vector
Geometric representation of the sum of two vectors (parallelogram rule) and
linear combination:
[Figure: vectors a and b and their sum a + b forming a parallelogram]
https://en.wikipedia.org/wiki/Parallelogram
The inner product between two n-dimensional vectors a and b is defined
as:
⟨a, b⟩ = a^T b = Σ_{i=1}^n a_i b_i
Properties:
1. (ρa)^T b = ρ(a^T b)
2. a^T b = b^T a
3. a^T(b + c) = a^T b + a^T c
Vector Inner Product
If a = [5, 2, 3]^T and b = [1, −11, 2]^T, ⟨a, b⟩ = ?
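A quick NumPy sketch of the inner product and its properties, answering the slide's question:

```python
import numpy as np

a = np.array([5, 2, 3])
b = np.array([1, -11, 2])

# <a, b> = a^T b = sum_i a_i * b_i
ip = a @ b                              # 5*1 + 2*(-11) + 3*2 = -11
assert ip == np.dot(b, a)               # symmetry: a^T b = b^T a
rho = 2.0
assert (rho * a) @ b == rho * (a @ b)   # (rho*a)^T b = rho(a^T b)
print(ip)
```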
Vector Norm or Length
By the Pythagorean theorem, the norm of vector a is the square root of
the inner product of a with itself:
||a|| = (a^T a)^{1/2} = (Σ_{i=1}^n a_i^2)^{1/2} ≥ 0
This is the distance from the origin to the point a, or the length of the vector.
The (normalized) vector a/||a|| has unit length.
https://en.wikipedia.org/wiki/Pythagorean_theorem
[Figure: right triangle with legs a, b, hypotenuse c and a 90° angle: a^2 + b^2 = ?]
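The norm and normalization can be sketched in NumPy, reusing the slide's vector a = [5, 2, 3]^T:

```python
import numpy as np

a = np.array([5, 2, 3])

# ||a|| = sqrt(a^T a); equals numpy's built-in Euclidean norm
norm_a = np.sqrt(a @ a)
assert np.isclose(norm_a, np.linalg.norm(a))

# The normalized vector a/||a|| has unit length
unit = a / norm_a
print(np.linalg.norm(unit))
```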
Euclidean Distance and Orthogonality
The distance between the vectors x_i and x_j is the norm of
the difference vector x_i − x_j:
d_ij = ||x_i − x_j|| = ((x_i − x_j)^T (x_i − x_j))^{1/2} = (Σ_{k=1}^d (x_ik − x_jk)^2)^{1/2} ≥ 0
Orthogonality: two vectors are orthogonal, a ⊥ b, if and only if their
inner product is zero, a^T b = 0.
Example?
a = [1, 1]^T, b = [1, −1]^T: ||a|| = ||b|| = √2, a^T b = 0
c = [1, 3]^T, d = [0.6, −0.2]^T: c^T d = ? Orthogonality? ||c|| = ? ||d|| = ?
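The orthogonality and distance checks can be sketched in NumPy with the slide's vectors c and d:

```python
import numpy as np

c = np.array([1.0, 3.0])
d = np.array([0.6, -0.2])

# Orthogonality check: c^T d = 1*0.6 + 3*(-0.2) = 0 (up to floating-point rounding)
print(c @ d)
# Norms: ||c|| = sqrt(10), ||d|| = sqrt(0.4)
print(np.linalg.norm(c), np.linalg.norm(d))
# Euclidean distance between the two points
print(np.linalg.norm(c - d))
```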
Geometric Representation
||a|| = (a^T a)^{1/2} = (Σ_{i=1}^n a_i^2)^{1/2} ≥ 0
[Figure: a = [a_1, a_2]^T and b = [b_1, b_2]^T drawn as arrows from the origin in the plane]
Inner Product Geometric Interpretation
Suppose we have a vector c that is orthogonal to b;
by the parallelogram law, a = c + ρb.
ρ ∈ ℝ here is a scalar.
Vector d = ρb is called the orthogonal projection of a onto b.
Connecting to the inner product:
a = c + ρb
a^T = c^T + ρb^T
a^T b = c^T b + ρb^T b = 0 + ρ||b||^2 = ρ||b||^2
Inner Product Geometric Interpretation
Example:
b = [2, 1]^T and c = [−1, 2]^T. Vector c is orthogonal to b.
How to check orthogonality? (c^T b = −1 × 2 + 2 × 1 = 0)
Suppose ρ = 0.7; then
a = c + ρb = [−1 + 0.7 × 2, 2 + 0.7 × 1]^T = [0.4, 2.7]^T
Vector d = ρb = 0.7 × [2, 1]^T = [1.4, 0.7]^T is the orthogonal projection of a onto b.
Let’s test a^T b = ρ||b||^2:
a^T b = 0.4 × 2 + 2.7 × 1 = 3.5
ρ||b||^2 = 0.7 × (2^2 + 1^2) = 3.5
||d|| = (1.4^2 + 0.7^2)^{1/2} ≈ 1.565
a^T b = ||d|| × ||b|| = (1.4^2 + 0.7^2)^{1/2} × (2^2 + 1^2)^{1/2} = 3.5
||d|| = ρ||b||
Will be used in SVM later
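Solving for ρ from a^T b = ρ||b||^2 gives a direct recipe for the projection; a sketch in NumPy with the worked example's numbers:

```python
import numpy as np

b = np.array([2.0, 1.0])
a = np.array([0.4, 2.7])

# rho = a^T b / ||b||^2, from a^T b = rho * ||b||^2
rho = (a @ b) / (b @ b)    # 3.5 / 5 = 0.7
d = rho * b                # orthogonal projection of a onto b -> [1.4, 0.7]
c = a - d                  # residual component, orthogonal to b
print(rho, d, c @ b)       # c^T b is 0 up to rounding
```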
A matrix is a rectangular array of numbers (scalars) for which operations such
as addition and multiplication are defined. It is a rectangular (N × d), i.e. two-
dimensional, array of scalars, represented as:
Basic Calculation of Matrix
In a typical data matrix, the index i = 1, 2, …, N refers to the statistical units/training
examples, and the index j = 1, 2, …, d to the variables or features.
Can also write: X = [x_ij] ∈ ℝ^{N×d}
X =
[ x_11 x_12 … x_1j … x_1d
  x_21 x_22 … x_2j … x_2d
  ⋮    ⋮        ⋮       ⋮
  x_i1 x_i2 … x_ij … x_id
  ⋮    ⋮        ⋮       ⋮
  x_N1 x_N2 … x_Nj … x_Nd ]
A square matrix is a matrix with the same number of rows and columns, e.g.
Matrix
A =
[ a_11 a_12 a_13 a_14
  a_21 a_22 a_23 a_24
  a_31 a_32 a_33 a_34
  a_41 a_42 a_43 a_44 ] ∈ ℝ^{4×4}
A column vector of size N can be represented as an N × 1 matrix.
A row vector of size d can be represented as a 1 × d matrix.
We can represent X as a partitioned matrix whose generic block is the
1 × d row vector x_i^T (each row is a 1 × d row vector; originally x_i is a
column vector):
X = [x_1^T; x_2^T; …; x_i^T; …; x_N^T]
Or we can partition by columns as below, where each column x_j is an
N × 1 column vector:
X = [x_1, x_2, …, x_j, …, x_d]
Matrix transpose: transposition yields the d × N matrix X^T with rows
and columns interchanged
Matrix Transpose
X =
[ x_11 x_12 … x_1j … x_1d
  x_21 x_22 … x_2j … x_2d
  ⋮    ⋮        ⋮       ⋮
  x_N1 x_N2 … x_Nj … x_Nd ]
X^T =
[ x_11 x_21 … x_i1 … x_N1
  x_12 x_22 … x_i2 … x_N2
  ⋮    ⋮        ⋮       ⋮
  x_1d x_2d … x_id … x_Nd ]
Matrix Product
Let A be an N × p matrix whose ith row is the 1 × p vector a_i^T, and
let B be a p × d matrix whose jth column is the p × 1 vector b_j, so that
A = [a_1^T; a_2^T; …; a_i^T; …; a_N^T],  B = [b_1, b_2, …, b_j, …, b_d]
The matrix product C = AB, where A pre-multiplies B, is the N × d matrix
with elements
c_ij = a_i^T b_j = Σ_{k=1}^p a_ik b_kj,  i = 1, …, N; j = 1, …, d
Matrix Product
How does the matrix product work?
A =
[ 3 1
  7 0
  1 2
  5 3 ]
B =
[ 1 3  9
  5 2 −6 ]
The number of columns of A is 2 and the
number of rows of B is 2. We can form the product C
= AB, which has 4 rows (= A’s row number) and
3 columns (= B’s column number):
C =
[ c_11 c_12 c_13
  c_21 c_22 c_23
  c_31 c_32 c_33
  c_41 c_42 c_43 ]
c_11 = a_1^T b_1 = 3 × 1 + 1 × 5 = 8
c_11 is the inner product of the first row
of A, a_1^T = [3, 1], and the first column of B, b_1 = [1, 5]^T
c_32 = ?
Properties of Matrix
Not every pair of matrices has a product. You must make sure that the
number of columns of the first matrix is EQUAL to the number of rows of
the second matrix. Moreover, in general the matrix product is not commutative:
AB is not equal to BA.
So in the previous example, BA is not defined.
We do have AB = BA for some appropriate matrices A and B.
If A is an m × p matrix, notice the difference between A^T A (p × p matrix
of crossproducts) and AA^T (size m × m).
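A NumPy sketch of the matrix product, using the 4 × 2 and 2 × 3 matrices from the worked example (the placement of the negative entry in B is reconstructed from a garbled slide):

```python
import numpy as np

A = np.array([[3, 1],
              [7, 0],
              [1, 2],
              [5, 3]])           # 4 x 2
B = np.array([[1, 3,  9],
              [5, 2, -6]])       # 2 x 3

C = A @ B                        # 4 x 3
# c_ij is the inner product of row i of A with column j of B
assert C[0, 0] == A[0, :] @ B[:, 0]   # c_11 = 3*1 + 1*5 = 8
print(C[2, 1])                        # c_32 = 1*3 + 2*2 = 7
# Note: B @ A is not defined here (3 columns vs 4 rows) and would raise an error
```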
Matrix Special Cases
A square matrix has a number of rows equal to the number of columns: N = d
A square matrix A is symmetric if A^T = A
Diagonal matrix: a square matrix with all zeros in the nondiagonal positions:
D =
[ d_1 0  …  0
  0  d_2 ⋱  ⋮
  ⋮   ⋱  ⋱  0
  0  …  0  d_N ] = diag(d_1, d_2, …, d_{N−1}, d_N)
Can you have a diagonal matrix if N ≠ d?
Identity matrix (I_N) of order N is a diagonal matrix with all d_i = 1
Properties of Matrix
If A is N × d, then I_N A = A and A I_d = A
Scalar matrix: ρI_d
Quadratic form: Let A be a d-dimensional symmetric square
matrix and x be a d × 1 vector. The scalar below is called a quadratic
form:
x^T A x
Semi-positive (nonnegative) definite: x^T A x ≥ 0
Positive definite: x^T A x > 0. Examples?
Outer product: If x is an N × 1 vector and y is a d × 1 vector, the
outer product xy^T is an N × d matrix.
Matrix Mean Vector
Let X ∈ ℝ^{N×d} be a matrix of size N × d (d features, each with N
training examples) and i_N = [1, 1, …, 1]^T be the N × 1 vector of all ones.
x̄ = (1/N) X^T i_N = [x̄_1, x̄_2, …, x̄_j, …, x̄_d]^T
Here X^T is d × N and i_N is N × 1, so x̄ is the d × 1 vector whose jth
entry x̄_j is the mean of feature j (x̄_1 is the mean of feature 1, x̄_2 the
mean of feature 2, and so on).
Equivalently, as a 1 × d row vector:
x̄^T = (1/N) i_N^T X = [x̄_1, x̄_2, …, x̄_j, …, x̄_d]
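The mean-vector formula can be sketched in NumPy on a small hypothetical data matrix (the numbers below are illustrative, not from the slides):

```python
import numpy as np

# Toy data matrix: N = 4 training examples, d = 2 features (hypothetical values)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
N = X.shape[0]
i_N = np.ones(N)                 # N x 1 vector of all ones

xbar = (1 / N) * X.T @ i_N       # d x 1 mean vector, one mean per feature
assert np.allclose(xbar, X.mean(axis=0))
print(xbar)                      # [2.5, 25.0]
```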
Sample Variance-Covariance Matrix
Suppose X is a matrix of size N × d: N training examples and d features,
with x̄_j the mean of feature j.
Feature j variance: s_j^2 = (1/(N−1)) Σ_{i=1}^N (x_ij − x̄_j)^2
Sample covariance between features j and k:
s_jk = (1/(N−1)) Σ_{i=1}^N (x_ij − x̄_j)(x_ik − x̄_k)
S =
[ s_1^2 s_12 … s_1j … s_1d
  s_21 s_2^2 … s_2j … s_2d
  ⋮     ⋮        ⋮       ⋮
  s_j1 s_j2 … s_j^2 … s_jd
  ⋮     ⋮        ⋮       ⋮
  s_d1 s_d2 … s_dj … s_d^2 ]
Is S symmetric? s_12 = ?
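The variance-covariance formulas reduce to centring X and forming a crossproduct; a sketch on a hypothetical data matrix (values are illustrative):

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])      # N = 4 examples, d = 2 features (hypothetical)
N = X.shape[0]

Xc = X - X.mean(axis=0)          # centre each feature at its mean
S = (Xc.T @ Xc) / (N - 1)        # d x d sample variance-covariance matrix

assert np.allclose(S, S.T)       # S is symmetric: s_jk = s_kj
assert np.allclose(S, np.cov(X, rowvar=False))
print(S)
```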
Matrix Rank
In linear algebra the rank of a matrix A is the dimension of the vector space
generated (or spanned) by its columns. This is the same as the dimension of
the space spanned by its rows.
The column rank and row rank coincide, so we can define the rank of
the matrix as the maximum number of linearly independent vectors
(those forming either the rows or the columns) and denote it by r(A).
Obviously r(A) ≤ min(N, d).
A =
[ 1 4 7
  2 5 8
  3 6 9 ]
r3 → r3 − 3r1, then r2 → r2 − 2r1:
[ 1  4  7
  0 −3 −6
  0 −6 −12 ]
r3 → r3 − 2r2:
[ 1  4  7
  0 −3 −6
  0  0  0 ]
so r(A) = 2.
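The same rank can be checked numerically; a sketch with the example matrix above:

```python
import numpy as np

A = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])

# Column 3 = 2*(column 2) - column 1, so only 2 columns are linearly independent
r = np.linalg.matrix_rank(A)
print(r)   # 2
```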
Determinant
If A is N × N, its determinant, det(A) or |A|, is a scalar whose absolute value
measures the volume of the parallelepiped delimited in N-dimensional space by
the columns of A.
For the identity matrix: |I_N| = 1
For the diagonal matrix: |D| = d_1 d_2 … d_N = Π_{n=1}^N d_n
Moreover, if ρ is a scalar: |ρD| = (ρd_1)(ρd_2) … (ρd_N) = ρ^N |D|
• If the columns (rows) of A are linearly dependent, so that rank(A) < N, then |A| = 0
• |AB| = |A| · |B|
• |A^T| = |A|
Matrix Determinant
The general expression for the determinant is the following Laplace (cofactor)
expansion:
|A| = Σ_{j=1}^N a_ij (−1)^{i+j} |A_ij|, for any fixed i = 1, 2, …, N
where A_ij is the submatrix obtained from A by removing the ith row and the jth
column; |A_ij| is called a minor of A and (−1)^{i+j} |A_ij| is called a cofactor.
Example:
A =
[  1 7
  −5 2 ]
|A| = 1 × (−1)^{1+1} × |2| + 7 × (−1)^{1+2} × |−5| = 2 + 35 = 37
Matrix (3×3) Determinant
For
A =
[ a_11 a_12 a_13
  a_21 a_22 a_23
  a_31 a_32 a_33 ]
|A| = a_11 |a_22 a_23; a_32 a_33| − a_12 |a_21 a_23; a_31 a_33| + a_13 |a_21 a_22; a_31 a_32|
The determinant is only defined for square matrices. For non-square matrices,
there is no determinant value.
Matrix Trace
The trace of a square matrix is the sum of its diagonal elements. If A is N × N,
tr(A) := Σ_{i=1}^N a_ii
Properties:
tr(ρA) = ρ tr(A)
tr(A + B) = tr(A) + tr(B)
tr(A^T) = tr(A)
tr(AB) = tr(BA)
Linear Equations Systems
Consider the system of n linear equations in n unknowns (e.g. 3 equations in
3 unknowns), where A is a known n × n coefficient matrix and b a known
n × 1 vector:
Ax = b
A non-homogeneous system admits a unique solution if and only if |A| ≠ 0, or
equivalently rank(A) = n. In such a case, the solution can be written as
x = A^{−1}b
Matrix Inverse
Let A be a square matrix of dimension n with full rank: rank(A) = n. The
inverse matrix is the matrix X which, when pre-multiplied or post-multiplied
by A, returns the identity matrix:
XA = I_n, AX = I_n
If such an X exists, then it is unique, and we write X = A^{−1}, called the
inverse of A.
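A NumPy sketch checking the determinant and trace properties, using the 2 × 2 example matrix with determinant 37 (the second matrix B below is a hypothetical companion for the product identities):

```python
import numpy as np

A = np.array([[1.0, 7.0],
              [-5.0, 2.0]])

# Determinant: cofactor expansion along row 1 gives 1*2 - 7*(-5) = 37
det = np.linalg.det(A)
print(round(det, 6))             # 37.0

# Trace: sum of diagonal elements
assert np.trace(A) == 1.0 + 2.0

# |AB| = |A|*|B| and tr(AB) = tr(BA), with a hypothetical B
B = np.array([[2.0, 0.0],
              [1.0, 3.0]])
assert np.isclose(np.linalg.det(A @ B), det * np.linalg.det(B))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```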
(Equivalently, A^{−1}A = AA^{−1} = I_n.) For a diagonal matrix, the computation
of the inverse is immediate:
D^{−1} = diag(1/d_1, 1/d_2, …, 1/d_{n−1}, 1/d_n)
Matrix Inverse Example
We now illustrate the 2 × 2 case. From the definition of an inverse, AX = I_2,
it follows that
[ a_11 a_12   [ x_11 x_12   =  [ 1 0
  a_21 a_22 ]   x_21 x_22 ]      0 1 ]
This yields a system of 4 equations in 4 unknowns:
a_11 x_11 + a_12 x_21 = 1
a_11 x_12 + a_12 x_22 = 0
a_21 x_11 + a_22 x_21 = 0
a_21 x_12 + a_22 x_22 = 1
Recalling the matrix determinant, the solution is
X = A^{−1} = (1/|A|) A* = (1/(a_11 a_22 − a_12 a_21)) [ a_22 −a_12; −a_21 a_11 ]
where A* is known as the adjoint matrix of A, with elements given by the
cofactors of A, e.g.
a*_ji = (−1)^{i+j} |A_ij|
If |A| = 0 or rank(A) < n, then the inverse does not exist.
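The 2 × 2 adjoint formula and the linear-system solution x = A^{−1}b can be sketched in NumPy; the right-hand side b here is a hypothetical example:

```python
import numpy as np

A = np.array([[1.0, 7.0],
              [-5.0, 2.0]])
b = np.array([1.0, 2.0])         # hypothetical right-hand side

# 2x2 inverse via the adjoint formula: (1/|A|) [[a22, -a12], [-a21, a11]]
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]       # 37
A_inv = (1 / det) * np.array([[A[1, 1], -A[0, 1]],
                              [-A[1, 0], A[0, 0]]])
assert np.allclose(A_inv, np.linalg.inv(A))
assert np.allclose(A @ A_inv, np.eye(2))

# Solve Ax = b; np.linalg.solve is preferred over forming the inverse explicitly
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)
print(x)
```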