Slide 1
BUSINESS SCHOOL
Discipline of Business Analytics
QBUS6850 Team
Topics covered
Why study machine learning
Types of learning
Linear algebra and matrix computation review
References
Chapter 1 (Alpaydin, 2014)
Learning Objectives
Be able to distinguish the two major types of learning (supervised
and unsupervised)
Understand the notation of linear algebra
Understand the basic operations on vectors/matrices, such as
the inner product of two vectors, the norm of a vector, the
transpose of a matrix, the matrix product, matrix rank and
determinant
Understand concepts such as the vector norm or length,
orthogonality of vectors and projecting a vector
Be familiar with linear equation systems and the matrix inverse
All is about Data and Analytics
• Every minute, Americans use 2,657,700GB of data.
• Every minute, Instagram users post 46,750 photos.
• Every minute, 15,220,700 texts are sent.
• Every minute, Google conducts 3,607,080 searches.
Guess please!
https://blog.linkedin.com/2017/december/7/the-fastest-growing-jobs-in-the-u-s-based-on-linkedin-data
Data-driven decisions are more profitable
Tradition: decision-making relying on CEOs' intuition and experience
New trend: data-driven, allowing for impersonal, evidence-based decision-making
Machine Learning is changing how we do business
Advanced algorithms save time and resources
Mitigating risks with better decisions
Machine Learning provides better forecasting
ML makes it possible to find hidden insights in the data
ML makes it possible to extract patterns from vast amounts of data
Machine Learning for Business
https://www.infoworld.com/article/3259891/data-science/why-data-science-and-machine-learning-are-the-fastest-growing-jobs-in-the-us.html
Machine Learning for Business
Raw data
Business
requirements
Machine learning
algorithm
Data validation and cleansing
Which algorithm is
the best?
Implementation
Model Validation
Decision making
Implementation tool:
Python
Can I rely on the results
from the model?
Facilitate the business
decision making
https://www.bloomberg.com/news/articles/2016-02-03/google-search-chief-singhal-to-retire-replaced-by-ai-manager
Prediction: market demand prediction, price prediction
Pattern Recognition
Data Compression
Outlier detection
Recommendations
Applications
Arthur Samuel described machine learning as: “the field of study that gives
computers the ability to learn without being explicitly programmed.”
Machine Learning aims to build algorithms that can learn from and make
predictions on data, and evolved from the study of pattern recognition and
computational learning theory in artificial intelligence.
What is Machine Learning?
Machine learning is using computers to analyse data.
We want computers to “think” like humans.
What is “learning”? Often, we do not want just to describe the data we
have, but to be able to predict (as yet) unseen data.
With new data, the “learning” can be refreshed/updated.
What is Machine Learning?
Traditional Statistics: You have a specific question about a population.
E.g. What’s the average height of Australians?
It is expensive or impossible to collect data for the entire population.
Collect a sample and use inference to say things about the feature of the
population you are interested in.
Parameter = unknown feature of the population of interest
Estimator = sample-based estimate of a parameter
Current situation
Lots of data collected with no specific questions in mind.
Often, it would be quite easy to make a model that would describe already
known data. It is more difficult to predict unseen data (generalization).
Supervised Learning:
o In supervised learning, an imaginary “supervisor” tells us in the training
phase the correct response/target variable (t), given the features (x).
o The dependent or outcome variable is given.
o t is distinguished from the inputs x.
o Two major techniques here:
o Regression: t is a quantitative, or continuous, variable
o Classification: t is a discrete, or categorical, variable
o Goal: prediction of t
Types of Learning
Supervised Learning &
Unsupervised Learning
Regression
Response variable t is a continuous variable.
Example: price of second-hand cars.
t: car price; x: odometer reading. t = f(x|β); f(·): a model; β: model parameters.
[Two scatter plots of Price against Odometer reading]
Is this a supervised or unsupervised learning example?
Example:
Data on credit card applicants
Question:
Should a new application be
granted a credit card?
Data:
Low-risk (+) and high-risk (−)
customers are labelled based on
their income and savings.
Classification
Is this a supervised or unsupervised learning example?
[Two scatter plots of customers in the (x1: income, x2: savings) plane]
Question:
Segment the customers into
risk levels based on
income and savings
Data:
Labelled or not?
Is this a supervised or unsupervised learning example?
Clustering
[Scatter plot of customers in the (x1: income, x2: savings) plane]
Semi-supervised learning
Often, finding a good data set is one of the most difficult tasks in developing
machine learning methods.
Useful Links:
UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
Delve: http://www.cs.utoronto.ca/~delve/
Datasets Sources
Linear Algebra
(please read it through at home)
Vector
A vector is a collection of numbers (scalars) ordered by column (or row).
We assume vectors are column vectors.
The symbol a^T (a’s transpose) is then a row vector:
b = [2, 5]^T;  b^T = [2, 5]
Vector a is one point
in an n-dimensional
space ℝ^n, with
coordinates provided
by the elements a_i
Geometric representation of b:
vector b is one point in a 2-
dimensional space ℝ^2, and the
vector indicates the direction from
the origin to this point.
a = [a_1, a_2, …, a_{n−1}, a_n]^T
a^T = [a_1, a_2, …, a_{n−1}, a_n]
Understanding Vector
Special cases
Zeros vector: 0_n = [0, 0, 0, …, 0]^T
• All n components are 0’s
Unit vector: e_i = [0, 0, 1, …, 0]^T
• All components are zero except for the one at the ith position (= 1)
Ones vector: 1_n = [1, 1, 1, …, 1]^T
• All n components are 1’s
Why is it called a unit vector?
Basic Operations of Vector
Equality of vectors:
a = b ⟺ a_i = b_i for all i = 1, 2, …, n
Multiplication by scalars:
let ρ denote a scalar; ρa is the vector with elements {ρa_i}. E.g., let a =
[5, 2, 3]^T; then 0.5a = [0.5 × 5, 0.5 × 2, 0.5 × 3]^T = [2.5, 1, 1.5]^T
Sum of two vectors:
Let a and b be two vectors with the same size n; their sum c = a + b is the
vector with elements c_i = a_i + b_i. E.g., let a = [5, 2, 3]^T and b = [1, −11, 2]^T;
then c = a + b = [5, 2, 3]^T + [1, −11, 2]^T = [6, −9, 5]^T
Linear combination:
Let a = [1, 2]^T and b = [3, 1]^T, and let ρ_1 and ρ_2 denote
two coefficients (scalars); then
ρ_1 a + ρ_2 b
is their linear combination.
If ρ_1 = 1 and ρ_2 = 1, what is ρ_1 a + ρ_2 b?
If ρ_1 = 3 and ρ_2 = 7, what is ρ_1 a + ρ_2 b?
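The operations above translate directly into NumPy (the deck's implementation tool is Python). A minimal sketch using the slide's example vectors, including the two linear-combination questions:

```python
import numpy as np

# Scalar multiplication and vector sum (slide examples)
half_a = 0.5 * np.array([5, 2, 3])                 # [2.5, 1.0, 1.5]
c = np.array([5, 2, 3]) + np.array([1, -11, 2])    # [6, -9, 5]

# Linear combinations rho1*a + rho2*b with a = [1, 2]^T, b = [3, 1]^T
a = np.array([1, 2])
b = np.array([3, 1])
comb1 = 1 * a + 1 * b    # rho1 = rho2 = 1 -> [4, 3]
comb2 = 3 * a + 7 * b    # rho1 = 3, rho2 = 7 -> [24, 13]
print(comb1, comb2)
```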
Basic Operations of Vector
Geometric representation of the sum of two vectors (parallelogram rule) and
linear combination:
[Figure: vectors a and b and their sum a + b forming a parallelogram]
https://en.wikipedia.org/wiki/Parallelogram
The inner product between two n-dimensional vectors a and b is defined
as:
⟨a, b⟩ = a^T b = Σ_{i=1}^n a_i b_i
Properties:
1. (ρa)^T b = ρ(a^T b)
2. a^T b = b^T a
3. a^T(b + c) = a^T b + a^T c
Vector Inner Product
If a = [5, 2, 3]^T and b = [1, −11, 2]^T, ⟨a, b⟩ = ?
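A quick NumPy sketch of the inner product and its properties, answering the slide's question:

```python
import numpy as np

a = np.array([5, 2, 3])
b = np.array([1, -11, 2])

# <a, b> = a^T b = sum_i a_i * b_i
ip = a @ b                              # 5*1 + 2*(-11) + 3*2 = -11
assert ip == np.dot(b, a)               # symmetry: a^T b = b^T a
rho = 2.0
assert (rho * a) @ b == rho * (a @ b)   # (rho*a)^T b = rho(a^T b)
print(ip)
```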
Vector Norm or Length
By the Pythagorean theorem, the norm of vector a is the square root of
the inner product of a with itself:
||a|| = (a^T a)^{1/2} = (Σ_{i=1}^n a_i^2)^{1/2} ≥ 0
This is the distance from the origin to the point a, or the length of the vector.
The (normalized) vector a/||a|| has unit length.
https://en.wikipedia.org/wiki/Pythagorean_theorem
[Figure: right triangle with legs a, b, hypotenuse c and a 90° angle: a^2 + b^2 = ?]
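The norm and normalization can be sketched in NumPy, reusing the slide's vector a = [5, 2, 3]^T:

```python
import numpy as np

a = np.array([5, 2, 3])

# ||a|| = sqrt(a^T a); equals numpy's built-in Euclidean norm
norm_a = np.sqrt(a @ a)
assert np.isclose(norm_a, np.linalg.norm(a))

# The normalized vector a/||a|| has unit length
unit = a / norm_a
print(np.linalg.norm(unit))
```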
Euclidean Distance and Orthogonality
The distance between the vectors x_i and x_j is the norm of
the difference vector x_i − x_j:
d_ij = ||x_i − x_j|| = ((x_i − x_j)^T (x_i − x_j))^{1/2} = (Σ_{k=1}^d (x_ik − x_jk)^2)^{1/2} ≥ 0
Orthogonality: two vectors are orthogonal, a ⊥ b, if and only if their
inner product is zero, a^T b = 0.
Example?
a = [1, 1]^T, b = [1, −1]^T: ||a|| = ||b|| = √2, a^T b = 0
c = [1, 3]^T, d = [0.6, −0.2]^T: c^T d = ? Orthogonality? ||c|| = ? ||d|| = ?
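The orthogonality and distance checks can be sketched in NumPy with the slide's vectors c and d:

```python
import numpy as np

c = np.array([1.0, 3.0])
d = np.array([0.6, -0.2])

# Orthogonality check: c^T d = 1*0.6 + 3*(-0.2) = 0 (up to floating-point rounding)
print(c @ d)
# Norms: ||c|| = sqrt(10), ||d|| = sqrt(0.4)
print(np.linalg.norm(c), np.linalg.norm(d))
# Euclidean distance between the two points
print(np.linalg.norm(c - d))
```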
Geometric Representation
||a|| = (a^T a)^{1/2} = (Σ_{i=1}^n a_i^2)^{1/2} ≥ 0
[Figure: a = [a_1, a_2]^T and b = [b_1, b_2]^T drawn as arrows from the origin in the plane]
Inner Product Geometric Interpretation
Suppose we have a vector c that is orthogonal to b;
by the parallelogram law, a = c + ρb.
ρ ∈ ℝ here is a scalar.
Vector d = ρb is called the orthogonal projection of a onto b.
Connecting to the inner product:
a = c + ρb
a^T = c^T + ρb^T
a^T b = c^T b + ρb^T b = 0 + ρ||b||^2 = ρ||b||^2
Inner Product Geometric Interpretation
Example:
b = [2, 1]^T and c = [−1, 2]^T. Vector c is orthogonal to b.
How to check orthogonality? (c^T b = −1 × 2 + 2 × 1 = 0)
Suppose ρ = 0.7; then
a = c + ρb = [−1 + 0.7 × 2, 2 + 0.7 × 1]^T = [0.4, 2.7]^T
Vector d = ρb = 0.7 × [2, 1]^T = [1.4, 0.7]^T is the orthogonal projection of a onto b.
Let’s test a^T b = ρ||b||^2:
a^T b = 0.4 × 2 + 2.7 × 1 = 3.5
ρ||b||^2 = 0.7 × (2^2 + 1^2) = 3.5
||d|| = (1.4^2 + 0.7^2)^{1/2} ≈ 1.565
a^T b = ||d|| × ||b|| = (1.4^2 + 0.7^2)^{1/2} × (2^2 + 1^2)^{1/2} = 3.5
||d|| = ρ||b||
Will be used in SVM later
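Solving for ρ from a^T b = ρ||b||^2 gives a direct recipe for the projection; a sketch in NumPy with the worked example's numbers:

```python
import numpy as np

b = np.array([2.0, 1.0])
a = np.array([0.4, 2.7])

# rho = a^T b / ||b||^2, from a^T b = rho * ||b||^2
rho = (a @ b) / (b @ b)    # 3.5 / 5 = 0.7
d = rho * b                # orthogonal projection of a onto b -> [1.4, 0.7]
c = a - d                  # residual component, orthogonal to b
print(rho, d, c @ b)       # c^T b is 0 up to rounding
```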
A matrix is a rectangular array of numbers (scalars) for which operations such
as addition and multiplication are defined. It is a rectangular (N × d), i.e. two-
dimensional, array of scalars, represented as:
Basic Calculation of Matrix
In a typical data matrix, the index i = 1, 2, …, N refers to the statistical units/training
examples, and the index j = 1, 2, …, d to the variables or features.
Can also write: X = [x_ij] ∈ ℝ^{N×d}
X =
[ x_11 x_12 … x_1j … x_1d
  x_21 x_22 … x_2j … x_2d
  ⋮    ⋮        ⋮       ⋮
  x_i1 x_i2 … x_ij … x_id
  ⋮    ⋮        ⋮       ⋮
  x_N1 x_N2 … x_Nj … x_Nd ]
A square matrix is a matrix with the same number of rows and columns, e.g.
Matrix
A =
[ a_11 a_12 a_13 a_14
  a_21 a_22 a_23 a_24
  a_31 a_32 a_33 a_34
  a_41 a_42 a_43 a_44 ] ∈ ℝ^{4×4}
A column vector of size N can be represented as an N × 1 matrix.
A row vector of size d can be represented as a 1 × d matrix.
We can represent X as a partitioned matrix whose generic block is the
1 × d row vector x_i^T (each row is a 1 × d row vector; originally x_i is a
column vector):
X = [x_1^T; x_2^T; …; x_i^T; …; x_N^T]
Or we can partition by columns as below, where each column x_j is an
N × 1 column vector:
X = [x_1, x_2, …, x_j, …, x_d]
Matrix transpose: transposition yields the d × N matrix X^T with rows
and columns interchanged
Matrix Transpose
X =
[ x_11 x_12 … x_1j … x_1d
  x_21 x_22 … x_2j … x_2d
  ⋮    ⋮        ⋮       ⋮
  x_N1 x_N2 … x_Nj … x_Nd ]
X^T =
[ x_11 x_21 … x_i1 … x_N1
  x_12 x_22 … x_i2 … x_N2
  ⋮    ⋮        ⋮       ⋮
  x_1d x_2d … x_id … x_Nd ]
Matrix Product
Let A be an N × p matrix whose ith row is the 1 × p vector a_i^T, and
let B be a p × d matrix whose jth column is the p × 1 vector b_j, so that
A = [a_1^T; a_2^T; …; a_i^T; …; a_N^T],  B = [b_1, b_2, …, b_j, …, b_d]
The matrix product C = AB, where A pre-multiplies B, is the N × d matrix
with elements
c_ij = a_i^T b_j = Σ_{k=1}^p a_ik b_kj,  i = 1, …, N; j = 1, …, d
Matrix Product
How does the matrix product work?
A =
[ 3 1
  7 0
  1 2
  5 3 ]
B =
[ 1 3  9
  5 2 −6 ]
The number of columns of A is 2 and the
number of rows of B is 2. We can form the product C
= AB, which has 4 rows (= A’s row number) and
3 columns (= B’s column number):
C =
[ c_11 c_12 c_13
  c_21 c_22 c_23
  c_31 c_32 c_33
  c_41 c_42 c_43 ]
c_11 = a_1^T b_1 = 3 × 1 + 1 × 5 = 8
c_11 is the inner product of the first row
of A, a_1^T = [3, 1], and the first column of B, b_1 = [1, 5]^T
c_32 = ?
Properties of Matrix
Not every pair of matrices has a product. You must make sure that the
number of columns of the first matrix is EQUAL to the number of rows of
the second matrix. Moreover, in general the matrix product is not commutative:
AB is not equal to BA.
So in the previous example, BA is not defined.
We do have AB = BA for some appropriate matrices A and B.
If A is an m × p matrix, notice the difference between A^T A (p × p matrix
of crossproducts) and AA^T (size m × m).
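A NumPy sketch of the matrix product, using the 4 × 2 and 2 × 3 matrices from the worked example (the placement of the negative entry in B is reconstructed from a garbled slide):

```python
import numpy as np

A = np.array([[3, 1],
              [7, 0],
              [1, 2],
              [5, 3]])           # 4 x 2
B = np.array([[1, 3,  9],
              [5, 2, -6]])       # 2 x 3

C = A @ B                        # 4 x 3
# c_ij is the inner product of row i of A with column j of B
assert C[0, 0] == A[0, :] @ B[:, 0]   # c_11 = 3*1 + 1*5 = 8
print(C[2, 1])                        # c_32 = 1*3 + 2*2 = 7
# Note: B @ A is not defined here (3 columns vs 4 rows) and would raise an error
```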
Matrix Special Cases
A square matrix has a number of rows equal to the number of columns: N = d
A square matrix A is symmetric if A^T = A
Diagonal matrix: a square matrix with all zeros in the nondiagonal positions:
D =
[ d_1 0  …  0
  0  d_2 ⋱  ⋮
  ⋮   ⋱  ⋱  0
  0  …  0  d_N ] = diag(d_1, d_2, …, d_{N−1}, d_N)
Can you have a diagonal matrix if N ≠ d?
Identity matrix (I_N) of order N is a diagonal matrix with all d_i = 1
Properties of Matrix
If A is N × d, then I_N A = A and A I_d = A
Scalar matrix: ρI_d
Quadratic form: Let A be a d-dimensional symmetric square
matrix and x be a d × 1 vector. The scalar below is called a quadratic
form:
x^T A x
Semi-positive (nonnegative) definite: x^T A x ≥ 0
Positive definite: x^T A x > 0. Examples?
Outer product: If x is an N × 1 vector and y is a d × 1 vector, the
outer product xy^T is an N × d matrix.
Matrix Mean Vector
Let X ∈ ℝ^{N×d} be a matrix of size N × d (d features, each with N
training examples) and i_N = [1, 1, …, 1]^T be the N × 1 vector of all ones.
x̄ = (1/N) X^T i_N = [x̄_1, x̄_2, …, x̄_j, …, x̄_d]^T
Here X^T is d × N and i_N is N × 1, so x̄ is the d × 1 vector whose jth
entry x̄_j is the mean of feature j (x̄_1 is the mean of feature 1, x̄_2 the
mean of feature 2, and so on).
Equivalently, as a 1 × d row vector:
x̄^T = (1/N) i_N^T X = [x̄_1, x̄_2, …, x̄_j, …, x̄_d]
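The mean-vector formula can be sketched in NumPy on a small hypothetical data matrix (the numbers below are illustrative, not from the slides):

```python
import numpy as np

# Toy data matrix: N = 4 training examples, d = 2 features (hypothetical values)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
N = X.shape[0]
i_N = np.ones(N)                 # N x 1 vector of all ones

xbar = (1 / N) * X.T @ i_N       # d x 1 mean vector, one mean per feature
assert np.allclose(xbar, X.mean(axis=0))
print(xbar)                      # [2.5, 25.0]
```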
Sample Variance-Covariance Matrix
Suppose X is a matrix of size N × d: N training examples and d features,
with x̄_j the mean of feature j.
Feature j variance: s_j^2 = (1/(N−1)) Σ_{i=1}^N (x_ij − x̄_j)^2
Sample covariance between features j and k:
s_jk = (1/(N−1)) Σ_{i=1}^N (x_ij − x̄_j)(x_ik − x̄_k)
S =
[ s_1^2 s_12 … s_1j … s_1d
  s_21 s_2^2 … s_2j … s_2d
  ⋮     ⋮        ⋮       ⋮
  s_j1 s_j2 … s_j^2 … s_jd
  ⋮     ⋮        ⋮       ⋮
  s_d1 s_d2 … s_dj … s_d^2 ]
Is S symmetric? s_12 = ?
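The variance-covariance formulas reduce to centring X and forming a crossproduct; a sketch on a hypothetical data matrix (values are illustrative):

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])      # N = 4 examples, d = 2 features (hypothetical)
N = X.shape[0]

Xc = X - X.mean(axis=0)          # centre each feature at its mean
S = (Xc.T @ Xc) / (N - 1)        # d x d sample variance-covariance matrix

assert np.allclose(S, S.T)       # S is symmetric: s_jk = s_kj
assert np.allclose(S, np.cov(X, rowvar=False))
print(S)
```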
Matrix Rank
In linear algebra the rank of a matrix A is the dimension of the vector space
generated (or spanned) by its columns. This is the same as the dimension of
the space spanned by its rows.
The column rank and row rank coincide, so we can define the rank of
the matrix as the maximum number of linearly independent vectors
(those forming either the rows or the columns) and denote it by r(A).
Obviously r(A) ≤ min(N, d).
A =
[ 1 4 7
  2 5 8
  3 6 9 ]
r3 → r3 − 3r1, then r2 → r2 − 2r1:
[ 1  4  7
  0 −3 −6
  0 −6 −12 ]
r3 → r3 − 2r2:
[ 1  4  7
  0 −3 −6
  0  0  0 ]
so r(A) = 2.
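The same rank can be checked numerically; a sketch with the example matrix above:

```python
import numpy as np

A = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])

# Column 3 = 2*(column 2) - column 1, so only 2 columns are linearly independent
r = np.linalg.matrix_rank(A)
print(r)   # 2
```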
Determinant
If A is N × N, its determinant, det(A) or |A|, is a scalar whose absolute value
measures the volume of the parallelepiped delimited in N-dimensional space by
the columns of A.
For the identity matrix: |I_N| = 1
For the diagonal matrix: |D| = d_1 d_2 … d_N = Π_{n=1}^N d_n
Moreover, if ρ is a scalar: |ρD| = (ρd_1)(ρd_2) … (ρd_N) = ρ^N |D|
• If the columns (rows) of A are linearly dependent, so that rank(A) < N, then |A| = 0
• |AB| = |A| · |B|
• |A^T| = |A|
Matrix Determinant
The general expression for the determinant is the following Laplace (cofactor)
expansion:
|A| = Σ_{j=1}^N a_ij (−1)^{i+j} |A_ij|, for any fixed i = 1, 2, …, N
where A_ij is the submatrix obtained from A by removing the ith row and the jth
column; |A_ij| is called a minor of A and (−1)^{i+j} |A_ij| is called a cofactor.
Example:
A =
[  1 7
  −5 2 ]
|A| = 1 × (−1)^{1+1} × |2| + 7 × (−1)^{1+2} × |−5| = 2 + 35 = 37
Matrix (3×3) Determinant
For
A =
[ a_11 a_12 a_13
  a_21 a_22 a_23
  a_31 a_32 a_33 ]
|A| = a_11 |a_22 a_23; a_32 a_33| − a_12 |a_21 a_23; a_31 a_33| + a_13 |a_21 a_22; a_31 a_32|
The determinant is only defined for square matrices. For non-square matrices,
there is no determinant value.
Matrix Trace
The trace of a square matrix is the sum of its diagonal elements. If A is N × N,
tr(A) := Σ_{i=1}^N a_ii
Properties:
tr(ρA) = ρ tr(A)
tr(A + B) = tr(A) + tr(B)
tr(A^T) = tr(A)
tr(AB) = tr(BA)
Linear Equations Systems
Consider the system of n linear equations in n unknowns (e.g. 3 equations in
3 unknowns), where A is a known n × n coefficient matrix and b a known
n × 1 vector:
Ax = b
A non-homogeneous system admits a unique solution if and only if |A| ≠ 0, or
equivalently rank(A) = n. In such a case, the solution can be written as
x = A^{−1}b
Matrix Inverse
Let A be a square matrix of dimension n with full rank: rank(A) = n. The
inverse matrix is the matrix X which, when pre-multiplied or post-multiplied
by A, returns the identity matrix:
XA = I_n, AX = I_n
If such an X exists, then it is unique, and we write X = A^{−1}, called the
inverse of A.
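A NumPy sketch checking the determinant and trace properties, using the 2 × 2 example matrix with determinant 37 (the second matrix B below is a hypothetical companion for the product identities):

```python
import numpy as np

A = np.array([[1.0, 7.0],
              [-5.0, 2.0]])

# Determinant: cofactor expansion along row 1 gives 1*2 - 7*(-5) = 37
det = np.linalg.det(A)
print(round(det, 6))             # 37.0

# Trace: sum of diagonal elements
assert np.trace(A) == 1.0 + 2.0

# |AB| = |A|*|B| and tr(AB) = tr(BA), with a hypothetical B
B = np.array([[2.0, 0.0],
              [1.0, 3.0]])
assert np.isclose(np.linalg.det(A @ B), det * np.linalg.det(B))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```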
(Equivalently, A^{−1}A = AA^{−1} = I_n.) For a diagonal matrix, the computation
of the inverse is immediate:
D^{−1} = diag(1/d_1, 1/d_2, …, 1/d_{n−1}, 1/d_n)
Matrix Inverse Example
We now illustrate the 2 × 2 case. From the definition of an inverse, AX = I_2,
it follows that
[ a_11 a_12   [ x_11 x_12   =  [ 1 0
  a_21 a_22 ]   x_21 x_22 ]      0 1 ]
This yields a system of 4 equations in 4 unknowns:
a_11 x_11 + a_12 x_21 = 1
a_11 x_12 + a_12 x_22 = 0
a_21 x_11 + a_22 x_21 = 0
a_21 x_12 + a_22 x_22 = 1
Recalling the matrix determinant, the solution is
X = A^{−1} = (1/|A|) A* = (1/(a_11 a_22 − a_12 a_21)) [ a_22 −a_12; −a_21 a_11 ]
where A* is known as the adjoint matrix of A, with elements given by the
cofactors of A, e.g.
a*_ji = (−1)^{i+j} |A_ij|
If |A| = 0 or rank(A) < n, then the inverse does not exist.
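The 2 × 2 adjoint formula and the linear-system solution x = A^{−1}b can be sketched in NumPy; the right-hand side b here is a hypothetical example:

```python
import numpy as np

A = np.array([[1.0, 7.0],
              [-5.0, 2.0]])
b = np.array([1.0, 2.0])         # hypothetical right-hand side

# 2x2 inverse via the adjoint formula: (1/|A|) [[a22, -a12], [-a21, a11]]
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]       # 37
A_inv = (1 / det) * np.array([[A[1, 1], -A[0, 1]],
                              [-A[1, 0], A[0, 0]]])
assert np.allclose(A_inv, np.linalg.inv(A))
assert np.allclose(A @ A_inv, np.eye(2))

# Solve Ax = b; np.linalg.solve is preferred over forming the inverse explicitly
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)
print(x)
```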