
APS1070
Foundations of Data Analytics and Machine Learning
Fall 2021
Week 5:
• Linear Algebra
• Analytical Geometry
• Data Augmentation
Prof.

Mid-term assessment logistics
➢Platform: Crowdmark
➢ A practice (mock) assessment is available now (emailed to you) to help you get familiar with the
logistics. It will not be graded. Due date: 8 Oct at 21:00.
➢Some practice problems and solutions from past semesters are on Quercus
• Material in midterm:
• Week 1 to 5 of Lectures, Tutorials 1 and 2, Project 1, Reading assignments 1-4.
2

Mid-term assessment logistics
• Midterm Assessment Distribution: Oct 18th at 9:00 (Toronto time)
• Deadline for Submitting the Assessment: Oct 19th at 21:00 (Toronto time)
• A countdown starts as soon as you access the assessment (2 hours as per course schedule)
• 30 minutes are already added to the countdown time for contingency. No excuses.
• Time needed for writing answers: about 90 minutes
• Late submission or no submission: 0 mark (as per syllabus).
• Use course material + online resources – NO HELP FROM OTHERS!! No Piazza.
• Watch for a Quercus announcement covering common issues and typos.
• If needed, make assumptions and answer the questions. Do not contact us to ask.
• In case of a logistics problem, you can email Samin and Ali during your assessment. There
is no guarantee that we can respond before your time runs out.
3

Slide Attribution
These slides contain materials from various sources. Special thanks to the following authors:

Last Time
➢ Looked into assessing model performance
➢ Probability Theory
➢ Gaussians
➢ Confusion Matrix
➢ ROCs
➢ Data Processing to the Rescue
➢ Data Augmentation (today)
➢ Dimensionality Reduction (Week 7)
5

What do these datasets have in common?
➢ How can we improve these datasets?
6

ML Performance Benchmarks
➢C10+ and C100+ highlight the error rates after data augmentation
➢Data augmentation found to consistently lower the error rates!
7

Agenda
➢Linear Algebra
➢Scalars, Vectors, Matrices
➢ Solving Systems of Linear Equations
➢ Linear Independence
➢Linear Mappings
➢Analytic Geometry
➢ Norms, Inner Products, Lengths, etc.
➢ Angles and Orthonormal Basis
➢Data Augmentation
Today’s Theme:
Data Processing
8

Part 1 Linear Algebra
Readings:
• Chapter 2.1-5 MML Textbook

Systems of Linear Equations
➢ The solution space of a system of two linear equations with two variables can be geometrically interpreted as the intersection of two lines
➢ In three variables, the solution space can be interpreted as the intersection of planes.
[Figure: the lines 4x_1 + 4x_2 = 5 and 2x_1 − 4x_2 = 1 plotted in the (x_1, x_2)-plane, intersecting at the solution]
System in three variables – solution is at intersection
System with 2 equations and three variables – solution is typically a line
10

Matrix Representation
➢ Used to solve systems of linear equations more systematically
➢ Compact notation collects coefficients into vectors, and vectors into matrices:

x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix} + \dots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}
\Longleftrightarrow
\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}
11

Matrix Notation
➢A matrix has m x n elements (with 𝑚,𝑛 ∈ N, and aij, i=1,…,m; j=1,…,n) which are ordered according to a rectangular scheme consisting of m rows and n columns:
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad a_{ij} \in \mathbb{R}
➢By convention (1 by n)-matrices are called rows and (m by 1)-matrices are called columns. These special matrices are also called row/column vectors.
➢ A (1 by 1)-matrix is referred to as a scalar.
12

Addition and Scalar Multiplication
➢Vector addition:
a + b = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \end{bmatrix}
➢Scalar multiplication:
\alpha b = \alpha \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} \alpha b_1 \\ \alpha b_2 \end{bmatrix}
13

Addition and Scalar Multiplication
➢ Matrix addition: The sum of two matrices A ∈ R^{m×n}, B ∈ R^{m×n} is defined as the element-wise sum:

A + B = \begin{bmatrix} a_{11}+b_{11} & \cdots & a_{1n}+b_{1n} \\ \vdots & & \vdots \\ a_{m1}+b_{m1} & \cdots & a_{mn}+b_{mn} \end{bmatrix}

➢ Scalar multiplication of a matrix A ∈ R^{m×n} is defined as:

\alpha A = \begin{bmatrix} \alpha a_{11} & \cdots & \alpha a_{1n} \\ \vdots & & \vdots \\ \alpha a_{m1} & \cdots & \alpha a_{mn} \end{bmatrix}
14

Matrix Multiplication
➢ We can multiply a matrix by a column vector:

Ax = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 \end{bmatrix}

➢ We can multiply a matrix by a row vector:

x^T A = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{21}x_2 + a_{31}x_3 & a_{12}x_1 + a_{22}x_2 + a_{32}x_3 & a_{13}x_1 + a_{23}x_2 + a_{33}x_3 \end{bmatrix}

➢ In general, we can multiply matrices A and B when the number of columns in A matches the number of rows in B:

AB = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11}+a_{12}b_{21}+a_{13}b_{31} & a_{11}b_{12}+a_{12}b_{22}+a_{13}b_{32} & a_{11}b_{13}+a_{12}b_{23}+a_{13}b_{33} \\ a_{21}b_{11}+a_{22}b_{21}+a_{23}b_{31} & a_{21}b_{12}+a_{22}b_{22}+a_{23}b_{32} & a_{21}b_{13}+a_{22}b_{23}+a_{23}b_{33} \\ a_{31}b_{11}+a_{32}b_{21}+a_{33}b_{31} & a_{31}b_{12}+a_{32}b_{22}+a_{33}b_{32} & a_{31}b_{13}+a_{32}b_{23}+a_{33}b_{33} \end{bmatrix}
15

Example: Matrix Multiplication
➢ For two matrices A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} ∈ R^{2×3}, B = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} ∈ R^{3×2},
➢ we obtain:

AB = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 2 & 5 \end{bmatrix} ∈ R^{2×2},

BA = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 4 & 2 \\ -2 & 0 & 2 \\ 3 & 2 & 1 \end{bmatrix} ∈ R^{3×3}

Not commutative! AB ≠ BA
16
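A quick sanity check of this example in NumPy (a minimal sketch; any environment with NumPy, e.g. the course Colab notebooks, will do):

```python
import numpy as np

# Matrices from the example above
A = np.array([[1, 2, 3],
              [3, 2, 1]])      # 2x3
B = np.array([[0, 2],
              [1, -1],
              [0, 1]])         # 3x2

print(A @ B)   # [[2 3] [2 5]]                  -> 2x2
print(B @ A)   # [[6 4 2] [-2 0 2] [3 2 1]]     -> 3x3, so AB != BA
```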

Basic Properties
➢ A few properties:
➢ Associativity:
∀ A ∈ R^{m×n}, B ∈ R^{n×p}, C ∈ R^{p×q}: (AB)C = A(BC)
➢ Distributivity:
∀ A, B ∈ R^{m×n}, C, D ∈ R^{n×p}: (A + B)C = AC + BC
A(C + D) = AC + AD
17

Inner Product and Outer Product
➢ The inner product between vectors of the same length is:

a^T b = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \dots + a_n b_n = \gamma

The inner product is a scalar.
➢ The outer product between vectors of the same length is:

a b^T = \begin{bmatrix} a_1 b_1 & a_1 b_2 & \cdots & a_1 b_n \\ a_2 b_1 & a_2 b_2 & \cdots & a_2 b_n \\ \vdots & \vdots & & \vdots \\ a_n b_1 & a_n b_2 & \cdots & a_n b_n \end{bmatrix}

The outer product is a matrix.
18
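A small NumPy illustration of the two products (a sketch; the vectors are arbitrary examples, not from the slide):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

inner = a @ b            # scalar: 1*4 + 2*5 + 3*6 = 32
outer = np.outer(a, b)   # 3x3 matrix with entries a_i * b_j

print(inner)         # 32
print(outer.shape)   # (3, 3)
```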

Identity Matrix
➢ We define the identity matrix as shown:

I_n := \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} ∈ R^{n×n}

➢ Any matrix multiplied by the identity will not change the matrix:

\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1+0+0 & 0+2+0 & 0+0+3 \\ 4+0+0 & 0+5+0 & 0+0+6 \\ 7+0+0 & 0+8+0 & 0+0+9 \end{bmatrix}
19

Inverse
➢ If square matrices A ∈ R^{n×n} and B ∈ R^{n×n} have the property that AB = I = BA, then B is called the inverse of A and is denoted by A^{-1}.
➢ Example: these matrices are inverses of each other:

A = \begin{bmatrix} 1 & 2 & 1 \\ 4 & 4 & 5 \\ 6 & 7 & 7 \end{bmatrix}, \quad B = \begin{bmatrix} -7 & -7 & 6 \\ 2 & 1 & -1 \\ 4 & 5 & -4 \end{bmatrix}

➢ We’ll look at how to calculate the inverse later.
20

Transpose
➢ Transpose definition: For A ∈ R^{m×n}, the matrix B ∈ R^{n×m} with b_{ij} = a_{ji} is called the transpose of A. We write B = A^T.
➢ Symmetric matrix: A matrix A ∈ R^{n×n} is symmetric if A = A^T.
➢ Some useful identities:
AA^{-1} = I = A^{-1}A
(AB)^{-1} = B^{-1}A^{-1}
(A + B)^{-1} ≠ A^{-1} + B^{-1}
(A^T)^T = A
(A + B)^T = A^T + B^T
(AB)^T = B^T A^T
21

Solving Systems of Linear Equations
➢Given A and b, we want to solve for x:
Ax = b: \quad \begin{bmatrix} 2 & 1 & 1 \\ 4 & -6 & 0 \\ -2 & 7 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \\ 9 \end{bmatrix}
➢ Key to solving a system of linear equations are elementary transformations that keep the solution set the same, but that transform the equation system into a simpler form.
1. Exchange of two equations (rows in the matrix)
2. Multiplication of an equation (row) with a constant
3. Addition of two equations (rows)
➢This is known as Gaussian Elimination (aka row reduction)
22

Triangular Linear Systems
➢ Consider a square linear system with an upper triangular matrix (non-zero diagonal entries):

\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}

➢ We can solve this system bottom to top using substitution:
a_{33} x_3 = b_3 ⟹ x_3 = b_3 / a_{33}
a_{22} x_2 + a_{23} x_3 = b_2 ⟹ x_2 = (b_2 − a_{23} x_3) / a_{22}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 = b_1 ⟹ x_1 = (b_1 − a_{13} x_3 − a_{12} x_2) / a_{11}
23
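A minimal sketch of this bottom-to-top substitution, using the triangular system that the Gaussian-elimination example on the next slide produces (the function name back_substitution is just an illustrative choice):

```python
import numpy as np

def back_substitution(U, b):
    """Solve Ux = b for an upper triangular U with non-zero diagonal entries."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                        # bottom row first
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0, 1.0],
              [0.0, -8.0, -2.0],
              [0.0, 0.0, 1.0]])
b = np.array([5.0, -12.0, 2.0])
print(back_substitution(U, b))   # [1. 1. 2.]
```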

Example: Gaussian Elimination
➢ Gaussian elimination uses elementary row operations to transform a linear system into a triangular system:

2x_1 + x_2 + x_3 = 5
4x_1 − 6x_2 = −2
−2x_1 + 7x_2 + 2x_3 = 9
⟺ \left[\begin{array}{ccc|c} 2 & 1 & 1 & 5 \\ 4 & -6 & 0 & -2 \\ -2 & 7 & 2 & 9 \end{array}\right]

➢ Add −2 times the first row to the second; add 1 times the first row to the third:
2x_1 + x_2 + x_3 = 5
−8x_2 − 2x_3 = −12
8x_2 + 3x_3 = 14
⟺ \left[\begin{array}{ccc|c} 2 & 1 & 1 & 5 \\ 0 & -8 & -2 & -12 \\ 0 & 8 & 3 & 14 \end{array}\right]

➢ Add 1 times the second row to the third:
2x_1 + x_2 + x_3 = 5
−8x_2 − 2x_3 = −12
x_3 = 2
⟺ \left[\begin{array}{ccc|c} 2 & 1 & 1 & 5 \\ 0 & -8 & -2 & -12 \\ 0 & 0 & 1 & 2 \end{array}\right]

Row echelon form
24

Row Echelon Form (REF)
• The first non-zero coefficient from the left (the “leading coefficient”) is always to the right of the first non-zero coefficient in the row above.
• Rows consisting of all zero coefficients are at the bottom of the matrix.
\left[\begin{array}{ccc|c} 2 & 1 & 1 & 5 \\ 0 & -8 & -2 & -12 \\ 0 & 0 & 1 & 2 \end{array}\right]
Row Echelon form
25

Example: Reduced Echelon Form
➢ We can simplify this even further:

2x_1 + x_2 + x_3 = 5
−8x_2 − 2x_3 = −12
x_3 = 2
⟺ \left[\begin{array}{ccc|c} 2 & 1 & 1 & 5 \\ 0 & -8 & -2 & -12 \\ 0 & 0 & 1 & 2 \end{array}\right]

➢ Divide the first row by 2; divide the second row by −8:
x_1 + 0.5x_2 + 0.5x_3 = 2.5
x_2 + 0.25x_3 = 1.5
x_3 = 2
⟺ \left[\begin{array}{ccc|c} 1 & 0.5 & 0.5 & 2.5 \\ 0 & 1 & 0.25 & 1.5 \\ 0 & 0 & 1 & 2 \end{array}\right]

➢ Add −0.25 times the third row to the second row; add −0.5 times the third row to the first row; add −0.5 times the second row to the first row:
x_1 = 1, x_2 = 1, x_3 = 2
⟺ \left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 2 \end{array}\right]

Reduced row echelon form
26
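The elimination result can be checked numerically with NumPy's built-in solver (a sketch; np.linalg.solve uses an LU factorization rather than the hand steps above, but returns the same x):

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
b = np.array([5.0, -2.0, 9.0])

x = np.linalg.solve(A, b)
print(x)                      # [1. 1. 2.], matching the elimination above
print(np.allclose(A @ x, b))  # True
```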

➢ The coefficient matrix can also be non-square: see Example 2.6 from the MML book (reading assignment 4), with four equations and five unknowns.

Alternative Method: Inverse Matrix
➢ We can also solve linear systems of equations by applying the inverse.
➢ The solution to Ax = b can be obtained by multiplying both sides by A^{-1} to isolate x:

Ax = b ⟹ A^{-1}Ax = A^{-1}b ⟹ I_n x = A^{-1}b ⟹ x = A^{-1}b

Note that A^{-1} will cancel out A only if multiplied from the left-hand side; otherwise we would have A^{-1}xA.
33

Calculating an Inverse Matrix
➢ To determine the inverse of a matrix A:
➢ Write down the augmented matrix with the identity on the right-hand side.
➢ Apply Gaussian elimination to bring it into reduced row-echelon form. The desired inverse is given as its right-hand side.
➢ We can verify that this is indeed the inverse by performing the multiplication AA^{-1} and observing that we recover I_n.

A = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}

\left[\begin{array}{cccc|cccc} 1 & 0 & 2 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{array}\right]
\;\rightsquigarrow\;
\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & -1 & 2 & -2 & 2 \\ 0 & 1 & 0 & 0 & 1 & -1 & 2 & -2 \\ 0 & 0 & 1 & 0 & 1 & -1 & 1 & -1 \\ 0 & 0 & 0 & 1 & -1 & 0 & -1 & 2 \end{array}\right]

A^{-1} = \begin{bmatrix} -1 & 2 & -2 & 2 \\ 1 & -1 & 2 & -2 \\ 1 & -1 & 1 & -1 \\ -1 & 0 & -1 & 2 \end{bmatrix}
34
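A numerical check of this example (a sketch using np.linalg.inv instead of hand elimination):

```python
import numpy as np

A = np.array([[1, 0, 2, 0],
              [1, 1, 0, 0],
              [1, 2, 0, 1],
              [1, 1, 1, 1]], dtype=float)

A_inv = np.linalg.inv(A)
print(np.round(A_inv).astype(int))        # matches the matrix derived above
print(np.allclose(A @ A_inv, np.eye(4)))  # True: A @ A_inv recovers I_4
```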


What can go wrong?
➢ Applying Gaussian Elimination (row reduction) does not always lead to a solution.
➢ Singular Case: When we have a 0 in a pivot column. This is an example of a matrix that is not invertible.
➢ For example:
\left[\begin{array}{ccc|c} 2 & 1 & 1 & 1 \\ 0 & 0 & 3 & -2 \\ 0 & 0 & 4 & 2 \end{array}\right] \text{ (singular)}
\qquad
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & 4 \\ 0 & 0 & 0 & 0 \end{array}\right] \text{ (singular)}
➢ To understand this better it helps to consider matrices from a geometric perspective.
36

Several Interpretations
➢Given A and b, we want to solve for x:
Ax = b: \quad \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \end{bmatrix}
➢ This can be given several interpretations:
➢ By rows: x is the intersection of hyperplanes:
2x − y = 1
x + y = 5
➢ By columns: x is the linear combination that gives b:
x \begin{bmatrix} 2 \\ 1 \end{bmatrix} + y \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \end{bmatrix}
➢ Transformation: x is the vector transformed to b: T(x) = b
37

Geometry of Linear Equations
➢ By rows: find the intersection of the hyperplanes (one per row).
[Figure: the two hyperplanes intersect at the solution x = [1.7, 2.2]^T]
➢ By columns: find the linear combination of the columns that gives b:
x \begin{bmatrix} 2 \\ 1 \end{bmatrix} + y \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \end{bmatrix}
[Figure: b = 1.7·Column 1 + 2.2·Column 2, so x = [1.7, 2.2]^T]
38

What can go wrong?
➢ By rows:
[Figure: two parallel hyperplanes with no intersection (no solution), and two coincident hyperplanes with infinite intersection (infinitely many solutions)]
39

One unique solution
➢ Want to buy a phone: $1000 USD for the phone + $10 CAD shipping.
[Figure: paying with $1 USD and $1 CAD bills as building blocks, exactly one combination works: 1000 × $1 USD and 10 × $1 CAD]
40

No solution
➢ Want to buy a phone: $1000 USD for the phone + €10 EURO shipping.
[Figure: with only $1 USD and $1 CAD bills available, no combination of 1000× and 10× can produce the €10 — no solution]
41

No solution
➢ By columns:
[Figure: b lies outside the column space spanned by the columns of the matrix — no solution]
42

Infinite solution
➢ Want to buy a phone: $1000 USD for the phone + $10 CAD shipping.
[Figure: with $1 USD, $1 CAD, and BTC all available, many combinations work, e.g. (USD: 1000, CAD: 10, BTC: 0), (USD: 999, CAD: 10, BTC: 0.0001), (USD: 999, CAD: 9, BTC: 0.00019), (USD: 998.2, CAD: 8.6, BTC: 0.000216) — infinitely many solutions]
43

Independent Columns
➢ By columns:
[Figure: b lies inside the column space spanned by the (dependent) columns of the matrix — infinitely many solutions]
44

Solutions to Ax=b
➢Q: In general, when does Ax=b have a unique solution?
➢A: When b is in the column- space of A, and the columns of A are linearly independent
➢Q: What does it mean to be independent?
[Figure: the column space of A, with one vector outside it (no solution) and one vector inside it (infinitely many solutions when the columns are dependent)]
45

Linear Dependence
➢A set of vectors is either linearly dependent or linearly independent.
➢ A vector is linearly dependent on a set of vectors if it can be written as a linear combination of them:
c = α_1 b_1 + α_2 b_2 + … + α_n b_n
➢ We say that c is “linearly dependent” on {b_1, b_2, …, b_n}, and that the set {c, b_1, b_2, …, b_n} is “linearly dependent”.
➢ A set is linearly dependent iff the zero vector can be written as a combination of the vectors {b_1, b_2, …, b_n}:
∃ α ≠ 0 s.t. 0 = α_1 b_1 + α_2 b_2 + … + α_n b_n ⟹ b_1, b_2, …, b_n dependent
46

Linear Independence
➢If a set of vectors is not linearly dependent, we say it is linearly independent
➢ The zero vector cannot be written as a combination of independent vectors unless all coefficients α are set to zero:
0 = α_1 b_1 + α_2 b_2 + … + α_n b_n ⟹ α_i = 0 ∀ i
➢ If the vectors are independent, then there is no way to represent one of the vectors as a combination of the others.
47

Linear Dependence vs Independence
➢Independence in R2:
[Figure: four pairs of vectors in R2, each labeled dependent or independent]
48

Linear Independence
➢ Consider a set of three vectors {x_1, x_2, x_3} ∈ R^4.
➢ To check whether they are linearly dependent, we solve: λ_1 x_1 + λ_2 x_2 + λ_3 x_3 = 0
➢We write the vectors 𝑥𝑖,𝑖 = 1,2,3, as the columns of a matrix and apply elementary row operations until we identify the pivot columns.
➢ All column vectors are linearly independent if and only if all columns are pivot columns. If there is at least one non-pivot column the vectors are linearly dependent.
x_1 = \begin{bmatrix} 1 \\ 2 \\ -3 \\ 4 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 2 \end{bmatrix}, \quad x_3 = \begin{bmatrix} -1 \\ -2 \\ 1 \\ 1 \end{bmatrix}, \qquad \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 = 0

\begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & -2 \\ -3 & 0 & 1 \\ 4 & 2 & 1 \end{bmatrix} \;\rightsquigarrow\; \begin{bmatrix} 1 & 1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}

Every column is a pivot column, so the three vectors are linearly independent.
49
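The same check can be done numerically via the matrix rank (a sketch; full column rank is equivalent to every column being a pivot column):

```python
import numpy as np

# x1, x2, x3 stacked as the columns of a matrix
X = np.array([[ 1,  1, -1],
              [ 2,  1, -2],
              [-3,  0,  1],
              [ 4,  2,  1]])

rank = np.linalg.matrix_rank(X)
print(rank)                # 3
print(rank == X.shape[1])  # True -> the columns are linearly independent
```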

Vector Space
➢A vector space is a set of objects called “vectors”, with closed operations “addition” and “scalar multiplication” satisfying certain axioms:
1. x + y = y + x
2. (x + y) + z = x + (y + z)
3. There exists a zero vector “0” s.t. ∀x, x + 0 = x
4. ∀x, there exists an additive inverse “−x” s.t. x + (−x) = 0
5. 1x = x
6. (c_1 c_2)x = c_1(c_2 x)
7. c(x + y) = cx + cy
8. (c_1 + c_2)x = c_1 x + c_2 x
➢ Examples: R, R2, R𝑛
50

Subspace
➢ Consider a set of vectors 𝒜 = {x_1, …, x_k} ⊆ 𝒱. The set of all linear combinations of vectors in 𝒜 is called the span of 𝒜. If 𝒜 spans the vector space V, we write V = span[𝒜] or V = span[x_1, …, x_k].
➢ Subspaces generated in R2:
[Figure: examples where span({x1, x2}) is a line, span({x1, x2}) = R2, and span({x1, x2, x3}) = R2]
51

Basis
➢The vectors that span a subspace are not unique
➢However, the minimum number of vectors needed to span a subspace is
unique
➢This number is called the dimension or rank of the subspace
➢A minimal set of vectors that span a subspace is called a basis for the space
➢ The vectors in a basis must be linearly independent; otherwise we could remove one and still span the space.
52

Basis
➢ Basis in the vector space V = R2:
➢ Every linearly independent set of vectors that spans V is called a basis of V.
[Figure: four example sets of vectors in R2, two that form a basis and two that do not]
53

Example Bases
➢ In R3, the canonical/standard basis is:

B = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}

➢ Two different bases of R3 are:

B_1 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}, \quad B_2 = \left\{ \begin{bmatrix} 0.5 \\ 0.8 \\ 0.4 \end{bmatrix}, \begin{bmatrix} 1.8 \\ 0.3 \\ 0.3 \end{bmatrix}, \begin{bmatrix} -2.2 \\ -1.3 \\ 3.5 \end{bmatrix} \right\}
54

Linear Mapping/Transformation
➢Earlier, we saw that vectors are objects that can be added together and multiplied by a scalar, and the resulting object is still a vector
➢Now, we do the same for vector spaces
➢ Linear Mapping: For vector spaces V, W, a mapping 𝜙: 𝑉 → 𝑊 is called a linear
mapping (or linear transformation) if:
∀ x, y ∈ V, ∀ λ, ψ ∈ R: Φ(λx + ψy) = λΦ(x) + ψΦ(y)
➢It turns out that we can represent linear mappings as matrices. Recall that we can also collect a set of vectors as columns of a matrix. When working with matrices, we have to keep in mind what the matrix represents: a linear mapping or a collection of vectors.
55

Linear Mapping/Transformation
➢A vector has different coordinate representations depending on which coordinate system or basis is chosen.
➢ Example: two different coordinate systems defined by two sets of basis vectors.
[Figure: the same vector x drawn with two different bases — x = 2e_1 + 3e_2 in the standard basis (e_1, e_2), and x = −(1/2)b_1 + (5/2)b_2 in the basis (b_1, b_2)]
56

Example: Change of Basis Matrix
[Figure: vector f drawn with two coordinate systems, basis vectors v_1, v_2 and u_1, u_2]
U = [u_1 u_2] with u_1 = [2, 3]^T, u_2 = [4, 5]^T;  [f]_v = [2, 4]^T;  [f]_u = ?
Source:
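One way to read this example: if [f]_v = [2, 4]^T are the coordinates of f in the v-basis and the columns of U are u1, u2 expressed in that same basis, then [f]_u solves U [f]_u = [f]_v. A sketch under that assumption:

```python
import numpy as np

U = np.array([[2.0, 4.0],       # columns: u1 = [2, 3]^T, u2 = [4, 5]^T
              [3.0, 5.0]])
f_v = np.array([2.0, 4.0])      # coordinates of f in the v-basis

f_u = np.linalg.solve(U, f_v)   # change of basis: solve U [f]_u = [f]_v
print(f_u)                      # [ 3. -1.]  ->  f = 3*u1 - 1*u2
```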

Examples of Transforms
➢Scale
➢Rotation
➢Horizontal Mirror
➢Vertical Mirror
➢Combination of Transformations
Source: mathisfun.com
58
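A small sketch of what such transformations look like as 2×2 matrices applied to points (the specific angle and scale factors are just illustrative choices, not from the slide):

```python
import numpy as np

theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],   # rotate by 30 degrees
                     [np.sin(theta),  np.cos(theta)]])
scaling = np.array([[2.0, 0.0],                          # stretch x by 2, shrink y by 0.5
                    [0.0, 0.5]])
mirror = np.array([[1.0,  0.0],                          # reflect across the horizontal axis
                   [0.0, -1.0]])

p = np.array([1.0, 1.0])                                 # a point in R^2
print(rotation @ p, scaling @ p, mirror @ p)
print((mirror @ rotation) @ p)                           # combine transforms by multiplying matrices
```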

Part 2 Analytical Geometry
Readings:
• Chapter 3.1-5,8,9 MML Textbook

Norms
➢A norm is a scalar measure of a vector’s length.
➢ The most important norm is the Euclidean norm; for x ∈ R^n it is defined as:

\|x\|_2 := \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^T x}

It computes the Euclidean distance of x from the origin.
➢ The Euclidean norm is also known as the L2 norm.
60
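A quick NumPy check (a sketch; the vector is an arbitrary example):

```python
import numpy as np

x = np.array([3.0, 4.0])

print(np.sqrt(x @ x))            # Euclidean (L2) norm: sqrt(x^T x) = 5.0
print(np.linalg.norm(x))         # same value via NumPy's norm
print(np.linalg.norm(x, ord=1))  # Manhattan (L1) norm: |3| + |4| = 7.0
```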

Norms
➢ For different norms, the red lines indicate the set of vectors with norm 1.
[Figure: the unit ball ‖x‖_1 = 1 for the Manhattan norm (a diamond) and ‖x‖_2 = 1 for the Euclidean norm (a circle)]
61

Dot product
➢ Dot product:

x^T y = \sum_{i=1}^{n} x_i y_i

➢ Example:

\begin{bmatrix} 1 \\ 7 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 5 \end{bmatrix} = 1·3 + 7·5 = 38

➢ Commonly, the dot product between two vectors a, b is denoted by a^T b or ⟨a, b⟩.
62

Lengths and Distances
➢ Consider an inner product space. Then
d(x, y) := ‖x − y‖ = \sqrt{⟨x − y, x − y⟩}
is called the distance between x and y for x, y ∈ V.
➢If we use the dot product as the inner product, then the distance is called Euclidean distance.
63

Angles
➢ The angle θ between two vectors x, y is computed using the inner product.
➢ For example, let us compute the angle between x = [1, 1]^T ∈ R2 and y = [1, 2]^T ∈ R2.
➢ Using the dot product as the inner product we get:

\cos\theta = \frac{⟨x, y⟩}{\sqrt{⟨x, x⟩⟨y, y⟩}} = \frac{x^T y}{\sqrt{x^T x \; y^T y}} = \frac{3}{\sqrt{10}}

➢ Then the angle between the two vectors is cos^{-1}(3/√10) ≈ 0.32 rad, which corresponds to approximately 18°.
[Figure: x and y drawn in the plane with the angle θ between them]
64
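The same computation in NumPy (a minimal sketch reproducing the numbers above):

```python
import numpy as np

x = np.array([1.0, 1.0])
y = np.array([1.0, 2.0])

cos_theta = (x @ y) / np.sqrt((x @ x) * (y @ y))   # 3 / sqrt(10)
theta = np.arccos(cos_theta)
print(theta, np.degrees(theta))                    # ~0.32 rad, ~18.4 degrees
```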

Orthogonality
➢Orthonormal = Orthogonal and unit vectors
➢ Orthogonal matrix: A square matrix A ∈ R^{n×n} is an orthogonal matrix if and only if its columns are orthonormal, so that
AA^T = I = A^T A,
which implies that A^{-1} = A^T, i.e., the inverse is obtained by simply transposing the matrix.
65

Orthonormal Basis
➢ In n-dimensional space, we need n basis vectors that are linearly independent. If these vectors are orthogonal and each has length 1, we have a special case: an orthonormal basis.
➢ Consider an n-dimensional vector space V and a basis b_1, …, b_n of V. If
⟨b_i, b_j⟩ = 0 for i ≠ j    and    ⟨b_i, b_i⟩ = 1
for all i, j = 1, …, n, then the basis is called an orthonormal basis (ONB). Note that ⟨b_i, b_i⟩ = 1 implies that every basis vector has length/norm 1.
➢ If only ⟨b_i, b_j⟩ = 0 for i ≠ j is satisfied, then the basis is called an orthogonal basis.
66

Orthonormal Basis
➢The canonical/standard basis for a Euclidean vector space R𝒏 is an orthonormal basis, where the inner product is the dot product of vectors.
➢ Example: In R2, the vectors

b_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad b_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}

form an orthonormal basis since b_1^T b_2 = 0 and ‖b_1‖ = 1 = ‖b_2‖.
67

Orthogonal Projections
➢ Projections are linear transformations that project onto a lower-dimensional feature space.
68

Orthogonal Projections
➢ The projection of x onto the subspace U spanned by b is defined as:

\pi_U(x) = \lambda b = b\,\frac{b^T x}{\|b\|^2} = \frac{b b^T}{\|b\|^2}\, x
69

Example: Orthogonal Projections
➢ Compute the projection of x = [1, 2]^T ∈ R2 onto b = [1, 1]^T ∈ R2:

\pi_U(x) = \frac{b b^T}{\|b\|^2}\, x

[Figure: x and b drawn in the plane, with the projection of x onto the line spanned by b]
70

Projection Matrix
➢ We can also use a projection matrix P_π, which allows us to project any vector x onto the subspace:

P_\pi = \frac{b b^T}{\|b\|^2}, \qquad \pi_U(x) = P_\pi\, x

➢ Note that b b^T will be a symmetric matrix.
71

Example: Applying Projection Matrix
➢ Compute the projection matrix for b = [1, 1]^T ∈ R2:

P_\pi = \frac{b b^T}{\|b\|^2}

[Figure: b drawn in the plane]
72
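A sketch of the full computation for this example, reusing x = [1, 2]^T from the previous slide:

```python
import numpy as np

b = np.array([1.0, 1.0])
x = np.array([1.0, 2.0])

P = np.outer(b, b) / (b @ b)   # projection matrix b b^T / ||b||^2
print(P)                       # [[0.5 0.5]
                               #  [0.5 0.5]]
print(P @ x)                   # [1.5 1.5], the projection of x onto span{b}
```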

Part 3
Data Augmentation

Non-Representative Data
➢ Everything our algorithms learn comes from the data used to train them.
➢If the data is of poor quality, unbalanced or not representative of the task we want to solve, then how are our algorithms going to learn to generalize?
74

Capacity and Training
➢Deep learning algorithms have the capacity to classify real images in various orientations and scales.
➢If you train your algorithms on perfectly processed samples, then they won’t know how to predict anything but perfectly cropped images.
75

Data Augmentation
➢Use linear algebra to perform common transformations to supplement datasets
➢ Translation, Scaling, Rotation, Reflection
➢ Noise, Light and Colour Intensity
➢ Many more…
GAN Fake Celebrities
Source: Viridian Martinez
Source: kaggle.com
76
➢ Advanced:
➢Generative models (i.e., Deep learning) to create new images with similar characteristics
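A minimal sketch of a few such transformations on an image stored as a NumPy array (real pipelines usually rely on a library such as torchvision or Keras; the function and image here are illustrative placeholders):

```python
import numpy as np

def augment(image, rng=np.random.default_rng(0)):
    """Return a few simple augmented copies of an (H, W) grayscale image."""
    flipped = np.fliplr(image)                         # horizontal reflection
    rotated = np.rot90(image)                          # 90-degree rotation
    noisy = image + rng.normal(0, 0.05, image.shape)   # additive Gaussian noise
    brighter = np.clip(image * 1.2, 0.0, 1.0)          # light-intensity change
    return [flipped, rotated, noisy, brighter]

img = np.zeros((28, 28))           # placeholder image (e.g., an MNIST-sized digit)
extras = augment(img)
print([a.shape for a in extras])   # each augmented copy keeps the 28x28 shape
```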

Test Time Data Augmentation
➢You can also apply data augmentation to better evaluate your performance on test examples.
➢Great way to assess limitations of your model to images of different rotations, scales, noise, etc.
77

Next Time
➢ Oct 7th: Week 5 Tutorial 2 on Anomaly Detection
➢ Project 2 Overview
➢ Oct 11-15: Reading Week (Q&A Session on Thursday Oct 14th, no lecture)
➢ Oct 18-19: Midterm (Q&A Session on Thursday Oct 21st, no lecture)
➢ Project 2 is due on Oct 22nd
➢ Lecture 7 – Dimensionality Reduction (Oct 26th)
➢ Curse of Dimensionality
➢ Eigendecomposition
➢ Singular Value Decomposition
➢ Principal Component Analysis
78

Google Colab