Foundations of Data Analytics and Machine Learning
Summer 2022
• Projections
• Matrix Decompositions
• PCA
Assessment Logistics
Ø Platform: Crowdmark
Ø A practice assessment is available right now! It will not be marked; it is just to get familiar with the platform.
Ø 0 marks for misplaced questions (answers uploaded to the wrong question).
Ø Practice Problems + Solutions on Quercus.
Ø Material: Week 1 to end of week 6 (this week), Tutorial 1 and 2, Project 1 and 2.
Ø Assessment Distribution: Jul 5th at 10 AM EST
Ø Force Collection: Jul 6th at 10 AM EST
o Duration: 1.5 hr exam + 0.5 hr uploading + 0.5 hr for any kind of contingencies = 2.5 hr from your start time or force collection (whichever happens sooner)
o Example: If you start the exam on Jul 6th at 9 AM, you have only an hour to submit the exam (force collection).
Ø Late submission: −10% per minute; 0 marks after 10 minutes.
Ø Use course material + online resources – NO HELP FROM OTHERS!!
Ø Async exam: to be fair, there will be no help, no announcements, and no corrections during the assessment.
Ø If needed, state your assumptions and answer the questions.
Ø Start the exam ASAP
Ø Email Ali or Chris with any logistical issues ASAP – no guarantee we can get back to you in time.
Ø No Piazza during the assessment.
ØLooked at linear algebra for data processing in the form of Data Augmentation
Ø scalars, vectors and matrices
Ø solving systems of equations
Øchange of basis
Øanalytical geometry
Ørotations and other transformations
ØToday we will examine data processing in the form of Dimensionality Reduction
High-Dimensional Data
Ø Real world data is often high-dimensional!
Ø Challenge: difficult to analyze, visualize and interpret
Curse of Dimensionality
As the number of features or dimensions grows:
Ø Most samples end up at roughly the same distance from each other.
Ø The amount of data we need to generalize accurately grows exponentially.
Curse of Dimensionality
Why are corners bad?
Curse of Dimensionality
Most of the space → corners
Ø The majority of the cube's volume lies outside the inscribed sphere in higher dimensions
Space is Mostly Corners
Curse of Dimensionality
ØMany dimensions are unnecessary
Ø Data often lives on a low-dimensional manifold
Ø Dimensionality reduction finds the relevant dimensions
Ø Projections
ØMatrix Decompositions
ØDeterminant
Ø Eigenvalues and Eigenvectors
Ø Eigendecomposition
ØDimensionality Reduction
Dimensionality Reduction Part 1
Recap: Linear Algebra
Ø Given A and b, we want to solve for x: $A\mathbf{x} = \mathbf{b}$
$$\begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$$
Ø This can be given several interpretations:
Ø By rows: x is the intersection of hyper-planes:
$$2x - y = 1, \qquad x + y = 5$$
Ø By columns: x is the linear combination of the columns that gives b:
$$x\begin{bmatrix} 2 \\ 1 \end{bmatrix} + y\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$$
ØTransformation: x is the vector transformed to b: 𝑇𝑥=𝑏
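All three views can be checked numerically. A minimal NumPy sketch using the 2×2 system above (np.linalg.solve is one standard way to do this; the arrays are just this slide's example):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [1.0,  1.0]])
b = np.array([1.0, 5.0])

# Solve A x = b
x = np.linalg.solve(A, b)
print(x)                                   # [2. 3.]  ->  x = 2, y = 3

# Row view: the solution lies on both lines 2x - y = 1 and x + y = 5
print(2 * x[0] - x[1], x[0] + x[1])        # 1.0 5.0

# Column view: b is the combination x*[2, 1] + y*[-1, 1]
print(x[0] * A[:, 0] + x[1] * A[:, 1])     # [1. 5.]
```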
Recap: Analytical Geometry
ØProjections are linear transformations that project to lower dimensional feature space
Recap: Orthogonal Projections
Project vector onto another vector
Now let's consider the case of projecting a vector onto a plane (i.e., a hyperplane).
Projection onto a line spanned by a vector b (coordinate λ):
$$\pi_U(\mathbf{x}) = \lambda\mathbf{b} = \mathbf{b}\,\frac{\mathbf{b}^T\mathbf{x}}{\mathbf{b}^T\mathbf{b}} = \frac{\mathbf{b}\mathbf{b}^T}{\lVert\mathbf{b}\rVert^2}\,\mathbf{x}$$
Projection onto a subspace spanned by the columns of B (coordinates λ):
$$\mathbf{B}^T\mathbf{B}\lambda = \mathbf{B}^T\mathbf{x} \;\Rightarrow\; \lambda = (\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\mathbf{x}$$
$$\pi_U(\mathbf{x}) = \mathbf{B}\lambda = \mathbf{B}(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\mathbf{x}$$
Example con't
$$\pi_U(\mathbf{x}) = \mathbf{B}\lambda = \mathbf{B}(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\mathbf{x}$$
If B has orthonormal columns (an orthonormal basis), then B^T B = I and the inverse drops out:
𝜋! 𝒙 =𝑩𝜆=𝑩𝑩𝑻𝒙
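A quick NumPy sanity check of both projection formulas (the 3×2 basis B and the vector x are made-up values; np.linalg.qr is used only to produce an orthonormal basis for the same subspace):

```python
import numpy as np

# Basis for a 2-D subspace (plane) in R^3; the columns of B span the subspace
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
x = np.array([3.0, 2.0, 1.0])

# General projection: pi_U(x) = B (B^T B)^{-1} B^T x
lam = np.linalg.solve(B.T @ B, B.T @ x)    # coordinates lambda
proj = B @ lam
print(proj)

# With an orthonormal basis Q, the inverse disappears: pi_U(x) = Q Q^T x
Q, _ = np.linalg.qr(B)                     # orthonormalize the columns of B
print(np.allclose(Q @ Q.T @ x, proj))      # True: same projection
```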
“how to summarize matrices, how matrices can be decomposed, and how these decompositions can be used for matrix approximations”
Matrix Decompositions
• Chapter 4.1-5 MML Textbook
Decomposing a Matrix – 1
Ø There are many ways we can describe a matrix
Ø Example, take a look at the following transformation:
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
ØNotice that we could write the same thing in terms of several transformations:
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 5 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
Decomposing a Matrix – 2
Ø There are many ways we can describe a matrix
Ø Example, take a look at the following transformation:
ØNotice that we could write the same thing in terms of several transformations:
$$A = \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & -1/2 \\ 1/2 & 1 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$
Decomposing a Matrix
ØThis can be used for providing clarity on what the transformation is doing
Ø Simplifying and speeding up computations
Ø Dimensionality reduction as we'll see later
Key Concepts
q Determinant
q Trace
q Diagonalization
q Eigenvectors
q Eigenvalues
Determinant
ØThe determinant of a square matrix is a function that maps A onto a real number.
Ø Simple geometric meaning:
Ø calculates how volume grows/shrinks under a linear transformation/mapping
Ø for example:
Two vertical bars indicate determinant of matrix enclosed inside
det(A) = 6
Determinant for Matrix Invertibility
ØWhat does it mean when a matrix is not invertible?
Ø Simple geometric meaning:
Ø the transformation maps onto a subspace
Ø how does the volume grow/shrink?
Ø for example:
det(A) = 0
If det(A) = 0, then matrix A is singular.
Determining the Determinant
Ø The determinant can only be calculated on square matrices (n × n):
Ø Determinant in 2d:
$$\det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11}a_{22} - a_{12}a_{21}$$
Ø Determinant in 3d:
$$\det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$
$$= a_{11}a_{22}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{31}a_{22}a_{13} - a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33}$$
recursively take alternating sums of sub-determinants
Determining the Determinant
ØCan you think of an easier way?
Ø Hint: what is the determinant of the following upper triangular matrix?
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$
$$\det(A) = a_{11}a_{22}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{31}a_{22}a_{13} - a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33} = a_{11}a_{22}a_{33}$$
(every other term contains a zero entry)
Ø Q: How can we obtain a triangular matrix?
Determining the Determinant
ØApply Gaussian Elimination to bring matrix into row Echelon form.
Ø Then the determinant comes directly from the diagonal:
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$
ØThere are three things to consider:
1. Row addition has no effect on determinant
2. Each row switching operation changes sign of determinant
3. Scaling a row scales the determinant
$$\det(A) = \prod_{i} a_{ii} \quad \text{for a triangular matrix}$$
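A minimal sketch of this idea in code, assuming SciPy is available (the 3×3 matrix is a made-up example; scipy.linalg.lu performs the elimination and records the row swaps in a permutation matrix P):

```python
import numpy as np
from scipy.linalg import lu   # assumes SciPy is available

A = np.array([[2.0, 1.0, 3.0],
              [1.0, 4.0, 1.0],
              [0.0, 2.0, 5.0]])

# Reference value
print(np.linalg.det(A))             # ~37.0

# LU factorization: A = P @ L @ U, L has a unit diagonal, U is upper triangular
P, L, U = lu(A)

# det(A) = det(P) * det(L) * det(U) = (+/-1) * 1 * product of U's diagonal
sign = np.linalg.det(P)             # +1 or -1, one sign flip per row swap
print(sign * np.prod(np.diag(U)))   # ~37.0, matches np.linalg.det(A)
```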
ØThe trace of a square matrix is the sum of the diagonal elements.
Ø It is defined as:
$$\mathbf{tr}(A) = \sum_{i=1}^{n} a_{ii}$$
Ø Has some interesting identities that can help simplify equations (assuming the dimensions work):
§ tr(cA + dB) = c tr(A) + d tr(B)
§ tr(AᵀB) = tr(ABᵀ) = tr(BᵀA)
§ tr(ABC) = tr(CAB) = tr(BCA) (cyclic property)
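A quick NumPy check of these identities (the random 3×3 matrices are made up just for the verification):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

# Linearity: tr(cA + dB) = c tr(A) + d tr(B)
print(np.isclose(np.trace(2 * A + 3 * B), 2 * np.trace(A) + 3 * np.trace(B)))

# Transpose identity: tr(A^T B) = tr(A B^T) = tr(B^T A)
print(np.isclose(np.trace(A.T @ B), np.trace(A @ B.T)))
print(np.isclose(np.trace(A.T @ B), np.trace(B.T @ A)))

# Cyclic property: tr(ABC) = tr(CAB) = tr(BCA)
print(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))
print(np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)))
```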
Eigenvectors
ØVectors that don’t change their direction due to a transformation and can be represented as follows:
Eigenvector(s) of A
$$A\mathbf{x} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\mathbf{x} = \lambda\mathbf{x}$$
eigenvector
ØDepending on the dimension there may be multiple independent vectors that satisfy the equation (eigenvectors).
ØWhere 𝜆 is a scalar referred to as eigenvalue.
Non-Uniqueness of Eigenvectors
ØIf x is an eigenvector of A with corresponding eigenvalue 𝜆 then it holds that cx is also an eigenvector of A:
$$A(c\mathbf{x}) = cA\mathbf{x} = c\lambda\mathbf{x} = \lambda(c\mathbf{x})$$
(Figure: x and cx are both eigenvectors of A; v is not an eigenvector.)
ØAll vectors that are collinear to x are also eigenvectors of A with the same eigenvalue 𝜆
Finding Eigenvectors
Ø We can find the eigenvalues as the roots of the characteristic polynomial:
$$A\mathbf{x} = \lambda\mathbf{x} \;\Rightarrow\; A\mathbf{x} - \lambda\mathbf{x} = 0 \;\Rightarrow\; (A - \lambda I)\mathbf{x} = 0$$
The characteristic polynomial of A is
$$p_A(\lambda) = \det(A - \lambda I) = \det\begin{bmatrix} a_{11}-\lambda & a_{12} & a_{13} \\ a_{21} & a_{22}-\lambda & a_{23} \\ a_{31} & a_{32} & a_{33}-\lambda \end{bmatrix} = c_0 + c_1\lambda + \dots + c_{n-1}\lambda^{n-1} + (-1)^n\lambda^n$$
Solve for the roots of the polynomial to obtain the eigenvalues, then solve (A − λI)x = 0 for the eigenvectors!
ØThere are a few theorems worth mentioning:
1. λᵢ is an eigenvalue if and only if λᵢ is a root of the characteristic polynomial.
2. The eigenvectors of a square matrix (n × n) with n distinct eigenvalues are linearly independent and form a basis of ℝⁿ.
3. The determinant of a square matrix (n × n) is the product of its eigenvalues:
$$\det(A) = \prod_{i=1}^{n} \lambda_i$$
4. The trace of an n × n matrix is the sum of its eigenvalues:
$$\mathbf{tr}(A) = \sum_{i=1}^{n} \lambda_i$$
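A minimal NumPy sketch verifying theorems 3 and 4 on a small symmetric example (the matrix itself is just an illustration, not from the slides):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigenvalues and eigenvectors (the columns of `vecs` are the eigenvectors)
vals, vecs = np.linalg.eig(A)
print(vals)                                        # [3. 1.] (order may vary)

# Defining equation: A x = lambda x for the first eigenpair
x = vecs[:, 0]
print(np.allclose(A @ x, vals[0] * x))             # True

# Theorem 3: det(A) is the product of the eigenvalues
print(np.isclose(np.linalg.det(A), np.prod(vals))) # True

# Theorem 4: tr(A) is the sum of the eigenvalues
print(np.isclose(np.trace(A), np.sum(vals)))       # True
```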
Example 4.5
Graphical Intuition - Scaling
Ø The vertical axis is stretched by a factor of 2 and the horizontal axis is compressed by a factor of 1/2. The mapping is area preserving.
transformed
Graphical Intuition – Sheer and Stretch
$$A = \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1 \end{bmatrix}$$
Ø Represents a shear-and-stretch mapping. It stretches along the red eigenvector by 1.5 and compresses along the blue eigenvector by 0.5. The area is scaled by 75%.
transformed
Graphical Intuition – Collapse to Subspace
Ø Represents a mapping that collapses a two-dimensional domain onto one dimension. The direction with eigenvalue λ₂ = 0 collapses, while the other direction stretches the space by a factor of 2. The area in this case is 0.
transformed
Graphical Intuition - Rotation
$$A = \begin{bmatrix} \cos(\pi/6) & -\sin(\pi/6) \\ \sin(\pi/6) & \cos(\pi/6) \end{bmatrix}$$
Ø The matrix rotates the points by π/6 rad (30 degrees) counter-clockwise. The eigenvalues are complex, so there are no real eigenvectors to draw. The rotation is area preserving.
transformed
Eigendecomposition
Ø A square matrix (n × n) can be factored into: A = PDP⁻¹
where P is Rnxn and D is a diagonal matrix whose diagonal entries are eigenvalues of A (if and only if the eigenvectors form the basis of Rn)
Ø For a square symmetric matrix (n × n), the eigenvectors form an orthonormal basis (with real eigenvalues) and A can be factored into A = PDPᵀ, where P is orthogonal:
Spectral Theorem
Geometric Intuition
Ø Eigendecomposition can be thought of as a sequence of transformations:
1. P-1 performs a basis change depicted as a rotation-like operation from standard basis into the eigenbasis
2. D performs a scaling along the remapped orthogonal vectors depicted by a circle being stretched
3. P undoes the basis change depicted as a reverse rotation and restores the original coordinate frame.
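A small NumPy sketch of the eigendecomposition of a made-up symmetric matrix (np.linalg.eigh is the routine for symmetric matrices; it returns real eigenvalues and orthonormal eigenvectors, matching the spectral theorem):

```python
import numpy as np

# Symmetric matrix, so the spectral theorem applies: A = P D P^T
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# eigh returns real eigenvalues and orthonormal eigenvectors for symmetric input
vals, P = np.linalg.eigh(A)
D = np.diag(vals)

# Reconstruct A from its eigendecomposition
print(np.allclose(P @ D @ P.T, A))        # True

# P is orthogonal, so P^T = P^{-1}
print(np.allclose(P.T @ P, np.eye(2)))    # True

# Powers of A become cheap: A^5 = P D^5 P^T
print(np.allclose(np.linalg.matrix_power(A, 5), P @ np.diag(vals**5) @ P.T))  # True
```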
Example 4.11
Principal Component Analysis
• Chapter 10 MML Textbook
Assume we have a dataset with 1000 features.
Can we somehow mix these features to create some new features,
such that we only need 20 of these new features to represent our data?
Ø Principal component analysis (PCA) is one example where eigendecomposition is used frequently
ØDecomposing data in terms of factors has many benefits:
§ data interpretation
§ dimensionality reduction
You can think of it as projections that maximize the variance in the data (i.e. information content).
These slides are adapted from
https://www.youtube.com/watch?v=FgakZw6K1QQ
Variables!
We can plot 2 variables in 2D.
Shift the data
Draw a line crossing the origin
Goal: Fit the data as best as a line can
How do we know we have the best fit?
Objective: Minimize b (the distance from each point to the line) or, equivalently, maximize c (the distance from each projected point to the origin)
Maximize the Sum of squared distances
First Principal Component
First Principal Component
The slope of PC1 = 0.25, i.e., go 4 units along variable 1 for every 1 unit along variable 2: direction (4, 1)
Length of (4, 1) = √(16 + 1) ≈ 4.12
First Principal Component
• The slope of the PC1 = 0.25
Eigenvector for PC1 = [0.97, 0.242]
(obtained by dividing the direction (4, 1) by its length √(16 + 1) ≈ 4.12)
Other name: Singular vector
First Principal Component
• The slope of the PC1 = 0.25
Sum of squared distances for PC1 = Eigenvalue for PC1
Sqrt(Eigenvalue for PC1) = Singular Value for PC1
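These relationships can be checked numerically. A sketch with hypothetical data scattered roughly along a slope of 0.25 (note that np.cov divides by n − 1, so the covariance eigenvalue equals the sum of squared projected distances divided by n − 1, while the singular value is the square root of the unnormalized sum):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Hypothetical data spread mostly along the direction (4, 1), i.e. slope 0.25
t = rng.standard_normal(n)
noise = 0.2 * rng.standard_normal((n, 2))
X = np.outer(t, [4, 1]) / np.sqrt(17) + noise
X = X - X.mean(axis=0)                     # center the data

vals, vecs = np.linalg.eigh(np.cov(X.T))   # eigenvalues in ascending order
pc1 = vecs[:, -1]                          # eigenvector with the largest eigenvalue
print(pc1)                                 # roughly +/- [0.97, 0.24]

# Sum of squared projections onto PC1 vs. the covariance eigenvalue
proj = X @ pc1
print(np.isclose(proj @ proj / (n - 1), vals[-1]))   # True

# Largest singular value of X vs. sqrt of the (unnormalized) sum of squares
s = np.linalg.svd(X, compute_uv=False)
print(np.isclose(s[0], np.sqrt(proj @ proj)))        # True
```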
Projection - PC1 and PC2
Reconstruction
Scree Plot
What happens beyond 2D?
Ø Rigidly rotate the coordinate axes to new axes (the principal axes) such that:
—principal axis 1 corresponds to the highest variance,
—axis 2 has the next highest variance,
—and axis p has the lowest
—All principal axes are uncorrelated (orthogonal).
Source: Václav Hlaváč
In Summary...
Source: Stackoverflow
Ø Project D-dimensional data onto an M-dimensional subspace that retains as much information as possible
Ø Informally: information = diversity = variance
PCA Algorithm Overview
1. Standardize the data (zero mean) and compute the Covariance Matrix
2. Obtain the Eigenvectors and Eigenvalues
3. Choose the 𝑘 eigenvectors that correspond to the 𝑘 largest eigenvalues
4. Construct the projection matrix 𝐖 from the selected 𝑘 eigenvectors.
5. Transform the original dataset 𝐗 via 𝐖 to obtain a 𝑘-dimensional feature subspace 𝐘.
Ø Examine the principal components for interpretation of the data.
Ø Discard low-variance components for dimensionality reduction.
A NumPy sketch of these steps is shown below.
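A minimal sketch of the five steps above with NumPy (the function name `pca` and the random 100 × 5 dataset are made up for illustration):

```python
import numpy as np

def pca(X, k):
    """PCA via eigendecomposition of the covariance matrix (illustrative sketch)."""
    # 1. Center the data (zero mean per feature) and compute the covariance matrix
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)

    # 2. Obtain the eigenvalues and eigenvectors (eigh: covariance is symmetric)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # 3. Choose the k eigenvectors with the largest eigenvalues
    order = np.argsort(eigvals)[::-1][:k]

    # 4. Construct the projection matrix W from the selected eigenvectors
    W = eigvecs[:, order]

    # 5. Transform X via W to obtain the k-dimensional feature subspace Y
    Y = X_centered @ W
    return Y, W, eigvals[order]

# Example: reduce hypothetical 5-dimensional data to 2 dimensions
X = np.random.default_rng(0).standard_normal((100, 5))
Y, W, top_eigvals = pca(X, k=2)
print(Y.shape)                                          # (100, 2)
print(top_eigvals / np.trace(np.cov(X, rowvar=False)))  # fraction of variance explained
```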
Visualization of Principal Components
ØPCA is commonly applied to datasets to assess information content.
Ø Example:
ØPCA applied on patient dataset and samples projected on top principal components.
Ø Patients with known treatment outcomes are colour-coded (responders vs. non-responders).
Source: Khodayari-Rostamabad (2011)
Dimensionality Reduction
ØProjecting onto principal components can improve model/algorithm performance.
ØReduces the number of parameters and required complexity
Ø Example:
ØPC1 captures most of the variance
(information).
Ø PC2 captures limited variance, and projections onto PC2 would look similar (or the same) for all samples.
What about Images?
ØYes, we can use PCA! (Turk, Pentland, 1991)
ØEach 150 pixel × 150 pixel image can be represented by a 22,500-long vector
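A minimal sketch of the eigenfaces idea with NumPy, using a hypothetical random array in place of real face images (the SVD of the centered data matrix gives the same principal directions as eigendecomposing the covariance, without forming the 22,500 × 22,500 matrix):

```python
import numpy as np

# Hypothetical stack of face images: 400 images, each 150 x 150 pixels
faces = np.random.default_rng(0).random((400, 150, 150))

# Flatten each image into a 22,500-long vector
X = faces.reshape(len(faces), -1)          # shape (400, 22500)
mean_face = X.mean(axis=0)
X_centered = X - mean_face

# SVD of the centered data; rows of Vt are the principal directions ("eigenfaces")
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
eigenfaces = Vt[:50]                       # keep the top 50

# Project a face onto the eigenfaces and reconstruct it from 50 coefficients
coeffs = eigenfaces @ (X[0] - mean_face)
reconstruction = mean_face + coeffs @ eigenfaces
print(coeffs.shape, reconstruction.reshape(150, 150).shape)   # (50,) (150, 150)
```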
Normalized Face Data
[Figure: grid of normalized face images]
The Eigenfaces
[Figure: grid of the computed eigenfaces]
Eigenvalues for each Eigenface
[Figure: eigenvalue for each eigenface, plotted against the sorted principal component index]
Reconstructing a Face from PCs
[Figure: a face progressively reconstructed from an increasing number of principal components]
Faces in PC-space (Dims 1,2)
Are there any patterns present?
[Figure: faces scattered along PC1 and PC2; the dominant direction corresponds to overall brightness]
Faces in PC-space (Dims 59,60)
[Figure: faces scattered along PC59 and PC60]
Dimension 59,60 have low information content
Everyone ends up being stacked on top of each other
Data Security
ØTo reconstruct data we require the mean and standard deviation used for normalization
ØCan be used to secure your datasets as described for a sample dataset on Kaggle:
“It contains only numerical input variables which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA, the only fea
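A small sketch of why this works, with made-up numbers: the published PCA features can be rotated back, but recovering the original feature values still requires the normalization parameters (mean and standard deviation) that were kept private:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) * [1, 2, 3, 4, 5] + 10   # hypothetical raw data

# Normalization parameters that would need to be kept secret
mu, sigma = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sigma

# Full PCA rotation of the normalized data
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
components = Z @ Vt.T                # what gets published (V1, V2, ... style features)

# The rotation alone is invertible...
Z_back = components @ Vt
print(np.allclose(Z_back, Z))        # True

# ...but recovering the original feature values still requires mu and sigma
X_back = Z_back * sigma + mu
print(np.allclose(X_back, X))        # True only because we kept mu and sigma
```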