
Dimensionality Reduction I: Principal Component Analysis

ISML_5: Dimensionality Reduction

Lingqiao Liu

Outline

• Overview of dimensionality reduction

– The benefits

– Why we can perform dimensionality reduction

– Type of dimensionality reduction methods

• Principal component analysis

– Mathematical basics

– Decorrelation and dimension selection

– Eigenface and high-dimensionality issue

• Linear Discriminant Analysis

• Summary

What is dimensionality reduction

• Reduce dimensions of data

[Figure: an n × p data matrix A is reduced to an n × k data matrix X, with k < p]
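As a minimal MATLAB illustration of the picture above (a sketch only; the sizes n, p, k and the projection matrix W are made-up examples, and PCA will later give a principled choice of W):

n = 100; p = 50; k = 5;     % example sizes: n samples, p original dims, k reduced dims
A = randn(n, p);            % n x p data matrix, one sample per row
W = randn(p, k);            % p x k linear projection (placeholder; PCA chooses a better one)
X = A * W;                  % n x k reduced data matrix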

Dimensionality Reduction: why?

• Extract underlying factors

Dimensionality Reduction: why?

• Reduce data noise

– Face recognition

– Applied to image de-noising

Image courtesy of Charles-Alban Deledalle, Joseph Salmon, and Arnak Dalalyan, "Image denoising with patch-based PCA: local versus global", BMVC 2011.

Dimensionality Reduction: why?

• Reduce the number of model parameters

– Avoid over-fitting

– Reduce computational cost

Dimensionality Reduction: why?

• Visualization

Dimensionality Reduction

• General principle:
– Preserve "useful" information in low dimensional data

• How to define "usefulness"?
– Many possible definitions
– An active research direction in machine learning

• Taxonomy
– Supervised or Unsupervised
– Linear or nonlinear

• Commonly used methods:
– PCA, LDA (linear discriminant analysis), and more.

• Feature selection vs. dimensionality reduction: feature selection keeps a subset of the original dimensions, while dimensionality reduction constructs new (transformed) dimensions

Outline

• Overview of dimensionality reduction

– The benefits

– Why we can perform dimensionality reduction

– Type of dimensionality reduction methods

• Principal component analysis

– Mathematical basics

– Decorrelation and dimension selection

– Eigenface and high-dimensionality issue

• Linear Discriminant Analysis

• Summary

Principal Component Analysis: Motivation

• Principal Component Analysis (PCA) is an unsupervised
dimensionality reduction method
– Transform data to remove redundant information

– Keep the most informative dimensions after the transformation

Data correlation & information
redundancy

PCA explained: De-correlating data

• Dependency vs. correlation

– Dependence is a stronger criterion than correlation

– They are equivalent when the data follows a Gaussian distribution

• PCA only de-correlates the data

– This is one limitation of PCA

– ICA can further remove dependence, but it is more complicated

PCA explained: De-correlating data

• Geometric interpretation of correlation

Exercise: which figure shows the
highest data correlation?


PCA explained: De-correlating data

• Correlation can be removed by rotating the data points or, equivalently, the coordinate system

PCA explained: SNR maximization

• Maximize the signal-to-noise ratio (SNR) of the transformed data

[Figure: the data matrix is modeled as a signal component plus a noise component; a transform is applied to the data matrix]

PCA explained: SNR maximization

[Figure: the direction of large variance in the data corresponds to the signal; the orthogonal direction of small variance corresponds to the noise]

Discussion: why do we think a dimension with larger variance contains more information?

PCA explained

• Target

– 1: Find a new coordinate system in which different dimensions have zero correlation

– 2: Keep the top-k dimensions with the largest variance in the new coordinate system

• Method

– Rotate the data points or the coordinate system

• Mathematically speaking…

– How to rotate?

– How to express our criterion?

Mathematical Basics

• Mean, Variance, Covariance

• Matrix norm, trace

• Orthogonal matrix, basis

• Eigen decomposition

Mathematical Basics

• (Sample) Mean

• (Sample) Variance

• (Sample) Covariance
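A sketch of the standard definitions (here x_1, …, x_N and y_1, …, y_N denote the values of two feature dimensions over N samples; the 1/N vs. 1/(N−1) convention does not affect the PCA directions):

\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\operatorname{var}(x) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad
\operatorname{cov}(x, y) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})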

Mathematical Basics

• Covariance matrix C: the (i, j)-th entry is the covariance between the i-th and j-th feature dimensions

Or compactly, C can be computed from the centralized data matrix (the data with the mean subtracted)

When C is a diagonal matrix, the feature dimensions are zero-correlated (uncorrelated)
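In symbols (a sketch, assuming each column of the centralized data matrix \tilde{X} of size d × N is one mean-subtracted sample):

C = \frac{1}{N}\tilde{X}\tilde{X}^{\top}, \qquad C_{ij} = \operatorname{cov}\left(x^{(i)}, x^{(j)}\right)

where x^{(i)} denotes the i-th feature dimension.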

Mathematical Basics

• Orthogonal matrix

• Rotation effect
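The defining property and its rotation effect, in symbols (Q denotes an orthogonal matrix):

Q^{\top}Q = QQ^{\top} = I, \qquad \|Qx\|_2 = \|x\|_2 \ \text{for every } x

so multiplying by Q rotates (or reflects) points without changing lengths or the angles between them.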

Mathematical Basics

• Relationship to coordinate system

– A point = linear combination of bases

– Combination weight = coordinate

• Each row (column) of Q = basis

– Not unique

– Relation to coordinate rotation

• New coordinate = the combination weights of the point with respect to the new (rotated) basis

Mathematical Basics

[Figure: the point (2, 2) shown with its coordinates in the old coordinate system and in a rotated (new) coordinate system]
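The exact rotation used in the original figure is not recoverable from the text; as an illustrative example, suppose the coordinate axes are rotated by 45°:

Q = \begin{pmatrix} \cos 45^{\circ} & \sin 45^{\circ} \\ -\sin 45^{\circ} & \cos 45^{\circ} \end{pmatrix}, \qquad
Q\begin{pmatrix} 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 2\sqrt{2} \\ 0 \end{pmatrix}

so the point with old coordinates (2, 2) has new coordinates (2√2, 0) in the rotated coordinate system.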

Mathematical Basics

Eigen-decomposition

– If A is symmetric, the eigenvector matrix can be chosen to be orthogonal
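In symbols (a sketch for a square matrix A with a full set of eigenvectors):

A = Q\Lambda Q^{-1}, \qquad \text{and if } A = A^{\top}: \quad A = Q\Lambda Q^{\top} \ \text{with } Q^{\top}Q = I

where the columns of Q are the eigenvectors of A and Λ is the diagonal matrix of eigenvalues.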


PCA: solution

• Target 1: de-correlation

Start from the original covariance matrix C of the data.

We are looking to transform the centralized data with a matrix P.

Require the covariance matrix of the transformed data to be a diagonal matrix.

Sounds familiar? Use the eigen-decomposition of C: set P to be the transpose of its eigenvector matrix, so the transformed covariance becomes the diagonal matrix of eigenvalues.
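A sketch of the derivation, again assuming \tilde{X} is the d × N centralized data matrix and the transformed feature is y = P\tilde{x}:

C = \frac{1}{N}\tilde{X}\tilde{X}^{\top}, \qquad
C_y = \frac{1}{N}(P\tilde{X})(P\tilde{X})^{\top} = PCP^{\top}

Eigen-decompose C = Q\Lambda Q^{\top} and set P = Q^{\top}; then

C_y = Q^{\top}(Q\Lambda Q^{\top})Q = \Lambda,

which is diagonal, so the transformed dimensions are uncorrelated.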

Recap: Matrix decomposition

• A matrix can be decomposed into a combination (usually a product) of special matrices

• Eigen decomposition

– When A is symmetric, i.e. A = Aᵀ, the eigenvector matrix can be chosen to be orthogonal

• Related topic: Singular value decomposition


A closer look at the covariance matrix of y

• The covariance matrix of the transformed data y is a diagonal matrix; what are its diagonal elements?

• From the perspective of the covariance matrix: the (i, j)-th element is the covariance between the i-th and j-th feature dimensions, so the i-th diagonal element is the variance of the i-th dimension of y

• From the perspective of the eigen-decomposition: the diagonal elements are the eigenvalues of the original covariance matrix C

How to select the most informative
dimensions?

• We rank the dimensions by their variance, which is equivalent to ranking them by their corresponding eigenvalues

– Note that the i-th dimension of y is obtained by using the i-th row of P

– If we only want k dimensions, we can simply select the eigenvectors corresponding to the top-k eigenvalues to form the projection matrix P

PCA: algorithm

• 1. Subtract mean

• 2. Calculate the covariance matrix

• 3. Calculate eigenvectors and eigenvalues of the
covariance matrix

• 4. Rank the eigenvectors by their corresponding eigenvalues

• 5. Obtain P with its row vectors corresponding to the top-k eigenvectors

PCA: MATLAB code

Note: fea is an N × d data matrix (N samples, each of dimension d).
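The original code is not reproduced in this text; the following is a minimal sketch consistent with the algorithm above (assuming fea is the N × d data matrix and k is the number of dimensions to keep; variable names are illustrative):

mu = mean(fea, 1);                      % 1 x d sample mean
fea_c = bsxfun(@minus, fea, mu);        % step 1: subtract the mean
C = (fea_c' * fea_c) / size(fea, 1);    % step 2: d x d covariance matrix
[V, D] = eig(C);                        % step 3: eigenvectors (columns of V) and eigenvalues
[~, idx] = sort(diag(D), 'descend');    % step 4: rank eigenvectors by eigenvalue
P = V(:, idx(1:k))';                    % step 5: k x d projection matrix (rows = top-k eigenvectors)
Y = fea_c * P';                         % N x k reduced features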

PCA: reconstruction

• From a new feature y, we can reconstruct (an approximation of) its original feature x

• Fun fact: we can also derive the PCA algorithm from an optimization problem that minimizes the reconstruction error

The reconstruction is similar to the rotation-back operation, e.g., multiplying by the transpose of the rotation (projection) matrix
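A sketch of the reconstruction in symbols, using the same notation as before (P contains the top-k eigenvectors as rows, μ is the data mean):

y = P(x - \mu), \qquad \hat{x} = P^{\top}y + \mu \approx x

Because the rows of P are orthonormal, multiplying by P^{\top} plays the role of the rotation-back operation, analogous to QQ^{\top} = I.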

PCA: reconstruction

• Rayleigh Quotient

• Solution = PCA
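A sketch of the Rayleigh-quotient view (w is a candidate projection direction and C the covariance matrix; this is the standard variance-maximization formulation, stated here as an assumption about the missing formula):

\max_{w \neq 0} \ \frac{w^{\top}Cw}{w^{\top}w}

The maximum is attained at the eigenvector of C with the largest eigenvalue; repeating the argument in the orthogonal complement yields the remaining principal components, so the solution coincides with PCA.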

Application: Eigen-face method

• Sirovich and Kirby (1987) showed that PCA could be
used on a collection of face images to form a set of
basis features.

• Not limited to face recognition

• Steps

– Image as high-dimensional feature

– PCA

Application: Eigen-face method

Application: Reconstruction

Reconstructed from top-2 eigenvectors

Application: Reconstruction

Reconstructed from top-15 eigenvectors

Application: Reconstruction

Reconstructed from top-40 eigenvectors

Application: Eigen-face method

• From large to small eigenvalues, the eigenfaces range from common patterns, to discriminative patterns, to noisy patterns


High dimensionality issue

• For high-dimensional data, the dimensionality d is large

• The number of samples N is relatively small, yet the d × d covariance matrix can be too large to eigen-decompose directly

• Solution: a useful relation between the covariance matrix and the kernel matrix

Suppose N is much smaller than d. Instead of the covariance matrix, we first consider the eigenvalues and eigenvectors of the much smaller N × N kernel matrix. Multiplying both sides of its eigen-equation by the data matrix X turns it into the definition of the eigen system of the covariance matrix: each kernel-matrix eigenvector, once mapped through X, gives a covariance-matrix eigenvector with the same eigenvalue (see the sketch below).
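A sketch of the relation, assuming X denotes the d × N centralized data matrix (so that XX^{\top} is proportional to the covariance matrix and K = X^{\top}X is the N × N kernel matrix):

X^{\top}Xv = \lambda v
\;\Rightarrow\; X\left(X^{\top}Xv\right) = \lambda\,Xv
\;\Rightarrow\; \left(XX^{\top}\right)(Xv) = \lambda\,(Xv)

So u = Xv (after normalization) is an eigenvector of XX^{\top}, and hence of the covariance matrix, with the same eigenvalue λ, even though only the much smaller N × N matrix had to be eigen-decomposed.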

High dimensionality issue

• 1. Centralize the data

• 2. Calculate the N × N kernel matrix

• 3. Perform eigen-decomposition on the kernel matrix and obtain its eigenvectors

• 4. Obtain the eigenvectors of the covariance matrix by mapping each kernel-matrix eigenvector through the data matrix (and normalizing to unit length)

• Question: how many eigenvectors can you obtain in this way?
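A minimal MATLAB sketch of steps 1–4 above (again assuming fea is N × d with d much larger than N, and k projections are kept; names are illustrative):

mu = mean(fea, 1);
fea_c = bsxfun(@minus, fea, mu);              % step 1: centralize
K = fea_c * fea_c';                           % step 2: N x N kernel (Gram) matrix
[V, D] = eig(K);                              % step 3: eigenvectors of the kernel matrix
[~, idx] = sort(diag(D), 'descend');
U = fea_c' * V(:, idx(1:k));                  % step 4: map to eigenvectors of the covariance matrix
U = bsxfun(@rdivide, U, sqrt(sum(U.^2, 1)));  % normalize each column to unit length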

Mathematical Basics

• Rank of a matrix

• In linear algebra, the rank of a matrix A is the dimension of the vector space generated (or spanned) by its columns

• It is identical to the dimension of the vector space spanned by its rows

• Important relationship: rank(X) = rank(X Xᵀ) = rank(Xᵀ X) ≤ min(N, d) for a d × N (or N × d) matrix X

• Relationship to the eigenvalues: for a symmetric (diagonalizable) matrix, the rank equals the number of non-zero eigenvalues

Back to the previous question

• The rank of the kernel matrix equals the rank of the covariance matrix, which is at most N − 1 for centralized data

• So we obtain at most N − 1 meaningful projections from PCA

The eigenvectors corresponding to zero eigenvalues are meaningless in PCA

Outline

• Overview of dimensionality reduction

– The benefits

– Why we can perform dimensionality reduction

– Type of dimensionality reduction methods

• Principal component analysis

– Mathematical basics

– Decorrelation and dimension selection

– Eigenface and high-dimensionality issue

• Linear Discriminant Analysis

• Summary

Discriminative dimensionality reduction

• General principle:

– Preserve "useful" information in low dimensional data

– PCA measures "usefulness" through reconstruction error or covariance structure

– Useful for reconstruction ≠ useful for classification

• General principle for discriminative dimensionality reduction:

– Preserve "discriminative" information in low dimensional data

Linear Discriminant Analysis: Basic
idea

• Linear Discriminant Analysis (LDA)

– Discriminative dimensionality reduction

– Linear dimensionality reduction

• Supervised information

– Class label

– Data from the same class => become close

– Data from different classes => move far from each other

Linear Discriminant Analysis (LDA): Objective

• Two classes:


Linear Discriminant Analysis (LDA): Objective

• Mean after projection:

• Variance after projection:

Linear Discriminant Analysis (LDA): Objective

• Mean distance (the distance between the projected class means)

• Between-class scatter and within-class scatter
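A sketch of the standard two-class Fisher quantities (μ₁, μ₂ are the class means, C₁, C₂ the sets of samples in each class, and w the projection direction; this matches the usual textbook formulation and is stated here as an assumption about the missing formulas):

S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^{\top}, \qquad
S_W = \sum_{k=1}^{2}\sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^{\top}

J(w) = \frac{w^{\top}S_B w}{w^{\top}S_W w}

Maximizing J(w) pushes the projected class means apart (between-class scatter) while keeping each projected class compact (within-class scatter).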

Linear Discriminant Analysis (LDA):
Solution

• Objective

• Solution

Linear Discriminant Analysis (LDA): Solution

• Eigenvectors corresponding to the largest eigenvalues of S_W^{-1} S_B

– Why?

• Implementation details

– What if S_W is not invertible? Use a regularized version, S_W + εI (with a small ε > 0), instead

At the optimum, the projection direction w satisfies S_B w = λ S_W w (a generalized eigenvalue problem), which is why the solution is given by the eigenvectors of S_W^{-1} S_B
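A minimal two-class MATLAB sketch (X1 and X2 are N1 × d and N2 × d matrices of samples from the two classes; eps_reg is a small regularizer used in case S_W is not invertible; all names are illustrative):

mu1 = mean(X1, 1);  mu2 = mean(X2, 1);
X1c = bsxfun(@minus, X1, mu1);
X2c = bsxfun(@minus, X2, mu2);
Sw = X1c' * X1c + X2c' * X2c;                        % within-class scatter (d x d)
eps_reg = 1e-6;
w = (Sw + eps_reg * eye(size(Sw,1))) \ (mu1 - mu2)'; % w is proportional to Sw^{-1}(mu1 - mu2)
w = w / norm(w);                                     % unit-length projection direction

For two classes, the top eigenvector of S_W^{-1} S_B is proportional to S_W^{-1}(μ₁ − μ₂), which is why no explicit eigen-decomposition is needed in this sketch.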


Linear Discriminant Analysis: Multi-class

• Generalized to multiple classes

• Solution:

– Top c eigenvectors of S_W^{-1} S_B

– Discussion: how many projections can you have?

Check the rank of S_B: with the standard multi-class definition it is at most C − 1, so LDA yields at most C − 1 meaningful projections (see the sketch below).
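A sketch of why the number of projections is limited, assuming the usual multi-class definition of the between-class scatter (N_k samples in class k, μ the overall mean):

S_B = \sum_{k=1}^{C} N_k(\mu_k - \mu)(\mu_k - \mu)^{\top}

The C vectors μ_k − μ satisfy one linear constraint (their weighted sum is zero), so rank(S_B) ≤ C − 1 and S_W^{-1}S_B has at most C − 1 non-zero eigenvalues, i.e. at most C − 1 meaningful LDA projections.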

Summary

• The benefit of dimensionality reduction

• Principal component analysis

– Two basic steps: decorrelation, then selecting the dimensions with the largest variances

– How to perform the data transformation?

– Its application to eigenfaces, and the meaning behind the eigenvectors

– The high-dimensionality issue

• Linear discriminant analysis

– Basic objective

– Two-class case

– Multi-class case