Dimensionality Reduction
Introduction
Illustrate the need for dimensionality reduction
What is Dimensionality Reduction?
|We have N data points in a high-dimensional space,
-e.g., in the order of tens of thousands of dimensions.
|We want to project them into some low-dimensional space,
-e.g., in the order of tens of dimensions.
|Why dimensionality reduction?
-A key technique to mitigate the curse of dimensionality
The Curse of Dimensionality
|Consider a histogram as a density estimator.
|Exponentially more samples would be needed in higher-dimensional spaces for the same “resolution”.
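As a rough illustration (not from the slides), the sketch below counts how many histogram cells a fixed per-axis resolution implies as the dimension grows; keeping even one sample per cell on average requires exponentially many samples. The choice of 10 bins per dimension is an arbitrary assumption.

```python
# Rough sketch: histogram cells needed for a fixed per-axis resolution.
# Assumes 10 bins per dimension; the specific numbers are illustrative only.
bins_per_dim = 10
for d in (1, 2, 3, 5, 10):
    n_cells = bins_per_dim ** d
    print(f"d={d:2d}: {n_cells:,} cells -> need >= {n_cells:,} samples "
          "for ~1 sample per cell on average")
```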
Many Techniques for Dimensionality Reduction
|There are many ways to go from a higher-dimensional space to a lower-dimensional space.
-Feature Selection achieves this by keeping only a subset of the original features/dimensions.
|There are many other techniques, employing a feature mapping/projection approach.
-New features are generated (instead of selecting only from the original features).
-The underlying assumptions and/or goals of the techniques are often different.
Examples of Feature Mapping
|Linear discriminant analysis (LDA)
|Independent component analysis (ICA)
|Non-negative matrix factorization (NMF)
|Auto-encoder
|Self-organizing maps
|Principal component analysis (and its variants)
Dimensionality Reduction
Principal Component Analysis: Basic Idea
Illustrate the basic idea of Principal Component Analysis
Principal Component Analysis: Basic Idea
|Look at a simple 2-D to 1-D example: we want to use a single feature to describe the 2-D samples
|Consider these possibilities
-Naïve: randomly discard one dimension
-Better: discard the less-descriptive one (x2 in the figure)
-Much better: project the data onto the most descriptive direction and use the projections.
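A minimal numerical sketch of the three options (my own illustration, with made-up correlated 2-D data): keeping only x1, keeping only x2, and projecting onto the most descriptive direction, comparing how much variance each 1-D representation retains.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up correlated 2-D data: most variability lies along a diagonal direction.
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 1.8], [1.8, 1.5]], size=500)

var_x1 = X[:, 0].var()          # keep only the first original feature
var_x2 = X[:, 1].var()          # keep only the second original feature

# Most descriptive direction = leading eigenvector of the covariance matrix.
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
e1 = eigvecs[:, -1]             # eigenvector with the largest eigenvalue
var_proj = (X @ e1).var()

print(f"variance kept: x1 only {var_x1:.2f}, x2 only {var_x2:.2f}, "
      f"best direction {var_proj:.2f}")
```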
How to Formulate this Idea?
|“Most descriptive” ≈ largest “variance”
|So the problem is to find the direction of the largest variance.
Given n samples D={x1, x2,…, xn} in d-dimensional space, find a direction e1, such that the projection of D onto e1 gives the largest variance (compared with any other direction).
e1 is a d-dimensional vector with unit norm.
|Let’s compute the variance of the projected data on a given direction e (with $\|e\| = 1$).
-The n projected samples are given as $a_i = e^\top x_i$, for $i = 1, \dots, n$.
-The mean of the projections: $\bar{a} = \frac{1}{n}\sum_{i=1}^{n} e^\top x_i = e^\top \bar{x}$, where $\bar{x}$ is the sample mean of the data.
-Thus the (sample) variance of the projections: $\hat{\sigma}^2(e) = \frac{1}{n}\sum_{i=1}^{n} \left(e^\top x_i - e^\top \bar{x}\right)^2$.
Find e1 (cont’d)
|Expand the previous expression:
$\hat{\sigma}^2(e) = \frac{1}{n}\sum_{i=1}^{n} e^\top (x_i - \bar{x})(x_i - \bar{x})^\top e = e^\top C e$
|C is the sample covariance matrix: $C = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top$.
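To make the identity concrete, here is a small numerical check (my own sketch, not from the slides) that the sample variance of the projections equals $e^\top C e$ for an arbitrary unit direction e; both sides use the 1/n convention.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))            # made-up data: n=200 samples, d=5
e = rng.normal(size=5)
e /= np.linalg.norm(e)                   # arbitrary unit-norm direction

x_bar = X.mean(axis=0)
C = (X - x_bar).T @ (X - x_bar) / len(X) # sample covariance with 1/n convention

proj = X @ e                             # projections a_i = e^T x_i
var_direct = np.mean((proj - proj.mean()) ** 2)
var_quadratic = e @ C @ e

assert np.isclose(var_direct, var_quadratic)
print(var_direct, var_quadratic)
```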
Find e1 (cont’d)
|To find e1, we solve the constrained problem: maximize $e^\top C e$ subject to $e^\top e = 1$.
|Constrained maximization: use the Lagrange multiplier method.
-(What if without this constraint? Then $e^\top C e$ is unbounded, since scaling e scales the objective arbitrarily.)
|Lagrangian: $L(e, \lambda) = e^\top C e - \lambda\,(e^\top e - 1)$.
Find e1 (cont’d)
|Setting the partial derivative to 0, we have $\frac{\partial L}{\partial e} = 2Ce - 2\lambda e = 0 \;\rightarrow\; Ce = \lambda e$.
|The solution is an eigenvector of C, with eigenvalue λ, which is also the variance under e: $e^\top C e = \lambda\, e^\top e = \lambda$.
|We should set e1 to be the eigenvector corresponding to the largest eigenvalue λ1.
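A quick sanity check (my own sketch, with made-up data) that the leading eigenvector of C does achieve the largest projected variance: its variance equals λ1 and is not beaten by many random unit directions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # made-up correlated data
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / len(X)

eigvals, eigvecs = np.linalg.eigh(C)      # ascending eigenvalues for symmetric C
lam1, e1 = eigvals[-1], eigvecs[:, -1]

# Projected variance along e1 equals the largest eigenvalue.
assert np.isclose((Xc @ e1).var(), lam1)

# No random unit direction should do better (up to numerical noise).
dirs = rng.normal(size=(1000, 4))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
assert ((Xc @ dirs.T).var(axis=0) <= lam1 + 1e-9).all()
print("lambda_1 =", lam1)
```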
Recap of the Key Idea
|We want to project the given data samples onto a certain direction so that the variance is maximized, compared with any other direction.
|We figured out what this optimal direction e1 should be:
-It should be the eigenvector of the covariance matrix corresponding to the largest eigenvalue λ1.
Dimensionality Reduction
Principal Component Analysis: The Algorithm & Important Properties
Implement the PCA algorithm
Discuss some important properties of PCA
Principal Components
|We found e1, which gives the direction of the largest variance after projection
-The first principal component.
|The process can be continued in the subspace orthogonal to e1, and so on and so forth.
-Obtaining other principal components: e2, e3, etc., corresponding to the other eigenvectors of C, ordered by the corresponding eigenvalues λi.
Principal Components (cont’d)
|The principal components are orthogonal to each other → {ei} forms an orthonormal basis in the d-dimensional space.
|The total variance is given by the sum of the variances of the projections: $\sum_{i=1}^{d} \hat{\sigma}^2(e_i) = \sum_{i=1}^{d} \lambda_i = \mathrm{trace}(C)$.
How Many Principal Components to Keep?
|To reduce dimensions, we will need to keep only d’ << d projections.
|We can measure how much of the total variance a d’-dimensional subspace captures by the ratio $\left(\sum_{i=1}^{d'} \lambda_i\right) / \left(\sum_{i=1}^{d} \lambda_i\right)$, with the eigenvalues in descending order.
|Variance may be related to the “energy” of a signal: how accurately we want to represent the data.
-The ratio can be used to guide the choice of a proper d’ for the desired accuracy.
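A small sketch of this rule (my own helper; the 95% target and the eigenvalues are made-up assumptions): pick the smallest d’ whose cumulative eigenvalue ratio reaches the target.

```python
import numpy as np

def choose_num_components(eigvals, target_ratio=0.95):
    """Pick the smallest d' whose leading eigenvalues capture target_ratio of the variance."""
    lam = np.sort(eigvals)[::-1]                 # descending order
    cum_ratio = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cum_ratio, target_ratio) + 1)

# Usage with made-up eigenvalues:
eigvals = np.array([5.0, 2.5, 1.0, 0.3, 0.1, 0.05])
print(choose_num_components(eigvals, 0.95))      # d' capturing >= 95% of the variance
```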
The PCA Algorithm
1. Compute the d×d sample covariance matrix C
2. Find the eigenvalues and corresponding eigenvectors of C
3. Project the original data onto the space spanned by the eigenvectors
– The projection may be done onto a d’-dimensional subspace spanned by the first d’ eigenvectors (ordered by the eigenvalue in descending order)
• d’ is determined by the desired accuracy
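The steps above translate almost directly into NumPy. Below is a minimal sketch of one possible implementation (my own, using eigh on the covariance matrix; real libraries such as scikit-learn typically use the SVD instead).

```python
import numpy as np

def pca(X, num_components):
    """Minimal PCA: project n x d data X onto its top num_components directions."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                               # center the data
    C = Xc.T @ Xc / len(X)                       # 1. d x d sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)         # 2. eigenvalues/eigenvectors (ascending)
    order = np.argsort(eigvals)[::-1]            #    reorder to descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :num_components]              # first d' eigenvectors
    Z = Xc @ W                                   # 3. project onto the d'-dim subspace
    return Z, W, eigvals

# Usage with made-up data:
X = np.random.default_rng(3).normal(size=(100, 6))
Z, W, eigvals = pca(X, num_components=2)
print(Z.shape)                                   # (100, 2)
```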
Important Properties of PCA
|PCA represents the data in a new space, in which the components of the data are ordered by their “significance”.
-Dimension reduction can be done by simply discarding less significant dimensions.
|Linearity assumption → extensions exist
|“Variance ≈ Importance” is meaningful only under a large signal-to-noise ratio
PCA as Feature Mapping
|When we use only d’ dimensions from PCA (with original dimension d > d’), this may look like feature selection.
-But in general they are different approaches.
|PCA
-Unsupervised (in general)
-Generates new features (linear combinations of the original features)
|Feature Selection
-Supervised (in general)
-Selects a few of the original features (e.g., for better classification)
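To make the contrast concrete, here is a brief sketch (my own example using scikit-learn, which the slides do not mention): PCA builds new features from all inputs without labels, while a supervised selector keeps a subset of the original columns using the labels.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=0)

# Feature mapping (unsupervised): new features are linear combinations of all 10 inputs.
Z_pca = PCA(n_components=3).fit_transform(X)

# Feature selection (supervised): keep 3 of the original columns, chosen using the labels y.
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
Z_sel = selector.transform(X)

print(Z_pca.shape, Z_sel.shape)              # both (200, 3), but built very differently
print("selected original columns:", selector.get_support(indices=True))
```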
Can PCA Help Classification?
|Can we do better classification in a lower-dimensional space from d’ principal components given by PCA?
-Not necessarily.
|LDA may be better suited for such a task.
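A small illustration of this point (my own sketch with synthetic data, not from the slides): when the class-discriminative direction has small variance, projecting to one PCA component can hurt a downstream classifier, while LDA’s single component keeps the separation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Two elongated classes: large shared variance along x1, class difference along x2.
rng = np.random.default_rng(0)
n = 300
X0 = rng.normal([0, -1.0], [8.0, 0.5], size=(n, 2))
X1 = rng.normal([0, +1.0], [8.0, 0.5], size=(n, 2))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

pca_clf = make_pipeline(PCA(n_components=1), LogisticRegression())
lda_clf = make_pipeline(LinearDiscriminantAnalysis(n_components=1), LogisticRegression())

print("PCA->1D accuracy:", cross_val_score(pca_clf, X, y, cv=5).mean())
print("LDA->1D accuracy:", cross_val_score(lda_clf, X, y, cv=5).mean())
```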