Lecture 8

Scalable PCA/SVD
Dimensionality Reduction & Factor Analysis

Haiping Lu
http://www.dcs.shef.ac.uk/~haiping

COM6012 Scalable Machine Learning
Spring 2018


Week 8 Contents

• Unsupervised Learning

• PCA – Dimensionality Reduction

• SVD – Factor Analysis

• Scalable PCA in Spark


Unsupervised Learning


• Supervised methods: predict our data as a function of other data, y = f(X)
• Unsupervised methods: find structure in the data on its own, f(X)

Three Topics

• Principal component analysis (PCA) & SVD
• Dimensionality reduction & factor analysis

• K-means
• Clustering

• Matrix factorisation (with missing information)
• Collaborative filtering → recommender systems

• Scale these algorithms for big data


Week 8 Contents

• Unsupervised Learning

• PCA – Dimensionality Reduction

• SVD – Factor Analysis

• Scalable PCA in Spark


Dimensionality Reduction


• Assumption: data lies on or near a low d-dimensional subspace
• The axes of this subspace are an effective representation of the data

Why Reduce Dimensions?

• Discover hidden correlations/topics
  • Words that commonly occur together
• Remove redundant and noisy features
  • Not all words are useful
• Interpretation and visualization
• Easier storage and processing of the data

Dimensionality Reduction

• Raw data is complex and high-dimensional
• Dimensionality reduction describes the data using a simpler, more compact representation
• This representation may make interesting patterns in the data clearer or easier to see


Dimensionality Reduction

• Goal: Find a ‘better’ representation for the data
• How do we define ‘better’? For example:
  • Minimise reconstruction error
  • Maximise variance
  • They give the same solution → PCA!


PCA Algorithm

• Input: N data points, each a D-dimensional vector
• PCA algorithm (a local code sketch follows below)
  • 1. X0 ← form the N × D data matrix, with one row vector x_n per data point
  • 2. X ← subtract the mean x̄ from each row vector x_n in X0
  • 3. Σ ← X^T X, the Gramian (scatter) matrix for X
  • Find the eigenvectors and eigenvalues of Σ
  • PCs U (D × d) ← the d eigenvectors with the largest eigenvalues
• PCA feature for a D-dimensional vector y: U^T y (d-dimensional)
  • Zero correlations, ordered by variance
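Below is a minimal local sketch of these steps using Breeze (the linear-algebra library that Spark MLlib builds on); the function name pcaLocal and the way the data is passed in are illustrative assumptions, not part of any Spark API.

```scala
import breeze.linalg._
import breeze.stats.mean

// Local PCA sketch: returns the D x d matrix U whose columns are the top-d PCs.
def pcaLocal(x0: DenseMatrix[Double], d: Int): DenseMatrix[Double] = {
  // 2. Centre the data: subtract the mean row vector from every row of X0.
  val m = mean(x0(::, *)).t
  val x = x0(*, ::) - m
  // 3. Gramian (scatter) matrix Sigma = X^T X (D x D).
  val sigma = x.t * x
  // Eigen-decompose the symmetric scatter matrix; eigSym returns eigenvalues
  // in ascending order, so the last d columns are the top-d eigenvectors.
  val es = eigSym(sigma)
  val dim = sigma.cols
  es.eigenvectors(::, (dim - d) until dim).copy
}

// PCA feature for a D-dimensional vector y: U^T y (d-dimensional).
// val u = pcaLocal(x0, 2); val feature = u.t * y
```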

2D Data

[Figure: scatter plot of the 2D example data]

Principal Components

[Figure: the centred 2D data with the 1st and 2nd principal vectors drawn]

• The 1st principal vector is the best axis to project onto (minimum RMS error)
• Principal vectors are orthogonal

How Many Components?


• Check the distribution of eigenvalues
• Take enough eigenvectors to cover 80–90% of the variance (see the sketch below)
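A small sketch of this rule in Scala, assuming the eigenvalues have already been computed and sorted in descending order (e.g., by the PCA sketch above); the 0.9 default is just the 80–90% guideline, and the function name is illustrative.

```scala
// Pick the smallest d whose top eigenvalues cover at least `threshold` of the total variance.
def numComponents(eigenvalues: Array[Double], threshold: Double = 0.9): Int = {
  val total = eigenvalues.sum
  val cumulative = eigenvalues.scanLeft(0.0)(_ + _).drop(1) // running sums of variance
  cumulative.indexWhere(_ / total >= threshold) + 1
}

// e.g. numComponents(Array(6.0, 3.0, 0.5, 0.5)) == 2, since 6 + 3 covers 90% of 10
```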

Other Practical Tips

• PCA assumptions (linearity, orthogonality) are not always appropriate
• Various extensions to PCA exist with different underlying assumptions, e.g., manifold learning, kernel PCA, ICA
• Centring is crucial, i.e., we must preprocess the data so that all features have zero mean before applying PCA
• PCA results depend on the scaling of the data
  • Data is sometimes rescaled in practice before applying PCA

Problems and Limitations


• What if the data is very high-dimensional?
  • e.g., images: D ≥ 10^4 (= 100 × 100 pixels)
• Problem:
  • The Gramian matrix Σ is D × D, i.e., D^2 entries
  • D = 10^4 → |Σ| = 10^8 entries
• Singular Value Decomposition (SVD)!
  • Efficient algorithms available
  • Some implementations find just the top d eigenvectors

Week 8 Contents

• Unsupervised Learning

• PCA – Dimensionality Reduction

• SVD – Factor Analysis

• Scalable PCA in Spark


Singular Value Decomposition


• Factorization (decomposition) problem
  • #1: Find concepts/topics/genres → factor analysis
  • #2: Reduce dimensionality
• The example term–document matrix used in the following slides is effectively “2-dimensional”: all rows can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1], so D = 5 → d = 2

SVD – Definition


A[n × m] = U[n × r] Λ[r × r] (V[m × r])^T

• A: n × m matrix (e.g., n documents, m terms)
• U: n × r matrix (n documents, r concepts)
• Λ: r × r diagonal matrix (strength of each ‘concept’); r is the rank of A
• V: m × r matrix (m terms, r concepts)

SVD – Properties


It is always possible to decompose a matrix A into A = U Λ V^T, where

• U, Λ, V: unique (*)
• U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other)
  • U^T U = I; V^T V = I (I: identity matrix)
• Λ: singular values are positive and sorted in decreasing order

SVD Eigen-decomposition


• SVD gives us: A = U Λ V^T
• Eigen-decomposition: B = W Σ W^T
  • U, V, W are orthonormal (U^T U = I); Λ, Σ are diagonal
• Relationship:
  • A A^T = U Λ V^T (U Λ V^T)^T = U Λ V^T (V Λ^T U^T) = U Λ Λ^T U^T = U Λ^2 U^T
  • A^T A = V Λ^T U^T (U Λ V^T) = V Λ^T Λ V^T = V Λ^2 V^T
  • So for B = A^T A = W Σ W^T: W = V and Σ = Λ^2

SVD for PCA

• PCA by SVD (sketched below):
  • 1. X0 ← form the N × D data matrix, with one row vector x_n per data point
  • 2. X ← subtract the mean x̄ from each row vector x_n in X0
  • 3. U Λ V^T ← SVD of X
• The right singular vectors V of X are the eigenvectors of X^T X, i.e., the PCs
• The singular values in Λ are the square roots of the eigenvalues of X^T X

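The same computation expressed through SVD, again as a local Breeze sketch rather than Spark code; pcaBySvd and its return shape are illustrative assumptions.

```scala
import breeze.linalg._
import breeze.stats.mean

// PCA by SVD: centre X0, take the SVD, and read the PCs off the right singular vectors.
// Returns (U: D x d matrix of PCs, eigenvalues of X^T X for those PCs).
def pcaBySvd(x0: DenseMatrix[Double], d: Int): (DenseMatrix[Double], DenseVector[Double]) = {
  val m = mean(x0(::, *)).t
  val x = x0(*, ::) - m                           // centred data, N x D
  val svd.SVD(_, s, vt) = svd(x)                  // X = U_x diag(s) V^T
  val pcs = vt(0 until d, ::).t.copy              // top-d right singular vectors = PCs
  val eigenvalues = s(0 until d).map(v => v * v)  // eigenvalues of X^T X = squared singular values
  (pcs, eigenvalues)
}
```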

SVD – Properties


‘spectral decomposition’ of the matrix:

1 1 1 0 0
2 2 2 0 0
1 1 1 0 0
5 5 5 0 0   =   λ1 u1 v1^T  +  λ2 u2 v2^T
0 0 0 2 2
0 0 0 3 3
0 0 0 1 1

SVD – Interpretation


‘documents’, ‘terms’ and ‘concepts’:
• U: document-to-concept similarity matrix
• V: term-to-concept similarity matrix
• Λ: its diagonal elements give the ‘strength’ of each concept

Projection:
• Best axis to project on (‘best’ = minimum sum of squares of projection errors)

SVD – Example

• A = U Λ V^T – example (rows = documents, columns = terms data, inf., retrieval, brain, lung; the first four documents are CS, the last three are MD):

A =
1 1 1 0 0
2 2 2 0 0
1 1 1 0 0
5 5 5 0 0
0 0 0 2 2
0 0 0 3 3
0 0 0 1 1

U =
0.18 0
0.36 0
0.18 0
0.90 0
0    0.53
0    0.80
0    0.27

Λ =
9.64 0
0    5.29

V^T =
0.58 0.58 0.58 0    0
0    0    0    0.71 0.71

SVD – Example

• In the decomposition above, the two columns of U correspond to the CS-concept and the MD-concept: U is the document-to-concept similarity matrix
SVD – Example

• The diagonal entries of Λ give the ‘strength’ of each concept: 9.64 for the CS-concept and 5.29 for the MD-concept
SVD – Example

• The rows of V^T (columns of V) relate terms to concepts (first row: CS-concept): V is the term-to-concept similarity matrix
SVD – Dimensionality Reduction

• Q: How exactly is (further) dimensionality reduction done?
• A: Set the smallest singular values to zero
  • Note: the 3 zero singular values have already been removed
• In the example, this means setting the smaller singular value 5.29 in Λ to zero

SVD – Dimensionality Reduction

1 1 1 0 0        0.18
2 2 2 0 0        0.36
1 1 1 0 0        0.18
5 5 5 0 0   ≈    0.90   ×  9.64  ×  0.58 0.58 0.58 0 0
0 0 0 2 2        0
0 0 0 3 3        0
0 0 0 1 1        0

SVD – Dimensionality Reduction

1 1 1 0 0        1 1 1 0 0
2 2 2 0 0        2 2 2 0 0
1 1 1 0 0        1 1 1 0 0
5 5 5 0 0   ≈    5 5 5 0 0
0 0 0 2 2        0 0 0 0 0
0 0 0 3 3        0 0 0 0 0
0 0 0 1 1        0 0 0 0 0

• Best rank-1 approximation
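A Breeze sketch of this truncation: keep the top-k singular triplets and multiply them back together to get the best rank-k approximation (the function name and the call below are illustrative).

```scala
import breeze.linalg._

// Best rank-k approximation A_k = U_k diag(s_k) V_k^T from the SVD of A.
def rankKApprox(a: DenseMatrix[Double], k: Int): DenseMatrix[Double] = {
  val svd.SVD(u, s, vt) = svd(a)
  u(::, 0 until k) * diag(s(0 until k)) * vt(0 until k, ::)
}

// Applied with k = 1 to the 7 x 5 example matrix above, this reproduces the
// matrix on the right: the CS rows are recovered (approximately) and the MD rows become 0.
```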

Week 8 Contents

• Unsupervised Learning

• PCA – Dimensionality Reduction

• SVD – Factor Analysis

• Scalable PCA in Spark


PCA & SVD in Spark MLlib
• Not scalable: computePrincipalComponents() from RowMatrix
• Scalable: computeSVD() from RowMatrix (both calls are sketched below)
• Code: https://github.com/apache/spark/blob/v2.1.0/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
• Documentation: https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.mllib.linalg.distributed.RowMatrix

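A minimal sketch of both calls on a RowMatrix; it assumes an existing SparkContext named sc, and the toy rows and k = 2 are placeholders.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Build a RowMatrix from an RDD[Vector], one row per data point.
val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 1.0, 1.0, 0.0, 0.0),
  Vectors.dense(5.0, 5.0, 5.0, 0.0, 0.0),
  Vectors.dense(0.0, 0.0, 0.0, 2.0, 2.0)))
val mat = new RowMatrix(rows)

// Not scalable: the principal components are computed locally on the driver.
val pc = mat.computePrincipalComponents(2)   // local Matrix, one PC per column
val projected = mat.multiply(pc)             // rows projected onto the PCs

// Scalable: only the top-k singular values/vectors are computed.
val svdResult = mat.computeSVD(2, computeU = false)
```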

PCA in Spark MLlib (RDD)

• https://spark.apache.org/docs/2.1.0/mllib-dimensionality-reduction.html

• Not scalable, local computation

• Notebook 8

PCA in Spark ML (DF)

• Now in https://spark.apache.org/docs/2.1.0/ml-features.html#pca

• Under features
• Scalable? Not likely

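A short sketch of the DataFrame-based PCA transformer from the ml-features page linked above; it assumes an existing SparkSession named spark, and the data and column names are placeholders.

```scala
import org.apache.spark.ml.feature.PCA
import org.apache.spark.ml.linalg.Vectors

val data = Seq(
  Vectors.dense(1.0, 1.0, 1.0, 0.0, 0.0),
  Vectors.dense(5.0, 5.0, 5.0, 0.0, 0.0),
  Vectors.dense(0.0, 0.0, 0.0, 2.0, 2.0)).map(Tuple1.apply)
val df = spark.createDataFrame(data).toDF("features")

// Fit a PCA model that maps the 5-dimensional features to 2 principal components.
val pca = new PCA()
  .setInputCol("features")
  .setOutputCol("pcaFeatures")
  .setK(2)
  .fit(df)

pca.transform(df).select("pcaFeatures").show(false)
```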

SVD in Spark MLlib (RDD)

• https://spark.apache.org/docs/2.1.0/mllib-dimensionality-reduction.html

• With distributed implementations


SVD in Spark MLlib (RDD)

• An m × n data matrix A with m > n (note the different notation from the PCA slides)
• For large matrices, we usually do not need the complete factorization, only the top k singular values and their associated singular vectors
• This saves storage, de-noises, and recovers the low-rank structure of the matrix (dimensionality reduction)


SVD in Spark MLlib (RDD)

• An m × n data matrix A; assume m > n, with SVD A = U Λ V^T
• The singular values and the right singular vectors are derived from the eigenvalues and eigenvectors of A^T A (an n × n matrix, smaller than A)
• The left singular vectors are computed via matrix multiplication as U = A V Λ^(-1), if requested by the user via the computeU parameter (see the sketch below)

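Continuing the RowMatrix sketch from earlier, this is how the three factors are requested and read back; mat and k = 2 are the same placeholders as before.

```scala
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Ask for the top k = 2 singular triplets; U is only materialised because computeU = true.
val svdResult = mat.computeSVD(2, computeU = true)
val U = svdResult.U   // distributed RowMatrix of left singular vectors, computed as U = A V Λ^(-1)
val s = svdResult.s   // local Vector of singular values, in descending order
val V = svdResult.V   // local dense Matrix of right singular vectors
```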

Selection of SVD Computation

• Auto
  • If n is small (n < 100), or k is large compared with n (k > n/2), compute A^T A first and then compute its top eigenvalues and eigenvectors locally on the driver
  • Otherwise, compute the products (A^T A)v in a distributed way and send them to ARPACK, which computes the top eigenvalues and eigenvectors on the driver node

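The selection rule above can be paraphrased as the sketch below; this is only an illustration of the described behaviour, not the actual code in RowMatrix.scala (the real implementation has additional thresholds and mode names).

```scala
// Hypothetical paraphrase of the 'auto' choice in computeSVD.
def chooseSvdMode(n: Int, k: Int): String =
  if (n < 100 || k > n / 2) {
    // Small problem: form A^T A and eigen-decompose it locally on the driver.
    "local"
  } else {
    // Large problem: compute (A^T A)v distributively and feed the products to ARPACK on the driver.
    "distributed-ARPACK"
  }
```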

Selection of SVD Computation

• Auto (default)


Selection of SVD Computation

• Specify computeMode (private)


Selection of SVD Computation

• computeMode (note brzSvd.SVD is local)


Remark

• Acknowledgement
  • Some slides are adapted from slides by Jure Leskovec et al., http://www.mmds.org
• References
  • http://infolab.stanford.edu/~ullman/mmds/ch11.pdf
  • http://www.mmds.org
  • https://en.wikipedia.org/wiki/Principal_component_analysis
  • https://spark.apache.org/docs/2.1.0/mllib-dimensionality-reduction.html
