
CS373 Data Mining and Machine Learning

Lecture 12
Jean Honorio

Purdue University

Except for the first two and last two slides, the presentation was taken from
http://www.cs.cmu.edu/~bapoczos/Classes/ML10715_2015Fall/

• Symmetric matrix Σ = U Λ U^T
• Orthonormal matrix U, with U^T U = I
• Diagonal matrix Λ of eigenvalues
• Eigenvectors: columns of U
• Python:

Linear Algebra: Eigen Decomposition

import numpy as np
import numpy.linalg as la

Sigma = np.array([[ 5.2,  3.3, -2. ],
                  [ 3.3,  8.1,  4. ],
                  [-2. ,  4. ,  6.5]])
lam, U = la.eig(Sigma)

>>> lam
array([  0.38,   7.7 ,  11.71])
>>> U
array([[-0.61,  0.75,  0.25],
       [ 0.55,  0.18,  0.81],
       [-0.56, -0.64,  0.53]])

>>> np.dot(U.T, U)
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

# U * Diag(lam) * U^T recovers Sigma
>>> np.dot(U, np.dot(np.diag(lam), U.T))
array([[ 5.2,  3.3, -2. ],
       [ 3.3,  8.1,  4. ],
       [-2. ,  4. ,  6.5]])

Σ = U Λ U^T,   U^T U = I

• Matrix X = U S V^T
• Orthonormal matrices U, V with U^T U = I, V^T V = I
• Diagonal matrix S of singular values
• Python:

Linear Algebra: Singular Value Decomposition

import numpy as np
import numpy.linalg as la

X = np.array([[ 0.6, -0.7],
              [ 2.5,  1.9],
              [-1.6, -0.9],
              [-2.8,  0.8]])
U, s, Vt = la.svd(X, full_matrices=False)

>>> U
array([[-0.09,  0.39],
       [-0.69, -0.54],
       [ 0.42,  0.2 ],
       [ 0.58, -0.72]])
>>> s
array([ 4.24,  2.13])
>>> Vt
array([[-0.96, -0.27],
       [ 0.27, -0.96]])

>>> np.dot(Vt, Vt.T)
array([[ 1.,  0.],
       [ 0.,  1.]])

# U*Diag(s)*Vt recovers X
>>> np.dot(U, np.dot(np.diag(s), Vt))
array([[ 0.6, -0.7],
       [ 2.5,  1.9],
       [-1.6, -0.9],
       [-2.8,  0.8]])

X = U S V^T,   U^T U = I,   V^T V = I

3

Motivation

4

• Data Visualization
• Data Compression
• Noise Reduction

PCA Applications

5

Data Visualization
Example:

• Given 53 blood and urine samples (features) from 65 people.

• How can we visualize the measurements?

6

• Matrix format (65×53): rows = instances (people), columns = features

        H-WBC    H-RBC    H-Hgb    H-Hct    H-MCV     H-MCH    H-MCHC
  A1    8.0000   4.8200   14.1000  41.0000  85.0000   29.0000  34.0000
  A2    7.3000   5.0200   14.7000  43.0000  86.0000   29.0000  34.0000
  A3    4.3000   4.4800   14.1000  41.0000  91.0000   32.0000  35.0000
  A4    7.5000   4.4700   14.9000  45.0000  101.0000  33.0000  33.0000
  A5    7.3000   5.5200   15.4000  46.0000  84.0000   28.0000  33.0000
  A6    6.9000   4.8600   16.0000  47.0000  97.0000   33.0000  34.0000
  A7    7.8000   4.6800   14.7000  43.0000  92.0000   31.0000  34.0000
  A8    8.6000   4.8200   15.8000  42.0000  88.0000   33.0000  37.0000
  A9    5.1000   4.7100   14.0000  43.0000  92.0000   30.0000  32.0000

Difficult to see the correlations between the features…

Data Visualization

7

• Spectral format (65 curves, one for each person)

[Plot: measurement index on the x-axis, measured value (0–1000) on the y-axis, one curve per patient]
Difficult to compare the different patients…

Data Visualization

9

• Bi-variate plot: C-LDH vs. C-Triglycerides
• Tri-variate plot: M-EPI vs. C-LDH vs. C-Triglycerides

How can we visualize the other variables?

… difficult to see in 4 or higher dimensional spaces…

Data Visualization

10

• Is there a representation better than the coordinate axes?

• Is it really necessary to show all the 53 dimensions?
  • … what if there are strong correlations between the features?

• How could we find the smallest subspace of the 53-D space that keeps the most
  information about the original data?

• A solution: Principal Component Analysis

Data Visualization

11

PCA Algorithms

12

Orthogonal projection of the data onto a lower-dimensional linear space that…
• maximizes the variance of the projected data (purple line)
• minimizes the mean squared distance between each data point and its projection
  (sum of blue lines)
These two criteria are equivalent (see the numerical sketch below).

PCA: Principal Component Analysis
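
The equivalence of the two criteria (maximum projected variance, minimum mean squared reconstruction error) can be checked numerically. Below is a minimal numpy-only sketch; the random 2-D data and the use of the top eigenvector of the sample covariance as the projection direction are illustrative assumptions, not part of the slides.

import numpy as np

# Toy centered 2-D data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
X = X - X.mean(axis=0)

# Direction maximizing the variance of the projected data:
# the top eigenvector of the sample covariance matrix.
Sigma = X.T @ X / X.shape[0]
lam, U = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
w = U[:, -1]                            # eigenvector with the largest eigenvalue

proj_var = np.var(X @ w)                            # variance of the projected data
recon = np.outer(X @ w, w)                          # 1-D PCA reconstruction of each point
mse = np.mean(np.sum((X - recon) ** 2, axis=1))     # mean squared reconstruction error

# Projected variance + reconstruction error = total variance,
# so maximizing the first is the same as minimizing the second.
total_var = np.sum(np.var(X, axis=0))
print(proj_var + mse, total_var)        # the two numbers agree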

13

Idea:
• Given data points in a d-dimensional space, project them into a lower dimensional
  space while preserving as much information as possible.
  • Find best planar approximation of 3D data
  • Find best 12-D approximation of 104-D data
• In particular, choose the projection that minimizes the squared error in
  reconstructing the original data.

Principal Component Analysis

14

• PCA vectors originate from the center of mass.
• Principal component #1 points in the direction of the largest variance.
• Each subsequent principal component
  • is orthogonal to the previous ones, and
  • points in the direction of the largest variance of the residual subspace.

Principal Component Analysis

Properties:

15

2D Gaussian dataset

16

1st PCA axis

17

2nd PCA axis

18

Given the centered data {x1, …, xm}, compute the principal vectors:

To find w1, maximize the variance of the projection of x:

  1st PCA vector:   w1 = arg max_{‖w‖=1} (1/m) ∑_{i=1}^m (w^T xi)^2

To find w2, we maximize the variance of the projection in the residual subspace:

  2nd PCA vector:   w2 = arg max_{‖w‖=1} (1/m) ∑_{i=1}^m [w^T (xi − w1 w1^T xi)]^2

[Figure: x is projected onto w1; x' = w1 (w1^T x) is the PCA reconstruction, and the residual x − x' lies along w2]

PCA algorithm I (sequential)

19

Given w1, …, w_{k−1}, we compute the k-th principal vector as before: maximize the
variance of the projection in the residual subspace.

  k-th PCA vector:   wk = arg max_{‖w‖=1} (1/m) ∑_{i=1}^m [w^T (xi − ∑_{j=1}^{k−1} wj wj^T xi)]^2

[Figure: x' = w1 (w1^T x) + w2 (w2^T x) is the PCA reconstruction of x from the first two principal vectors]

PCA algorithm I (sequential)
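
A minimal numpy sketch of this sequential procedure, assuming each arg max is solved by taking the top eigenvector of the residual's covariance matrix (which is equivalent to the maximization above); the function name sequential_pca and the toy data are illustrative.

import numpy as np

def sequential_pca(X, k):
    """Sequential PCA sketch: compute principal vectors one at a time by
    maximizing the variance of the projection in the residual subspace."""
    m, d = X.shape
    Xc = X - X.mean(axis=0)                 # center the data
    residual = Xc.copy()
    W = []
    for _ in range(k):
        C = residual.T @ residual / m                    # covariance of the residual
        lam, U = np.linalg.eigh(C)                       # ascending eigenvalues
        w = U[:, -1]                                     # direction of largest residual variance
        W.append(w)
        residual = residual - np.outer(residual @ w, w)  # remove that direction
    return np.array(W).T                                 # columns are w1, ..., wk

# usage on toy data: 100 points in 5 dimensions, first 2 principal vectors
X = np.random.default_rng(0).normal(size=(100, 5))
W = sequential_pca(X, 2)
print(W.T @ W)        # approximately the 2x2 identity (orthonormal directions)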

20

• Given data {x1, …, xm}, compute the sample covariance matrix Σ:

    Σ = (1/m) ∑_{i=1}^m (xi − x̄)(xi − x̄)^T,   where  x̄ = (1/m) ∑_{i=1}^m xi

• PCA basis vectors = the eigenvectors of Σ

• Larger eigenvalue ⇒ more important eigenvectors

PCA algorithm II
(sample covariance matrix)

21

PCA algorithm(X, k): top k eigenvalues/eigenvectors
  % X = N × m data matrix,
  % … each data point xi = column vector, i = 1..m

  • x̄ ← (1/m) ∑_{i=1}^m xi
  • X ← subtract mean x̄ from each column vector xi in X
  • Σ ← X X^T            … covariance matrix of X
  • { λi, ui }_{i=1..N} = eigenvectors/eigenvalues of Σ,  with λ1 ≥ λ2 ≥ … ≥ λN
  • Return { λi, ui }_{i=1..k}    % top k PCA components

PCA algorithm II
(sample covariance matrix)
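
A minimal numpy sketch of algorithm II; the function name pca_cov and the toy data are illustrative, and a 1/m factor is included in the covariance (the pseudocode above omits it, which does not change the eigenvectors).

import numpy as np

def pca_cov(X, k):
    """PCA via the sample covariance matrix.
    X is an N x m data matrix with each data point xi as a column."""
    N, m = X.shape
    xbar = X.mean(axis=1, keepdims=True)    # mean vector
    Xc = X - xbar                           # subtract the mean from each column
    Sigma = Xc @ Xc.T / m                   # N x N sample covariance matrix
    lam, U = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
    top = np.argsort(lam)[::-1][:k]         # indices of the k largest eigenvalues
    return lam[top], U[:, top]              # top-k eigenvalues and eigenvectors

# usage: 5 features, 100 samples, top 2 principal components
X = np.random.default_rng(0).normal(size=(5, 100))
lam, U = pca_cov(X, 2)
print(lam, U.shape)                         # 2 eigenvalues, U is 5 x 2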

24

Singular Value Decomposition of the centered data matrix X:

    X_{features × samples} = U S V^T

[Figure: X factored into U, S, and V^T; the leading singular values/vectors carry the significant structure, the trailing ones mostly noise]

PCA algorithm III
(SVD of the data matrix)

25

• Columns of U
  • the principal vectors, { u(1), …, u(k) }
  • orthogonal and of unit norm, so U^T U = I
  • can reconstruct the data using linear combinations of { u(1), …, u(k) }

• Matrix S
  • diagonal
  • shows the importance of each eigenvector

• Columns of V^T
  • the coefficients for reconstructing the samples

PCA algorithm III
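
A minimal numpy sketch of algorithm III; the function name pca_svd and the toy data are illustrative assumptions.

import numpy as np

def pca_svd(X, k):
    """PCA via SVD of the centered data matrix.
    X is a features x samples matrix; returns the top-k principal vectors
    (columns of U), singular values (S), and reconstruction coefficients (V^T)."""
    Xc = X - X.mean(axis=1, keepdims=True)            # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# usage: project onto 2 components and reconstruct
X = np.random.default_rng(0).normal(size=(5, 100))
U, s, Vt = pca_svd(X, 2)
X_hat = U @ np.diag(s) @ Vt + X.mean(axis=1, keepdims=True)   # rank-2 approximation
print(U.shape, s.shape, Vt.shape, X_hat.shape)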

26

Applications

• Want to identify a specific person, based on a facial image
• Robust to glasses, lighting, …
• Can't just use the given 256 × 256 pixels

Face Recognition

27

• Example data set: images of faces
  • Eigenface approach [Turk & Pentland], [Sirovich & Kirby]

• Each face x is …
  • 256 × 256 values (luminance at each location)
  • x ∈ R^{256×256} (viewed as a 64K-dimensional vector)

• Form the centered data matrix X = [ x1, …, xm ]  (64K × m)

• Compute Σ = X X^T

• Problem: Σ is 64K × 64K … HUGE!!!

Applying PCA: Eigenfaces
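
One standard way around the huge 64K × 64K covariance matrix is to use algorithm III above: take the thin SVD of the 64K × m centered data matrix instead of eigendecomposing X X^T. A minimal sketch with illustrative sizes (the random data is a stand-in for real centered face vectors):

import numpy as np

# Illustrative sizes: d = 65536 pixels per face, m = 50 faces, k = 10 eigenfaces
d, m, k = 65536, 50, 10
X = np.random.default_rng(0).normal(size=(d, m))   # stand-in for centered face columns

# The thin SVD of the d x m matrix is cheap because m is small;
# its left singular vectors are the eigenvectors of X X^T (the eigenfaces).
U, s, Vt = np.linalg.svd(X, full_matrices=False)   # U is d x m, not d x d
eigenfaces = U[:, :k]                              # top-k principal vectors
print(eigenfaces.shape)                            # (65536, 10)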

29

[Figure: data matrix X = [ x1, …, xm ]; each of the m faces is a column of 256 × 256 real values]

34

Happiness subspace

35

Disgust subspace

37

Facial Expression Recognition
Movies

Image Compression

40

• Divide the original 372×492 image into patches:
  • each patch is an instance that contains 12×12 pixels on a grid
• Consider each patch as a 144-D vector (a code sketch follows below)

Original Image
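
A minimal numpy sketch of this patch-based compression; the random stand-in image, the helper name compress_patches, and the choice of 60 components (matching the later slides) are illustrative assumptions.

import numpy as np

def compress_patches(img, patch=12, k=60):
    """Cut the image into patch x patch blocks, treat each block as a 144-D vector,
    project onto the top-k principal directions, and reconstruct."""
    H = img.shape[0] - img.shape[0] % patch
    W = img.shape[1] - img.shape[1] % patch
    blocks = (img[:H, :W]
              .reshape(H // patch, patch, W // patch, patch)
              .transpose(0, 2, 1, 3)
              .reshape(-1, patch * patch))          # one 144-D row per patch
    mean = blocks.mean(axis=0)
    _, _, Vt = np.linalg.svd(blocks - mean, full_matrices=False)
    coeffs = (blocks - mean) @ Vt[:k].T             # k numbers per patch instead of 144
    recon = coeffs @ Vt[:k] + mean                  # reconstructed patches
    return (recon.reshape(H // patch, W // patch, patch, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(H, W))

# usage on a random stand-in for the 372x492 grayscale image
img = np.random.default_rng(0).random((372, 492))
approx = compress_patches(img, patch=12, k=60)
print(np.mean((img - approx) ** 2))                 # L2 reconstruction error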

41

L2 error and PCA dim

42

PCA compression: 144D ⇒ 60D

50

Looks like the discrete cosine bases of JPEG!

60 most important eigenvectors

PCA Shortcomings

57
PCA doesn't know about class labels!

Problematic Data Set for PCA

58

• PCA maximizes variance, independent of the class labels ⇒ magenta line

• FLD (Fisher Linear Discriminant) attempts to separate the classes ⇒ green line
  (a code sketch comparing the two directions follows below)

PCA vs Fisher Linear Discriminant

A linear classifier attempts to separate classes

PCA versus linear classifiers
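
A hedged sketch of this contrast using scikit-learn (a library assumption; the toy two-class data is illustrative): the largest-variance direction found by PCA ignores the labels, while the LDA/FLD direction separates the classes.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy two-class data: the largest-variance direction (x-axis) does NOT separate
# the classes, which differ only along the y-axis.
rng = np.random.default_rng(0)
n = 200
X0 = rng.normal([0.0, 0.0], [5.0, 0.5], size=(n, 2))   # class 0
X1 = rng.normal([0.0, 3.0], [5.0, 0.5], size=(n, 2))   # class 1, shifted along y
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

pca_dir = PCA(n_components=1).fit(X).components_[0]          # max-variance direction
lda_dir = LinearDiscriminantAnalysis().fit(X, y).coef_[0]    # class-separating direction

print(pca_dir / np.linalg.norm(pca_dir))   # roughly [±1, 0]: ignores the class labels
print(lda_dir / np.linalg.norm(lda_dir))   # roughly [0, ±1]: separates the classes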

• Incorrect way: do NOT do feature selection (or dimensionality reduction) on the
  whole dataset, and then cross-validation
  • feature selection and dimensionality reduction on the whole dataset destroy
    cross-validation
  • the reduced training set would depend on the validation set
  • thus, training is looking at the supposedly "unseen" data!

Feature Selection and Cross-Validation

[Diagram: the training set and the validation set together are passed through feature
selection / dimensionality reduction, producing a reduced training set and a reduced
validation set]

• Correct way: feature selection (or dimensionality reduction) inside cross-validation,
  only applied to the training set (see the sketch below)

[Diagram: feature selection / dimensionality reduction is applied to the training set only]

Feature Selection and Cross-Validation
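
A minimal sketch of the correct way using a scikit-learn pipeline (a library assumption; the random data, the 10-component PCA, and the logistic-regression classifier are illustrative): the dimensionality reduction is re-fit on the training fold inside each cross-validation split and only then applied to the held-out fold.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 53))        # e.g. 100 patients x 53 features, as earlier
y = rng.integers(0, 2, size=100)      # toy class labels

# The PCA step is fit on the training fold only inside each CV split,
# so the validation fold never influences the dimensionality reduction.
model = Pipeline([("pca", PCA(n_components=10)),
                  ("clf", LogisticRegression(max_iter=1000))])
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())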