Data Mining and Machine Learning
Introduction to Data Mining, Vector Data Analysis and Principal Components Analysis (PCA)
Slide 1
Objectives
To introduce Data Mining
To outline the techniques that we will study in this part of the course – a Data Mining ‘Toolkit’
To review basic data analysis and to review the notions of mean, variance and covariance
To explain Principal Components Analysis (PCA)
To present an example of PCA
Slide 2
What is Data Mining?
Mining
– Digging deep into the earth to find hidden, valuable materials
Data Mining
– Analysis of large data corpora (biomedical, acoustic, video, text, …) to discover structure, patterns and relationships
– Corpora which are too large for human inspection
– Patterns and structure may be hidden
Slide 3
Data Mining
Structure and patterns in large, abstract data sets:
– Is the data homogeneous or does it consist of several separately identifiable subsets?
– Are there patterns in the data?
– If so, do these patterns have an intuitive interpretation?
– Are there correlations in the data?
– Is there redundancy in the data?
Slide 4
Data Mining
In this part of the course we will develop a basic ‘data mining toolkit’
– Subspace projection methods (PCA)
– Clustering
– Statistical modelling
– Sequence analysis
– Dynamic Programming (DP)
Slide 5
Some example data
Fig 1: Single, spherical cluster centred at origin
Fig 2: Single, arbitrary elliptical cluster
Fig 3: Multiple, arbitrary elliptical clusters
Slide 6
Objectives
Fig 3 shows “multiple source” data. The data is arranged in a set of “clusters”.
How do we discover the number and locations of the clusters?
Remember, in real applications there will be many points in a high-dimensional vector space, which is difficult to visualise
Slide 7
Objectives
Fig 1 shows the simplest type of data – single-source data centred at the origin, with equal variance in both dimensions and no covariance.
Fig 2 is again single-source, but the data is correlated, skewed, and not centred at the origin.
How do we convert Fig 2 into Fig 1?
We will start with this problem
Solution is a technique called Principal Components Analysis (PCA)
[Fig 1 and Fig 2 are repeated here for reference]
Slide 8
Example from speech processing
Plot of high-frequency energy vs low-frequency energy, for 25 ms speech segments, sampled every 10 ms
[Figure: scatter plot of the speech data]
Slide 9
Basic statistics
[Figure: the speech data annotated with the sample mean, the sample variances in ‘x’ and ‘y’, and the ‘x’ and ‘y’ minima and maxima]
Slide 10
Basic statistics
Denote the samples by $X = x_1, x_2, \ldots, x_T$
where $x_t = (x_{t1}, x_{t2}, \ldots, x_{tN})$
The sample mean vector (more correctly, $\mu(X)$) is given by:
$$\mu = \frac{1}{T}\sum_{t=1}^{T} x_t = (\mu_1, \mu_2, \ldots, \mu_n, \ldots, \mu_N), \qquad \mu_n = \frac{1}{T}\sum_{t=1}^{T} x_{tn}$$
Slide 11
More basic statistics
The sample variance vector (more correctly, $\sigma^2(X)$) is given by:
$$\sigma^2 = (\sigma_1^2, \ldots, \sigma_n^2, \ldots, \sigma_N^2), \qquad \sigma_n^2 = \frac{1}{T}\sum_{t=1}^{T} (x_{tn} - \mu_n)^2$$
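As a quick illustration, a minimal MATLAB sketch of the sample mean and variance, assuming the samples are stored as the rows of a T-by-N matrix X (the variable names are illustrative):
– T = size(X,1); % number of samples
– mu = sum(X,1)/T; % sample mean vector, one entry per dimension
– v = sum((X - ones(T,1)*mu).^2, 1)/T; % sample variance vector
Note that MATLAB's built-in mean(X) gives the same mean, while var(X) normalises by T-1 rather than T.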
Slide 12
Covariance
In this data, as the x value increases, the y value also increases
This is (positive) covariance
If y decreases as x increases, the result is negative covariance
[Figure: the speech data, with x and y increasing together]
Slide 13
Definition of covariance
The covariance between the mth and nth components of the sample data is defined by:
$$\sigma_{m,n} = \frac{1}{T}\sum_{t=1}^{T} (x_{tm} - \mu_m)(x_{tn} - \mu_n)$$
In practice it is useful to subtract the mean from each of the data points $x_t$. The sample mean is then 0 and
$$\sigma_{m,n} = \frac{1}{T}\sum_{t=1}^{T} x_{tm}\, x_{tn}$$
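A minimal MATLAB sketch of this, again assuming the samples are the rows of a T-by-N matrix X (illustrative names):
– B = X - ones(size(X,1),1)*mean(X); % subtract the mean from every sample
– Sigma = (B'*B)/size(B,1); % Sigma(m,n) = (1/T) * sum over t of B(t,m)*B(t,n)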
Slide 14
The covariance matrix
$$\Sigma = \begin{pmatrix}
\sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,n} & \cdots & \sigma_{1,N} \\
\sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,n} & \cdots & \sigma_{2,N} \\
\vdots & \vdots & & \vdots & & \vdots \\
\sigma_{m,1} & \sigma_{m,2} & \cdots & \sigma_{m,n} & \cdots & \sigma_{m,N} \\
\vdots & \vdots & & \vdots & & \vdots \\
\sigma_{N,1} & \sigma_{N,2} & \cdots & \sigma_{N,n} & \cdots & \sigma_{N,N}
\end{pmatrix}$$
Slide 15
Data with mean subtracted
[Figure: scatter plot of the mean-subtracted data]
$$\Sigma = \begin{pmatrix} 2.96 & 1.9 \\ 1.9 & 1.97 \end{pmatrix}$$
The positive off-diagonal entries imply positive covariance
Slide 16
Sample data rotated
[Figure: the same data rotated]
$$\Sigma = \begin{pmatrix} 2.96 & -1.9 \\ -1.9 & 1.97 \end{pmatrix}$$
The negative off-diagonal entries imply negative covariance
Slide 17
Data with covariance removed
[Figure: the decorrelated data]
$$\Sigma = \begin{pmatrix} 4.51 & 0 \\ 0 & 0.48 \end{pmatrix}$$
Slide 18
Principal Components Analysis
PCA is the technique that was used above to diagonalise the sample covariance matrix
The first step is to write the covariance matrix in the form:
$$\Sigma = UDU^T$$
where D is diagonal and U is a matrix corresponding to a rotation
You can do this using SVD (see lecture on LSI) or Eigenvalue Decomposition
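A minimal MATLAB sketch of this step, assuming S already holds the sample covariance matrix:
– [U,D] = eig(S); % S = U*D*U', with D diagonal and U orthogonal
– [U2,D2,V2] = svd(S); % for the symmetric S, SVD gives the same factorisation, up to ordering and signs of the columns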
Slide 19
PCA continued
[Figure: the mean-subtracted data with the principal directions $e_1$ and $e_2$ overlaid]
– $e_1$ is the first column of $U$: $e_1 = \begin{pmatrix} u_{11} \\ u_{21} \end{pmatrix}$
– $U$ implements a rotation through an angle $\theta$
– $d_{11}$ is the variance in the direction $e_1$
– $e_2$ is the 2nd column of $U$
– $d_{22}$ is the variance in the direction $e_2$
$$UDU^T = \begin{pmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{pmatrix} \begin{pmatrix} d_{11} & 0 \\ 0 & d_{22} \end{pmatrix} \begin{pmatrix} u_{11} & u_{21} \\ u_{12} & u_{22} \end{pmatrix}$$
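This picture can be checked numerically; a short MATLAB sketch, assuming B holds the mean-subtracted data as rows and S = (B'*B)/size(B,1):
– [U,D] = eig(S); % the columns of U are the directions e1 and e2
– Y = B*U; % rotate the data into the new coordinate system
– (Y'*Y)/size(Y,1) % the covariance of Y is the diagonal matrix D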
Slide 20
PCA Example
[Figure: an abstract data set]
Slide 21
PCA Example (continued)
Step 1: load the data into MATLAB:
– A = load('data4');
Step 2: Calculate the mean and subtract this from each sample
– M = ones(size(A));
– N = mean(A);
– M(:,1) = M(:,1)*N(1);
– M(:,2) = M(:,2)*N(2);
– B = A - M;
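As an aside, in newer MATLAB releases (R2016b onwards) implicit expansion allows the same mean subtraction in a single line:
– B = A - mean(A); % subtracts the column means from every row of A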
Plot B
Slide 22
PCA Example (continued)
[Figure: plot of the mean-subtracted data B]
Slide 23
PCA Example (continued)
Calculate the covariance matrix of B (or A):
– S = (B'*B)/size(B,1);
– or
– S = cov(B); (note that cov normalises by T-1 rather than T, so the two differ slightly)
$$S = \begin{pmatrix} 6.78 & 3.27 \\ 3.27 & 2.76 \end{pmatrix}$$
Difficult to deduce much about the data from this covariance matrix
Slide 24
PCA Example (continued)
Calculate the eigenvalue decomposition of S
– [U,E]=eig(S);
$$U = \begin{pmatrix} 0.4884 & 0.8726 \\ -0.8726 & 0.4884 \end{pmatrix}, \qquad E = \begin{pmatrix} 0.9307 & 0 \\ 0 & 8.6079 \end{pmatrix}$$
After transforming the data using U, its covariance matrix becomes E. You can confirm this by plotting the transformed data:
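A short sketch of that confirmation, continuing with the variables above:
– Y = B*U; % transform the mean-subtracted data
– plot(Y(:,1), Y(:,2), '.'); % scatter plot of the transformed data
– (Y'*Y)/size(Y,1) % diagonal, (approximately) equal to E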
Slide 25
PCA Example (continued)
[Figure: plot of the transformed data]
Slide 26
PCA Example (continued)
After transformation by the matrix U, the covariance matrix has been diagonalised and is now equal to E
– variance in the x direction is 0.93
– variance in the y direction is 8.61
This tells us that most of the variation in the data is contained in the (new) y direction
There is much less variation in the new x direction, and we could get a one-dimensional approximation to the data by discarding this dimension
None of this is obvious from the original covariance matrix
Slide 27
Final notes
Each column of U is a principal vector
The corresponding eigenvalue indicates the variance of the data along that dimension
– Large eigenvalues indicate significant components of the data
– Small eigenvalues indicate that the variation along the corresponding eigenvectors may be noise
It may be advantageous to ignore dimensions which correspond to small eigenvalues and only consider the projection of the data onto the most significant eigenvectors – in this way the dimension of the data can be reduced, as in the sketch below
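A minimal MATLAB sketch of this reduction, assuming B holds the mean-subtracted data as rows, S its covariance matrix, and k the (illustrative) number of components to keep:
– [U,D] = eig(S); % eigenvalue decomposition of the covariance matrix
– [vals, idx] = sort(diag(D), 'descend'); % order the eigenvalues, largest first
– Uk = U(:, idx(1:k)); % keep the k most significant eigenvectors
– Y = B*Uk; % the data projected onto a k-dimensional subspace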
Slide 28
Eigenvalues
[Figure: the eigenvalues of the data plotted in decreasing order against the principal components (PCs), from the 1st to the 90th. The large eigenvalues on the left are the more significant components; the long tail of small eigenvalues corresponds to the insignificant components]
Slide 29
Visualising PCA
[Figure: the original pattern (blue) is mapped by U into the eigenspace, where coordinates n → 90 are set to zero; the inverse transform U⁻¹ maps the result back, giving the reduced pattern (red)]
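A sketch of this pipeline in MATLAB, assuming 90-dimensional patterns stored as the rows of B, an eigenvector matrix U ordered most significant first, and an illustrative cut-off n:
– Y = B*U; % map each pattern into the eigenspace
– Y(:, n:90) = 0; % set the insignificant coordinates to zero
– Brec = Y*U'; % map back: U is orthogonal, so inv(U) = U'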
Slide 30
Summary
Review of basic data analysis (mean, variance and covariance)
Introduction to Principal Components Analysis (PCA)
Example of PCA
Slide 31