MAST 90138: MULTIVARIATE STATISTICAL TECHNIQUES
See Härdle and Simar, Chapter 11.
5 PRINCIPAL COMPONENT ANALYSIS
5.1 INTRODUCTION
Visualizing 1-, 2- or 3-dimensional data is relatively easy: use a scatter plot.
[Figure: scatter plot of 2-dimensional data, X1 (horizontal) against X2 (vertical).]

When the data are in higher dimensions, they are much more difficult to visualize.
• Can we find a way to summarise the data?
• Summaries should be easier to represent graphically.
• Summaries should still contain as much information as possible about the original data.
• Often we can achieve this through dimension reduction.

Toy example: how to reduce the following 2-dimensional data to 1 dimension.
Data: a collection of i.i.d. pairs (Xi1, Xi2)T ∼ (μ, Σ), for i = 1, . . . , n, shown in the scatter plot.
[Figure: scatter plot of the data, X1 against X2.]

The first thing usually done in these problems is to center the data (the geometry is easier to understand for centered data). For i = 1, . . . , n, we replace (Xi1, Xi2)T by (Xi1 − X̄1, Xi2 − X̄2)T:
[Figure: scatter plot of the centered data, X1 against X2.]
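In R, centering can be done with scale(); the following is a small hedged sketch, not part of the original notes, in which X is a made-up n × 2 data matrix:
X <- cbind(rnorm(100, mean = 3), rnorm(100, mean = 2))   # hypothetical example data
Xc <- scale(X, center = TRUE, scale = FALSE)             # subtract the column means
colMeans(Xc)                                             # both columns now have mean (numerically) 0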

Unless otherwise specified, for the rest of this chapter, to avoid heavy notation, when we refer to Xij we mean Xij − X̄j.
To reduce these data to a single dimension we could, for example, keep only the first component Xi1 of each data point.
[Figure: the centered data, X1 against X2.]

Keeping only the first component:
[Figure: the X1 values plotted on a single axis (roughly −4 to 6).]

• Not very interesting: we lose all the information about the second component X2.
• Suppose the data contain the age (X1) and the height (X2) of n = 100 individuals. This amounts to keeping only the age and dropping the height data completely.
Why not instead create a new variable that contains information about both age and height?

Simple approach: take a linear combination of the age and the height.
• For i = 1, . . . , n we could create a new variable
Yi = agei/2 + heighti/2,
i.e. the average of the age and the height. Here 1/2 and 1/2 are the weights of age and height, respectively.
• We often prefer to rescale linear combinations so that the sum of the squared weights equals 1, for example,
Yi = agei/√2 + heighti/√2.
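As a small hedged sketch (the age and height vectors below are made up for illustration, not data from the notes):
age    <- c(25, 31, 47, 52, 60)         # hypothetical ages
height <- c(170, 165, 180, 175, 168)    # hypothetical heights
Y <- age / sqrt(2) + height / sqrt(2)   # weights 1/sqrt(2) and 1/sqrt(2): squared weights sum to 1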

The values Yi = Xi1/√2 + Xi2/√2:
[Figure: the values (X1 + X2)/√2 plotted on a single axis (roughly −6 to 6).]

Taking a scaled average of the two components amounts to projecting the data onto the 45-degree line (shown in red) and keeping only the projected values.
[Figure: the data (X1 against X2) with the 45-degree line drawn in red.]
How was this figure constructed?

• Recap: the projection px of a vector x onto a vector y is the vector
px = (xT y / ∥y∥²) y,
i.e. the scalar xT y/∥y∥ (the coordinate of x in the direction of y) times the unit vector y/∥y∥.
• Another, more transparent, way of viewing this: let θ denote the angle between x and y. From trigonometry, |cos θ| equals the length of the base of the triangle (∥px∥) divided by the hypotenuse (∥x∥). Hence the length of the projection (the “projected value”) is
∥px∥ = ∥x∥ |cos θ| = |xT y| / ∥y∥.   (2.42)
(This is relation (2.42) of Härdle and Simar, where it is also defined with respect to a general metric A.)
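A hedged R sketch of this recap (the vectors x and y below are arbitrary examples, not objects from the notes):
x <- c(3, 1)
y <- c(1, 1)                                              # direction of the 45-degree line
p_x <- drop(crossprod(x, y)) / sum(y^2) * y               # projection vector p_x = (x'y / ||y||^2) y
proj_len <- abs(drop(crossprod(x, y))) / sqrt(sum(y^2))   # ||p_x|| = |x'y| / ||y||
sqrt(sum(p_x^2))                                          # same value as proj_len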

• The linear combination Yi = Xi1/√2 + Xi2/√2 is the same as
Yi = XiT a,
where
Xi = (Xi1, Xi2)T,   a = (1/√2, 1/√2)T.
• So Yi is the projection value (the length of the projection vector) of Xi onto the 45-degree line passing through the origin.

The line passing through the origin in the direction a = (1/√2, 1/√2)T is shown in red. The projection of each Xi onto that line is shown in blue.
[Figure: the data (X1 against X2) with the 45-degree line in red and the projection of each point onto it in blue.]

[Figure: top panel, the data (X1 against X2) with their projections onto the 45-degree line; bottom panel, the projected values (X1 + X2)/√2 on a single axis.]

• Instead of giving equal weight to each component of Xi, when reducing dimension we would like to lose as little information about the original data as possible.
• How do we define “losing information”?
• In principal component analysis (PCA), we reduce dimension by projecting the data onto lines.
• Moreover, in PCA, “lose as little information as possible” is defined as “keep as much of the variability of the original data as possible”.
• In our two-dimensional example, when choosing the projection Yi = XiT a on a line, this means we want to find a such that var(Yi) is as large as possible; a small sketch of this computation follows.
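For a given direction a, the sample variance of the projected values can be computed directly from the data. A hedged sketch (proj_var is a helper introduced only for illustration, and X is a made-up n × 2 data matrix, not an object from the notes):
# sample variance of the projections of the rows of X onto the direction a
proj_var <- function(X, a) {
  a <- a / sqrt(sum(a^2))       # rescale so that ||a|| = 1
  var(drop(X %*% a))            # variance of the projected values X_i' a
}
X <- cbind(rnorm(200, sd = 2), rnorm(200, sd = 1))   # hypothetical (already centered) data
proj_var(X, c(1, 1))            # 45-degree direction
proj_var(X, c(1, 0))            # keep only the first component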

Why do we want to maximise variance? Here is an example where the projected data are not variable: project the data on the red line
[Figure: a data set (X1 against X2) together with a red line onto which the data are projected.]

The projected data all land on the same point and have zero variance; we don't learn anything about the data.
[Figure: the projected values collapse onto a single point on the univariate projection axis.]

Recall that in the example we projected onto the 45-degree line through the origin:
[Figure: the data (X1 against X2) with the 45-degree line in red.]

However, we would have kept more information if we had instead projected the data onto the following line:
[Figure: the data (X1 against X2) with an alternative projection line.]

[Figure: the data (X1 against X2) projected onto the alternative line.]
Indeed, on this line, the projected data are more variable than on the previous line.

The scaled average (in red) is less variable than the last suggested projected values:
[Figure: two strips of univariate projected values on a common axis (roughly −6 to 6), labelled 'var max' and 'scaled ave'.]
The projection in blue is in fact the one that maximises the variance of the projected values of the data.

5.2 PCA
Formally, in PCA, when reducing the p-variate Xi's to univariate Yi1's, for i = 1, . . . , n, where the Xi's are i.i.d. ∼ (0, Σ), the goal is to find the linear combination
Yi1 = a1Xi1 + … + apXip = XiT a,
where a = (a1, . . . , ap)T is such that
∥a∥² = a1² + … + ap² = 1
and
var(Yi1)
is as large as possible.
• We use Yi1 instead of Yi because there will be more than one projection.
• The constraint on a is a scaling factor that makes things easier.

• Let γ1, . . . , γp denote the p unit-length eigenvectors (i.e., ∥γj∥ = 1) of the covariance matrix Σ, respectively associated with the eigenvalues
λ1 ≥ λ2 ≥ . . . ≥ λp.
• Recall: the γj's are only defined up to a change of sign, so each γj can be replaced by −γj.
• It can be shown that the a that maximises the variance is equal to γ1, the eigenvector with the largest eigenvalue (the “first eigenvector”).
• The variable
Yi1 = a1Xi1 + … + apXip = aT Xi = γ1T Xi
is called the first principal component of Xi (or “PC1” for short).
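A hedged R sketch of this fact (Sigma below is an assumed 2 × 2 covariance matrix chosen only for illustration):
Sigma <- matrix(c(4, 1.8, 1.8, 1), 2, 2)     # assumed covariance matrix
eig <- eigen(Sigma)                          # eigenvalues are returned in decreasing order
gamma1  <- eig$vectors[, 1]                  # first eigenvector: the variance-maximising a
lambda1 <- eig$values[1]
drop(t(gamma1) %*% Sigma %*% gamma1)         # var(Y_i1) = gamma1' Sigma gamma1 = lambda1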

• More generally, if the data are i.i.d. ∼ (μ, Σ) and not already centered,
Yi1 = γ1T{Xi − E(Xi)} = γ1T(Xi − μ)
is called the first principal component of Xi.
• It is the linear projection of the data that has maximum variance.
• We always center the data before projecting.

• In PCA, once we have found a univariate projection, how do we add a second projection?
• One possibility: on the good old 45-degree line (blue below).
[Figure: the data (X1 against X2) with the 45-degree line shown in blue.]

Those two projections are essentially redundant; we don't learn much more:
[Figure: scatter plot of the second projection ('proj 2') against PC1.]

• We should project onto a line that is as different as possible, to learn complementary information. How?
• Project onto a direction perpendicular to that of PC1. The variable obtained is called the second principal component (“PC2” for short).
[Figure: the data (X1 against X2) with the PC1 direction and the perpendicular PC2 direction.]

The data projected onto the two lines are just the same as the original data, but with the axes rotated to match the blue and the red lines.
[Figure: scatter plot of PC2 against PC1.]

More generally, when we reduce the p-dimensional Xi's, which are ∼ (μ, Σ), to q ≤ p dimensions, with the γj's and λj's as defined earlier:
• We start by taking the first principal component of Xi,
Yi1 = γ1T{Xi − E(Xi)} = γ1T(Xi − μ),
where γ1 is the eigenvector of Σ corresponding to the largest eigenvalue, λ1.
• Then for k = 2, . . . , q, we take the kth principal component of Xi,
Yik = γkT{Xi − E(Xi)} = γkT(Xi − μ),   (1)
where γk is the eigenvector of Σ corresponding to the kth largest eigenvalue, λk.
• The γj's are orthonormal ⇒ the projection directions are orthogonal to each other.

• In matrix notation, letting Yi = (Yi1, . . . , Yip)T and Γ = [γ1| . . . |γp], we have
Yi = ΓT(Xi − μ).
• Suppose we construct Yi1, . . . , Yip as described above. Then we have
E(Yij) = 0, for j = 1, . . . , p
var(Yij) = λj, for j = 1, . . . , p
cov(Yik, Yij) = 0, for k ≠ j
var(Yi1) ≥ var(Yi2) ≥ . . . ≥ var(Yip)
var(Yi1) + . . . + var(Yip) = tr(Σ)
var(Yi1) × . . . × var(Yip) = |Σ|.
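A hedged numerical sketch of these properties on simulated data (this assumes the MASS package is available for mvrnorm; the sample quantities only match the population ones approximately):
set.seed(1)
Sigma <- matrix(c(4, 1.8, 1.8, 1), 2, 2)      # assumed covariance matrix
X <- MASS::mvrnorm(1000, mu = c(0, 0), Sigma = Sigma)
Gamma <- eigen(cov(X))$vectors
Y <- scale(X, scale = FALSE) %*% Gamma        # sample PCs
round(colMeans(Y), 3)                         # approximately (0, 0)
round(cov(Y), 3)                              # approximately diag(lambda_1, lambda_2)
c(sum(diag(cov(Y))), sum(diag(cov(X))))       # equal traces
c(det(cov(Y)), det(cov(X)))                   # equal determinants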

• It can be proved that:
– it is not possible to construct a linear combination Vi = XiT a with ∥a∥ = 1 which has larger variance than λ1 = var(Yi1);
– if we take a variable Vi = XiT a with ∥a∥ = 1 which is not correlated with the first k PCs of Xi, then the variance of Vi is maximised by taking Vi = Yi,k+1, the (k + 1)th PC of Xi.
• With all these properties, the hope is that we can gather as much information as possible about the original data by projecting them onto the first few PCs (with q much smaller than p when p is large). A small sketch illustrating the first property is given below.
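A hedged sketch of the first property (not a proof; Sigma is an assumed covariance matrix and the directions are random unit vectors):
Sigma <- matrix(c(4, 1.8, 1.8, 1), 2, 2)
lambda1 <- eigen(Sigma)$values[1]
set.seed(2)
vars <- replicate(10000, {
  a <- rnorm(2)
  a <- a / sqrt(sum(a^2))              # random unit-norm direction
  drop(t(a) %*% Sigma %*% a)           # variance of X_i' a in that direction
})
max(vars) <= lambda1 + 1e-12           # TRUE: no direction exceeds lambda_1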

5.3 IN PRACTICE
In practice we do not know Σ nor μ = E(Xi). Instead we use their empirical counterparts S and X̄, i.e.:
• We start by taking the first principal component of Xi,
Yi1 = γ1T(Xi − X̄),
where γ1 is the eigenvector of S corresponding to the largest eigenvalue, λ1.
• Then for k = 2, . . . , q, we take the kth principal component of Xi,
Yik = γkT(Xi − X̄),
where γk is the eigenvector of S corresponding to the kth largest eigenvalue, λk.
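This is what prcomp() computes. The following hedged sketch checks the agreement with a direct eigendecomposition of S on a made-up data matrix X (agreement holds only up to the arbitrary signs of the eigenvectors):
X <- matrix(rnorm(200 * 3), 200, 3)                   # hypothetical n x p data matrix
ev <- eigen(cov(X))                                   # S = cov(X)
Ysketch <- scale(X, scale = FALSE) %*% ev$vectors     # Y_ik = gamma_k' (X_i - Xbar)
pc <- prcomp(X)                                       # prcomp centers the data by default
all.equal(abs(unname(pc$rotation)), abs(ev$vectors))  # same eigenvectors up to sign
all.equal(pc$sdev^2, ev$values)                       # same eigenvalues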

• In matrix notation, letting
Yi = (Yi1, . . . , Yip)T   [a p-vector]
and
Y = (Y1, . . . , Yn)T,   [an n × p matrix]
we have
Y = (X − 1n X̄T) Γ,
for Γ = [γ1| . . . |γp].
• Once we have computed the PCs we can:
∗ plot them to see if we can detect clusters,
∗ see influential observations (outliers),
∗ see if we can get any insight about the data.
• When we detect something in the PC plots, we can:
∗ go back to the original data and try to make the connection,
∗ and check if our interpretation seems correct.

• Example: Swiss banknotes data.
• Data: variables measured on 200 Swiss 1000-franc banknotes, of which 100 were genuine and 100 were counterfeit.
(Source: Flury, B. and Riedwyl, H. (1988). Multivariate Statistics: A Practical Approach. London: Chapman & Hall, Tables 1.1 and 1.2, pp. 5–8.)
• Found in the R package mclust by typing data(banknote). The variables measured are:
X1: Length of bill (mm)
X2: Width of left edge (mm)
X3: Width of right edge (mm)
X4: Bottom margin width (mm)
X5: Top margin width (mm)
X6: Length of diagonal (mm)
• The first 100 banknotes are genuine and the next 100 are counterfeit.

Scatterplots:
[Figure: pairwise scatterplot matrix of the six variables Length, Left, Right, Bottom, Top and Diagonal.]

In R, read the data and produce the scatterplots:
library(mclust)                  # contains the banknote data set
data(banknote)
StatusX = banknote[, 1]          # genuine / counterfeit label
plot(banknote[, 2:7])            # pairwise scatterplots of the six measurements
StatusX contains the information about whether a note is genuine or counterfeit. Center the data and perform the PC analysis:
XCbank = scale(banknote[, 2:7], scale = FALSE)   # center the six variables (means subtracted)
PCX = prcomp(XCbank, retx = TRUE)                # PCA; retx = TRUE keeps the projected data
PCX

Let's take a closer look at the PCs for the banknote data.
The eigenvalues and eigenvectors are given in R in the following form:
Standard deviations (1, .., p=6):
[1] 1.7321388 0.9672748 0.4933697 0.4412015 0.2919107 0.1884534

Rotation (n x k) = (6 x 6):
            PC1    PC2    PC3    PC4    PC5    PC6
Length    0.044 -0.011  0.326 -0.562 -0.753  0.098
Left     -0.112 -0.071  0.259 -0.455  0.347 -0.767
Right    -0.139 -0.066  0.345 -0.415  0.535  0.632
Bottom   -0.768  0.563  0.218  0.186 -0.100 -0.022
Top      -0.202 -0.659  0.557  0.451 -0.102 -0.035
Diagonal  0.579  0.489  0.592  0.258  0.084 -0.046
The eigenvectors are the columns of the so-called rotation matrix and the eigenvalues are the squares of the so-called standard deviations.
Keep the eigenvectors in gamma and the eigenvalues in lambda:
gamma = PCX$rotation
lambda = PCX$sdev^2
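As a hedged sanity check (not part of the original code), gamma should be orthonormal and lambda should match the eigenvalues of the sample covariance matrix of the centered data:
round(t(gamma) %*% gamma, 10)                          # identity matrix: orthonormal columns
all.equal(unname(lambda), eigen(cov(XCbank))$values)   # same eigenvalues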

Let's look at the first 2 PCs: the data clearly separate into two groups.
pX = XCbank %*% gamma            # project the centered data onto all six eigenvectors
plot(pX[, 1], pX[, 2], pch = "*", xlab = "PC1", ylab = "PC 2", asp = 1)
[Figure: scatter plot of PC 2 against PC1 for the banknote data; the points form two clear groups.]

This can be done more simply in R: prcomp already returns the projected data (the Yi's), kept here in Y.
Y = PCX$x
plot(Y[, 1], Y[, 2], pch = "*", xlab = "PC1", ylab = "PC 2", asp = 1)
[Figure: the same plot of PC 2 against PC1, obtained from PCX$x.]

The two groups actually correspond to the genuine and the fake banknotes. The first two PCs have captured that information! We don't need to keep all 6 dimensions to see this. (The blue points tend to have large PC1 and PC2 values.)
[Figure: PC 2 against PC1 with the fake banknotes plotted as '*' and the genuine ones as 'o'; the two types form two separate groups.]

We have
Yi1 = 0.044Xi1 − 0.112Xi2 − 0.139Xi3 − 0.768Xi4 − 0.202Xi5 + 0.579Xi6
Yi2 = −0.011Xi1 − 0.071Xi2 − 0.066Xi3 + 0.563Xi4 − 0.659Xi5 + 0.489Xi6.
Thus
• the first PC is roughly the difference between the 6th (length of diagonal) and the 4th component (bottom margin);
• the second PC is roughly the difference between the 5th (top margin) and the sum of the 6th (length of diagonal) and the 4th component (bottom margin).
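As a rough hedged check of the first interpretation (this reuses the objects XCbank and Y created earlier; the correlation is expected to be high, but its exact value depends on the data):
crude1 <- XCbank[, "Diagonal"] - XCbank[, "Bottom"]   # crude contrast: diagonal minus bottom margin
cor(crude1, Y[, 1])                                   # expected to be strongly correlated with the PC1 scores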