5 PRINCIPAL COMPONENT ANALYSIS
5.4 INTERPRETATION OF THE PCS
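 Recall that
$$\operatorname{var}(Y_{ij}) = \lambda_j, \quad \text{for } j = 1, \ldots, p,$$
and
$$\sum_{j=1}^{p} \lambda_j = \sum_{j=1}^{p} \operatorname{var}(Y_{ij}) = \operatorname{tr}(\Sigma) = \sum_{j=1}^{p} \operatorname{var}(X_{ij}).$$
 ⇒ We can measure how well the first q PCs explain the variation of the data via the ratio
$$\psi_q = \frac{\sum_{j=1}^{q} \lambda_j}{\sum_{j=1}^{p} \lambda_j}.$$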
Lecture notes originally by Prof. 1

In the banknotes example, we have

eigenvalue    proportion of variance    cumulative proportion
3.00030487    0.668                     0.668
0.93562052    0.208                     0.876
0.24341371    0.054                     0.930
0.19465874    0.043                     0.973
0.08521185    0.019                     0.992
0.03551468    0.008                     1.000

so that
ψ1 = 0.668, ψ2 = 0.876, ψ3 = 0.930, ψ4 = 0.973, ψ5 = 0.992, ψ6 = 1.000.
In other words, the first PC explains 66.8% of the variability of the data, the first two together explain 87.6% of that variability, the first three together explain 93.0%, etc.
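As a quick check, the proportions and the ψq's can be recomputed from the six banknote eigenvalues. The following is a minimal sketch in plain Python (not the R used in the course); the variable names are ours.

```python
# Eigenvalues of the banknotes covariance matrix, as listed above.
eigenvalues = [3.00030487, 0.93562052, 0.24341371,
               0.19465874, 0.08521185, 0.03551468]

total = sum(eigenvalues)  # equals tr(Sigma), the total variance

# Proportion of variance carried by each PC: lambda_j / (sum of all lambdas).
proportion = [lam / total for lam in eigenvalues]

# psi_q: cumulative proportion explained by the first q PCs.
psi = []
running = 0.0
for lam in eigenvalues:
    running += lam
    psi.append(running / total)

for q, (prop, cum) in enumerate(zip(proportion, psi), start=1):
    print(f"q={q}: proportion={prop:.3f}, psi_q={cum:.3f}")
```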

To visualise the contributions behind the ψq's, we often use a graph called a scree plot, which displays λ1 ≥ λ2 ≥ … ≥ λp.
It allows us to see which components contribute the most.
In R, we can apply screeplot to the result of a PC analysis obtained through prcomp:
[Figure: scree plot of PCX, showing the variances (vertical axis, from 0.0 to 3.0) of components 1 to 6.]
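The quantities shown in a scree plot are the eigenvalues of the empirical covariance matrix, sorted in decreasing order. The sketch below computes them with NumPy on simulated data (the data and the column scales are made up for illustration, and we use Python rather than R); it also checks that the eigenvalues sum to the trace of S.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: n = 200 observations of p = 6 variables with unequal spread.
X = rng.normal(size=(200, 6)) * np.array([3.0, 2.0, 1.0, 0.8, 0.5, 0.2])

S = np.cov(X, rowvar=False)                 # empirical covariance (divisor n - 1)
eigenvalues = np.linalg.eigvalsh(S)[::-1]   # eigvalsh sorts ascending; reverse it

# The bar heights of the scree plot are exactly these values.
print(np.round(eigenvalues, 3))
# The total variance is preserved: sum of eigenvalues = tr(S).
print(np.isclose(eigenvalues.sum(), np.trace(S)))
```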

 This also guides us on how many PCs to keep: for the purpose of dimensionality reduction, it is not useful to keep all p PCs.
 On the other hand, we should keep enough of them to retain a large amount of information about the data, i.e. their cumulative percentage of variance should be rather large.
 Usually we look for an elbow in the scree plot and stop there; in the bank example, we could stop after 3 PCs.
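The "keep enough PCs" rule can be made concrete by fixing a target for the cumulative proportion. The sketch below is one possible rule, not the elbow criterion itself: the function name `smallest_q` and the 90% target are our own choices for illustration.

```python
# Banknote eigenvalues from the table above.
eigenvalues = [3.00030487, 0.93562052, 0.24341371,
               0.19465874, 0.08521185, 0.03551468]

def smallest_q(eigenvalues, threshold):
    """Smallest q such that the first q PCs explain at least `threshold` of the variance."""
    total = sum(eigenvalues)
    running = 0.0
    for q, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running / total >= threshold:
            return q
    return len(eigenvalues)  # guard against rounding when threshold is 1.0

q90 = smallest_q(eigenvalues, 0.90)   # assumed target: 90% of the variance
print(q90)
```

With a 90% target this rule agrees with stopping after 3 PCs in the bank example; a different target would of course give a different answer.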

 In the bank data example, the first two PCs together explain 87.6% of the variability of the data, which is already reasonable, but with the first 3 PCs we explain 93% of the variability, which is a little better.
 The third component only adds a little extra information. In some cases that little information could contain interesting aspects of the data that were not captured by the previous PCs, so it needs to be investigated.
 If the first few PCs, say 3 or 4, do not explain most of the variability of the data, then we cannot reduce the dimension effectively with PCA.
 That being said, PCA is a linear approach to dimensionality reduction; a non-linear approach might still work (a more advanced topic than intended for this course).

 Since we can construct up to p principal components, let $Y_i = (Y_{i1}, \ldots, Y_{ip})^T$ denote the p PCs of $X_i$. Recall that $Y_{ik} = \gamma_k^T (X_i - \mu)$.
 We can compute the covariance between all pairs of elements of $X_i$ and $Y_i$ through the covariance matrix between the two vectors.
 Let $\Gamma = (\gamma_1, \ldots, \gamma_p)$ contain the p eigenvectors of Σ, so that
$$Y_i = \Gamma^T (X_i - \mu).$$
 Let $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$ contain the p corresponding eigenvalues in decreasing order, and let $\gamma_{jk}$ be the j-th component of $\gamma_k$. Also recall that we denote by $\sigma_{jk}$ the (j, k)th element of Σ.

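 We have
$$\operatorname{cov}(X_i, Y_i) = \operatorname{cov}\{X_i, \Gamma^T (X_i - \mu)\} = \Sigma\Gamma = \Gamma\Lambda,$$
so that
$$\operatorname{cov}(X_{ij}, Y_{ik}) = \gamma_{jk}\lambda_k.$$
Therefore,
$$\rho_{X_{ij},Y_{ik}} = \frac{\gamma_{jk}\lambda_k}{(\sigma_{jj}\lambda_k)^{1/2}} = \gamma_{jk}\,\frac{\lambda_k^{1/2}}{(\sigma_{jj})^{1/2}}.$$
Note: In practice, these quantities are all replaced by their empirical version.
 Since
$$\rho_{X_{ij},Y_{ik}} = \gamma_{jk}\,\frac{\lambda_k^{1/2}}{(\sigma_{jj})^{1/2}},$$
then
$$\sum_{k=1}^{p} \rho^2_{X_{ij},Y_{ik}} = \frac{\sum_{k=1}^{p} \gamma_{jk}^2 \lambda_k}{\sigma_{jj}} = \frac{\sigma_{jj}}{\sigma_{jj}} = 1,$$
where the second equality uses $\Sigma = \Gamma\Lambda\Gamma^T$, whose (j, j)th element is $\sum_{k=1}^{p} \gamma_{jk}^2 \lambda_k$.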
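The two identities above, cov(Xi, Yi) = ΓΛ and the fact that the squared correlations of a given Xij with all p PCs sum to 1, also hold exactly for the empirical versions. The following sketch checks this numerically with NumPy on simulated data (the data are made up for illustration; variable names are ours).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4
# Hypothetical correlated data: independent normals mixed by a random matrix.
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

S = np.cov(X, rowvar=False)                 # empirical covariance matrix
lam, Gamma = np.linalg.eigh(S)              # eigh returns eigenvalues in ascending order
lam, Gamma = lam[::-1], Gamma[:, ::-1]      # reorder so lambda_1 >= ... >= lambda_p

Y = (X - X.mean(axis=0)) @ Gamma            # row i holds the p PCs of observation i

# cov(X, Y) is the top-right p x p block of the joint covariance of (X, Y).
cov_XY = np.cov(X, Y, rowvar=False)[:p, p:]
print(np.allclose(cov_XY, Gamma @ np.diag(lam)))

# rho[j, k] = gamma_jk * sqrt(lambda_k) / sqrt(s_jj)
rho = Gamma * np.sqrt(lam) / np.sqrt(np.diag(S))[:, None]
row_sums = (rho ** 2).sum(axis=1)           # one sum per original variable
print(np.allclose(row_sums, 1.0))
```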

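 Thus the squared correlations
$$\rho^2_{X_{ij},Y_{ik}} = \frac{\gamma_{jk}^2 \lambda_k}{\sigma_{jj}}$$
may be interpreted as the proportion of variance of $X_{ij}$ explained by $Y_{ik}$.
 Since
$$\sum_{k=1}^{p} \rho^2_{X_{ij},Y_{ik}} = 1,$$
then
$$\rho^2_{X_{ij},Y_{i1}} + \rho^2_{X_{ij},Y_{i2}} \le 1,$$
so that the pair $(\rho_{X_{ij},Y_{i1}}, \rho_{X_{ij},Y_{i2}})$ lies inside a circle of radius 1.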

 If $X_{ij}$ is strongly correlated with $(Y_{i1}, Y_{i2})$, then the pair will lie close to the periphery of the circle.
 Plotting all the pairs $(\rho_{X_{ij},Y_{i1}}, \rho_{X_{ij},Y_{i2}})$, j = 1, …, p, on the same picture ⇒ we can visualise which of $X_{i1}, \ldots, X_{ip}$ are the most strongly correlated with $Y_{i1}$ and $Y_{i2}$.
 Caveat:
$$\rho_{X_{ij},Y_{ik}} = \gamma_{jk}\,\frac{\lambda_k^{1/2}}{(\sigma_{jj})^{1/2}}$$
does NOT depend on i.
 In practice: the $\gamma_{jk}$'s, the $\lambda_k$'s and the $\sigma_{jj}$'s are replaced with their empirical estimates, using the empirical covariance matrix S instead of the theoretical one Σ.
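To see how the coordinates of the correlation circle are obtained in practice, the sketch below computes them in two ways on simulated data: from the formula with empirical estimates, and directly as sample correlations between each variable and the first two PC scores. This is an illustration in Python with NumPy under made-up data, not the code used to produce the course figures.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 5
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # hypothetical data

S = np.cov(X, rowvar=False)
lam, Gamma = np.linalg.eigh(S)
lam, Gamma = lam[::-1], Gamma[:, ::-1]                  # decreasing eigenvalues
scores = (X - X.mean(axis=0)) @ Gamma[:, :2]            # PC1 and PC2 of each individual

# Formula: gamma_jk * sqrt(lambda_k) / sqrt(s_jj), for k = 1, 2.
coords_formula = Gamma[:, :2] * np.sqrt(lam[:2]) / np.sqrt(np.diag(S))[:, None]

# Direct computation: sample correlation between column j of X and PC k.
coords_direct = np.array([
    [np.corrcoef(X[:, j], scores[:, k])[0, 1] for k in range(2)]
    for j in range(p)
])

# Squared distance of each variable from the origin: at most 1.
radius2 = (coords_formula ** 2).sum(axis=1)
print(np.round(coords_formula, 3))
print(np.round(radius2, 3))
```

Each row of `coords_formula` is one point of the correlation circle, and `radius2` is the proportion of the variance of that variable explained by the first two PCs together.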

Recall that in the banknotes example we have
$$Y_{i1} = 0.044X_{i1} - 0.112X_{i2} - 0.139X_{i3} - 0.768X_{i4} - 0.202X_{i5} + 0.579X_{i6},$$
$$Y_{i2} = -0.011X_{i1} - 0.071X_{i2} - 0.066X_{i3} + 0.563X_{i4} - 0.659X_{i5} + 0.489X_{i6},$$
where $X_{i1}$ to $X_{i6}$ are Length, Left, Right, Bottom, Top, Diagonal.
[Figure: correlations between the Xj's and PC1 (horizontal axis) and PC2 (vertical axis), with points labelled Length, Left, Right, Bottom, Top, Diagonal.]

We see that diagonal, top and bottom are the three original variables that are the most correlated with (and hence well explained by) the first two PCs.

Generally, for variables that are close to the periphery of the circle:
 Variables pointing in the same direction are positively correlated.
 Variables pointing in opposite directions are negatively correlated.
 Variables that form a small angle with an axis are strongly correlated with the corresponding PC.

We knew PC1 has the most weight on X4 (bottom) and X6 (diagonal), which have coefficients of opposite sign. We see that both are close to the periphery of the correlation circle. Bottom is negatively correlated with PC1 and diagonal is positively correlated with PC1; they have opposite effects on PC1.

We knew PC2 has the most weight on X4 (bottom), X6 (diagonal) and X5 (top), where the first two have a positive coefficient and the third, a negative one. We see that X4 and X6 are positively correlated with PC2 and X5 is negatively correlated.

The following table, taken from Härdle and Simar, page 331, confirms what we just said. We also see that X1, X2 and X3 are not strongly correlated with the first two PCs, and the percentage of their variance explained by the first two PCs together is small.

Table 11.2 Correlation between the original variables and the PCs

               r(Xi,Y1)    r(Xi,Y2)    r²(Xi,Y1) + r²(Xi,Y2)
X1 length      −0.201       0.028      0.041
X2 left h.      0.538       0.191      0.326
X3 right h.     0.597       0.159      0.381
X4 lower        0.921      −0.377      0.991
X5 upper        0.435       0.794      0.820
X6 diagonal    −0.870      −0.410      0.926

(Caution: the signs of the correlations are flipped. The book has taken the negative version of the eigenvectors in constructing the first 2 PCs.)
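The last column of Table 11.2 can be recovered from the first two, since the proportion of the variance of a variable explained by the first two PCs together is the sum of its two squared correlations. A small plain-Python check, using only the rounded values of the table (so agreement is only up to rounding):

```python
# (r with Y1, r with Y2, reported r1^2 + r2^2), as printed in Table 11.2.
table = {
    "length":   (-0.201,  0.028, 0.041),
    "left":     ( 0.538,  0.191, 0.326),
    "right":    ( 0.597,  0.159, 0.381),
    "lower":    ( 0.921, -0.377, 0.991),
    "upper":    ( 0.435,  0.794, 0.820),
    "diagonal": (-0.870, -0.410, 0.926),
}

# Proportion of variance of each variable explained by PC1 and PC2 together.
explained = {name: r1 ** 2 + r2 ** 2 for name, (r1, r2, _) in table.items()}

# Variables ordered from best to worst represented on the correlation circle.
ranking = sorted(explained, key=explained.get, reverse=True)
for name in ranking:
    print(f"{name:9s} {explained[name]:.3f}")
```

The sign flip mentioned in the caution does not matter here, because only squared correlations are used.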
Summary
↪ The weighting of the PCs tells us in which directions, expressed in original coordinates, the best explanation of the variance is obtained. Note that the PCs are not scale invariant.
↪ A measure of how well the first q PCs explain variation is given by the relative proportion $\psi_q = \sum_{j=1}^{q} \lambda_j / \sum_{j=1}^{p} \lambda_j$. A good graphical representation of the ability of the PCs to explain the …

[Figure: left, the bank notes plotted on PC1 (horizontal axis) and PC2 (vertical axis), with * = fake and o = genuine; right, correlations between the Xj's and PC1 and PC2, with points labelled Length, Left, Right, Bottom, Top, Diagonal.]

Roughly, for variables that are close to the periphery of the circle, and if the first two PCs account for a large percentage of the variability of the data:
 We can see which of those variables tend to influence the coordinates of the individuals.
 Example: if an individual has a large PC1 (compared to other individuals), it tends to have large values of the variables that are positively correlated with PC1 (and small values of those that are negatively correlated with it).
 Same for PC2.
 But we need to be careful in our interpretation of the graphs as they are only rough approximations and, again, we can only use variables that are well represented by the first two PCs.

Diagonal, bottom and top seem to play an important role in distinguishing the fake from the genuine bank notes.
PC1 is highly positively correlated with diagonal ⇒ large values of PC1 tend to go together with large values of diagonal.
PC1 is highly negatively correlated with bottom ⇒ small values of PC1 tend to go together with large values of bottom.
PC2 is highly negatively correlated with top ⇒ large values of PC2 tend to go together with small values of top.
The genuine bank notes tend to have larger values of diagonal and smaller values of bottom; the fake bank notes tend to have larger values of top and bottom.

[Figure: pairwise scatter plots of the centered variables diagonal, top and bottom (red = “fake”, blue = “genuine”).]

[Figure: pairwise scatter plots of the original variables diagonal, top and bottom (red = “fake”, blue = “genuine”).]

5.5 NORMALISED PCA
 In some cases the original variables have variances of very different scales.
 ⇒ PC1 will focus on the variable that varies the most just because of its scale.

 Multiply X1 (length) by 100 and PC1 will focus all its attention on X1:
[Figure: "Multiply X1 by 100". Left, the bank notes plotted on PC1 (horizontal axis) and PC2 (vertical axis), with * = fake and o = genuine; right, correlations between the Xj's and PC1 and PC2.]
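The effect of rescaling a single variable can be reproduced on simulated data. In the sketch below (Python with NumPy; the data, the mixing matrix and the helper `first_pc` are made up for illustration), multiplying the first variable by 100 makes the first eigenvector of the empirical covariance matrix point almost entirely along that variable.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 6
# Hypothetical data: mildly correlated variables on comparable scales.
B = np.eye(p) + 0.3 * rng.normal(size=(p, p))
X = rng.normal(size=(n, p)) @ B

def first_pc(data):
    """Unit eigenvector of the empirical covariance matrix with the largest eigenvalue."""
    lam, Gamma = np.linalg.eigh(np.cov(data, rowvar=False))
    return Gamma[:, -1]            # eigh sorts eigenvalues in ascending order

X_scaled = X.copy()
X_scaled[:, 0] *= 100.0            # multiply the first variable by 100

g_before = first_pc(X)
g_after = first_pc(X_scaled)

print(np.round(g_before, 3))       # weights spread over several variables
print(np.round(g_after, 3))        # weight concentrated on the first variable
```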

 Multiply X1 (length) and X2 (left) by 100. Then PC1 and PC2 will focus all their attention on X1 and X2, and we learn nothing interesting about the data (we “killed” the differentiating ability of the two PCs):
[Figure: "Multiply X1 and X2 by 100". Left, the bank notes plotted on PC1 (horizontal axis) and PC2 (vertical axis), with * = fake and o = genuine; right, correlations between the Xj's and PC1 and PC2.]

Before we had:
[Figure: the analysis on the original scales. Left, the bank notes plotted on PC1 (horizontal axis) and PC2 (vertical axis), with * = fake and o = genuine; right, correlations between the Xj's and PC1 and PC2.]

 When the scales are very different, to prevent the first few PCs from capturing merely the scale of a few variables, it is better to rescale all the components to have variance one before performing the PC analysis: replace $X_i = (X_{i1}, \ldots, X_{ip})^T$ by
$$D^{-1/2} X_i, \quad \text{where } D = \operatorname{diag}(\sigma_{11}, \ldots, \sigma_{pp}).$$
 Equivalently, we perform the eigen analysis on the correlation matrix instead of the covariance matrix. Indeed
$$\operatorname{var}(D^{-1/2} X_i) = D^{-1/2} \operatorname{var}(X_i) D^{-1/2} = D^{-1/2} \Sigma D^{-1/2} = P,$$
the correlation matrix.
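The equivalence between standardising the variables and working with the correlation matrix, and the resulting invariance to changes of scale, can be checked numerically. The following is a sketch in Python with NumPy on simulated data; the helper name `correlation_pca_eigenvalues` is ours.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 300, 4
X = rng.normal(size=(n, p)) @ (np.eye(p) + 0.5 * rng.normal(size=(p, p)))  # hypothetical data

def correlation_pca_eigenvalues(data):
    """Eigenvalues (decreasing) of P = D^{-1/2} S D^{-1/2}, the empirical correlation matrix."""
    S = np.cov(data, rowvar=False)
    d = np.sqrt(np.diag(S))
    P = S / np.outer(d, d)
    return np.linalg.eigvalsh(P)[::-1]

lam_original = correlation_pca_eigenvalues(X)

# Rescaling a variable (here the first one, by 100) leaves normalised PCA unchanged.
lam_rescaled = correlation_pca_eigenvalues(X * np.array([100.0, 1.0, 1.0, 1.0]))

# Same eigenvalues from the covariance matrix of the standardised variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
lam_standardised = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]

print(np.round(lam_original, 4))
print(np.round(lam_rescaled, 4))
print(np.round(lam_standardised, 4))
```

Note that the eigenvalues of a correlation matrix sum to p, so in normalised PCA the proportion of variance explained by the k-th PC is simply λk / p.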

5.6 OTHER EXAMPLES
Boston housing data from section 22.1 of Härdle and Simar, coming from Harrison and Rubinfeld (1978). We have n = 506 observations, one for each census district of the Boston metropolitan area, and p = 14 variables:
X1: Per capita crime rate,
X2: Proportion of residential land zoned for large lots,
X3: Proportion of nonretail business acres,
X4: (1 if tract bounds river, 0 otherwise),
X5: Nitric oxides concentration,
X6: Average number of rooms per dwelling,
X7: Proportion of owner-occupied units built prior to 1940,
X8: Weighted distances to five Boston employment centers,
X9: Index of accessibility to radial highways,
X10: Full-value property tax rate per $10,000,
X11: Pupil/teacher ratio,
X12: 1000(B − 0.63)^2 I(B < 0.63) where B is the proportion of African American,
X13: % lower status of the population,
X14: Median value of owner-occupied homes in $1,000.

 Some transformation of the data is applied before performing PCA (see Härdle and Simar, chapter 1). We do PCA on all but the 4th variable, which is binary.
 The variables are on completely different scales, so we standardise the variables before performing the analysis.
 Here are the graphs and tables and interpretation of the results (first table taken from Härdle and Simar, section 11.8):

… if the plots are colour coded with respect to some particular variable of interest.

Table 11.5 Eigenvalues and percentage of explained variance for Boston Housing data (MVAnpcahousi)

Eigenvalue    Percentages    Cumulated percentages
7.2852        0.5604         0.5604
1.3517        0.1040         0.6644
1.1266        0.0867         0.7510
0.7802        0.0600         0.8111
0.6359        0.0489         0.8600
0.5290        0.0407         0.9007
0.3397        0.0261         0.9268
0.2628        0.0202         0.9470
0.1936        0.0149         0.9619
0.1547        0.0119         0.9738
0.1405        0.0108         0.9846
0.1100        0.0085         0.9931
0.0900        0.0069         1.0000

Since we have started from many variables, it is not surprising that the first few PCs explain only 80% of the variability of the data, and it is worth looking at the pictures produced for those.
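Because the analysis is done on the 13 standardised variables, the eigenvalues of Table 11.5 should sum to 13 (up to rounding), and the percentages are simply the eigenvalues divided by 13. A short plain-Python check using the printed eigenvalues:

```python
# Eigenvalues printed in Table 11.5 (normalised PCA on 13 variables).
eigenvalues = [7.2852, 1.3517, 1.1266, 0.7802, 0.6359, 0.5290, 0.3397,
               0.2628, 0.1936, 0.1547, 0.1405, 0.1100, 0.0900]

p = len(eigenvalues)                       # 13 standardised variables
total = sum(eigenvalues)                   # close to p, the trace of a correlation matrix

percentages = [lam / p for lam in eigenvalues]

cumulated = []
running = 0.0
for pct in percentages:
    running += pct
    cumulated.append(running)

print(f"sum of eigenvalues = {total:.4f}")
for q in (1, 2, 3, 4):
    print(f"first {q} PC(s): {cumulated[q - 1]:.4f}")
```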
Table 1: Correlations

        PC1           PC2           PC3
X1      0.9076059    -0.22470670    0.14574342
X2     -0.6398785     0.02915088    0.50576358
X3      0.8580123    -0.04093873   -0.18449264
X5      0.8736580    -0.23912156   -0.17801334
X6     -0.5104008    -0.70365103    0.08691877
X7      0.7999128    -0.15560393   -0.29488260
X8     -0.8258822     0.29043356    0.29824045
X9      0.7530739    -0.28569624    0.38044546
X10     0.8114050    -0.16453746    0.36718083
X11     0.5673832     0.26672245    0.14978763
X12    -0.4906207     0.10408647   -0.51696251
X13     0.7996176     0.42532966   -0.02506993
X14    -0.7366164    -0.51602565   -0.17473901

X1: Per capita crime rate (crim),
X2: Proportion of residential land zoned for large lots (zn),
X3: Proportion of nonretail business acres (indus),
X4: (1 if tract bounds river, 0 otherwise),
X5: Nitric oxides concentration (nox),
X6: Average number of rooms per dwelling (rm),
X7: Proportion of owner-occupied units built prior to 1940 (age),
X8: Weighted distances to five Boston employment centers (dis),
X9: Index of accessibility to radial highways (rad),
X10: Full-value property tax rate per $10,000 (tax),
X11: Pupil/teacher ratio (ptratio),
X12: 1000(B − 0.63)^2 I(B < 0.63) where B is the proportion of African American (b),
X13: % lower status of the population (lstat),
X14: Median value of owner-occupied homes in $1,000 (medv).

[Figure: correlations between the standardised Xj's and PC1 (horizontal axis) and PC2 (vertical axis), with points labelled by the short variable names.]

• Most variables, except for X2 (zn) and X12 (b), are close to the periphery of the circle, having a large correlation with PC1 mostly.
• PC1 is strongly
  – negatively correlated with X8 (dis), X14 (medv), X6 (rm),
  – positively correlated with all others except for X12 (b) and X2 (zn).
• The PC1 axis opposes the variables with positive correlation (crim, indus, nox, age, rad, tax, ptratio, lstat) to those with negative correlation (zn, rm, dis, b, medv). It could roughly be interpreted as a quality of life and house indicator.
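For standardised variables the correlation formula reduces to γjk λk^{1/2}, and each eigenvector has unit norm, so the squared correlations in each column of Table 1 should sum to the corresponding eigenvalue of Table 11.5. The sketch below (plain Python, values copied from Table 1) checks this and also computes, for each variable, the squared distance from the origin in the PC1/PC2 correlation circle.

```python
# Correlations of each standardised variable with PC1, PC2, PC3 (Table 1).
correlations = {
    "X1":  ( 0.9076059, -0.22470670,  0.14574342),
    "X2":  (-0.6398785,  0.02915088,  0.50576358),
    "X3":  ( 0.8580123, -0.04093873, -0.18449264),
    "X5":  ( 0.8736580, -0.23912156, -0.17801334),
    "X6":  (-0.5104008, -0.70365103,  0.08691877),
    "X7":  ( 0.7999128, -0.15560393, -0.29488260),
    "X8":  (-0.8258822,  0.29043356,  0.29824045),
    "X9":  ( 0.7530739, -0.28569624,  0.38044546),
    "X10": ( 0.8114050, -0.16453746,  0.36718083),
    "X11": ( 0.5673832,  0.26672245,  0.14978763),
    "X12": (-0.4906207,  0.10408647, -0.51696251),
    "X13": ( 0.7996176,  0.42532966, -0.02506993),
    "X14": (-0.7366164, -0.51602565, -0.17473901),
}

# Column sums of squared correlations: one value per PC.
column_sums = [sum(r[k] ** 2 for r in correlations.values()) for k in range(3)]
print([round(s, 4) for s in column_sums])   # compare with the first three eigenvalues

# Proportion of each variable's variance explained by PC1 and PC2 together.
radius2 = {name: r[0] ** 2 + r[1] ** 2 for name, r in correlations.items()}
for name, value in radius2.items():
    print(f"{name:4s} {value:.3f}")
```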
[Figure: the census districts plotted on PC1 (horizontal axis) and PC2 (vertical axis), with o = expensive houses and * = cheaper houses.]
• The colours code X14 > median in blue. So PC1 and PC2 seem related to house value.
• Brief interpretation: individuals with a high house value tend to also have high values of the variables that were negatively correlated with PC1 and PC2, and the individuals with cheaper houses tend to have high values of the variables that were positively correlated with PC1 and PC2, etc.
• However, these need to be further investigated and confirmed by going back to the original variables.

 PC2 opposes X8 (dis), X11 (ptratio) and X13 (lstat) to X6 (rm) and X14 (medv). Roughly, PC2 can be interpreted as a social factor explaining only 10% of the total variance. Again, more investigation would be needed.
 PC3 is dominated by a polarity between zn (X2) and b (X12), but the correlations are far from the periphery of the circle, so it is riskier to connect this figure with the individuals as we did for PC1/PC2.