Principal Component Analysis
Goals
• Restructure interrelated variables
• Simplify description
• Reduce Dimensionality
• Avoid multi-collinearity problems in regression
Basic Idea
• X1, X2 – correlated. Transform them into:
• C1, C2 – uncorrelated
Geometric Concept
2 Principal Components: C1 , C2 (p 361)
Plot of Principal Components: C1 , C2 (p 362)
Equations for Principal Components: C1 , C2
• C1 = 0.85X1 + 0.53X2
• C2 = -0.53X1 + 0.85X2
• Var C1 = 135.0
• Var C2 = 22.5
• Note: the dot product of their coefficient vectors, (0.85, 0.53) · (−0.53, 0.85), equals 0 (checked below)
• Hence C1 and C2 are orthogonal, therefore uncorrelated, and Var C1 > Var C2
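As a quick check of the note above, here is a minimal sketch (assuming only the rounded coefficients printed on this slide) verifying that the two coefficient vectors are orthogonal and of roughly unit length:

```python
import numpy as np

# Coefficient vectors of C1 and C2 from the example above (p 361), rounded to two decimals
a1 = np.array([0.85, 0.53])
a2 = np.array([-0.53, 0.85])

print(np.dot(a1, a2))   # 0.0 -> orthogonal coefficient vectors, hence uncorrelated components
print(np.dot(a1, a1))   # ~1.0 (not exactly 1 because the published coefficients are rounded)
```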
Principal Component Model
Matrix Equation

$$
\begin{pmatrix} C_1 \\ C_2 \\ \vdots \\ C_p \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pp}
\end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}
$$
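In code, this matrix equation is just a matrix–vector product. A minimal sketch, reusing the coefficients of the two-variable example and a made-up observation (in practice the X's would first be centered or standardized):

```python
import numpy as np

# Rows of A are the coefficient vectors a_j (here the 2-variable example from p 361)
A = np.array([[ 0.85, 0.53],
              [-0.53, 0.85]])

x = np.array([10.0, 4.0])   # hypothetical observation (X1, X2), for illustration only

c = A @ x                   # c[0] = C1 = 0.85*X1 + 0.53*X2, c[1] = C2 = -0.53*X1 + 0.85*X2
print(c)
```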
Principal Components: C1 ,…, Cp
• From p original variables X1, …, Xp, derive p principal components C1, …, Cp
• Each Cj is a linear combination of the Xi's:
• Cj = aj1X1 + aj2X2 + … + ajpXp
Properties of Principal Components:
• Coefficients are chosen to satisfy: Var C1 ≥ Var C2 ≥ … ≥ Var Cp
• Variance is a measure of information: components with larger variance carry more of the information in the original variables
• For example, in prostate cancer:
• Any two principal components are orthogonal, hence uncorrelated
Calculation of Principal Components
• Let S be the Covariance Matrix of the X variables.
• Then the aij's are the solution to the equation:
  (S − λI)a = 0    (Hotelling, 1933)

$$
S = \begin{pmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{pmatrix}
$$
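Below is a minimal numpy sketch of this calculation for a hypothetical data matrix X (observations in rows, variables in columns); np.linalg.eigh returns unit-length eigenvectors, matching the normalization described on the next slide:

```python
import numpy as np

# Hypothetical data matrix: n = 100 observations on p = 3 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

S = np.cov(X, rowvar=False)                # p x p covariance matrix of the X's

# Solving (S - lambda*I)a = 0: eigenvalues and unit-length eigenvectors of S
eigenvalues, eigenvectors = np.linalg.eigh(S)

# eigh returns eigenvalues in ascending order; reverse so that Var C1 >= ... >= Var Cp
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]      # column j holds the coefficients a_j of C_j

# Component scores: each C_j evaluated for every observation (X's centered first)
scores = (X - X.mean(axis=0)) @ eigenvectors
print(eigenvalues)                         # these are the variances of C1, ..., Cp
```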
Recall Some Terminology
• Solutions to (S − λI)a = 0 are:
• λ, a scalar, known as the eigenvalue
• a, a vector, known as the eigenvector
• a is not unique; there are an infinite number of possibilities, so:
• Choose a such that the sum of the squares of its coefficients equals 1
• This yields p eigenvalues and p corresponding unique eigenvectors
Then
• The eigenvectors are the Principal Components
Review
• Principal Components are the eigenvectors, and their variances are the eigenvalues, of the covariance matrix S of the X's
• Variances of the Cj's add up to the sum of the variances of the original variables (the total variance)
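A quick numerical check of the last point, sketched with a made-up covariance matrix (the numbers below are illustrative only, not from the text):

```python
import numpy as np

# Made-up 3 x 3 covariance matrix, for illustration only
S = np.array([[4.0, 1.2, 0.6],
              [1.2, 2.5, 0.9],
              [0.6, 0.9, 1.8]])

eigenvalues = np.linalg.eigvalsh(S)
print(eigenvalues.sum(), np.trace(S))   # both equal the total variance of the X's
```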
Example Revisited (p 361)
• Var X1 = 104.0, Var X2 = 53.5, sum = 157.5
• Var C1 = 135.0, Var C2 = 22.5, sum = 157.5
• Total variance is preserved
Choosing m
• Rely on existing theory
• Kaiser's Rule (see the sketch after this list):
• for the covariance matrix S, choose the λi > (sum of the variances of the X's)/p
• for the correlation matrix R, choose the λi > 1
• We need to explain a given % of the total variance
• Elbow Rule
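Here is a hedged sketch of the first two criteria, applied to a placeholder vector of eigenvalues (descending, summing to 8 as for an 8-item correlation matrix; these are not the depression-data eigenvalues):

```python
import numpy as np

# Placeholder eigenvalues in descending order
eigenvalues = np.array([3.2, 1.9, 1.1, 0.7, 0.5, 0.3, 0.2, 0.1])
p = eigenvalues.size

# Kaiser's rule: correlation matrix -> keep eigenvalues > 1;
# covariance matrix -> keep eigenvalues > (total variance)/p, i.e. the mean eigenvalue
m_kaiser_R = int(np.sum(eigenvalues > 1))
m_kaiser_S = int(np.sum(eigenvalues > eigenvalues.sum() / p))

# "Explain a given %": smallest m whose cumulative share of total variance reaches, say, 80%
cum_pct = np.cumsum(eigenvalues) / eigenvalues.sum()
m_80 = int(np.argmax(cum_pct >= 0.80)) + 1

print(m_kaiser_R, m_kaiser_S, m_80)   # here: 3, 3, 4
```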
Cumulative Percentages of Total Variance for Depression Data (p 368)
Eigenvalues for Depression Data – Scree Plot (p 368)
A plot, in descending order of magnitude, of the eigenvalues of a correlation matrix
Depression (CESD) Example
• Using the correlation matrix R (not the R language): choose eigenvalues > 1
• Hence choose 5 Principal Components
Elbow Rule for Choosing m
• Start with the scree plot
• Choose a cutoff point where:
• Lines joining consecutive points are “steep” left of the cutoff point, and
• “flat” right of the cutoff point
• The point where the two slopes meet is the cutoff point
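A minimal matplotlib sketch of the scree plot that the elbow rule starts from, again using placeholder eigenvalues rather than the depression-data values:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder eigenvalues in descending order (not the depression-data values)
eigenvalues = np.array([3.2, 1.9, 1.1, 0.7, 0.5, 0.3, 0.2, 0.1])

plt.plot(np.arange(1, eigenvalues.size + 1), eigenvalues, marker="o")
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```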
Eigenvalues for Depression Data – Scree Plot (p 368)
m = 2 by the elbow rule
Principal Components for standardized CESD scale items (p 369)
Reading the Output
• Here: X1 = “I felt that I could not shake …”, X2 = “I felt depressed, …”
• The Principal Components are:
C1 = 0.2774X1 + 0.3132X2 + …
C2 = 0.1450X1 + 0.0271X2 + … etc.
Coefficients as Correlations
• Recall: Correlation(Ci, Xj) = aij · √λi
• Choose X’s where Correlation > 0.5
• Example:
• For C1: λ1 = 7.055; √λ1 = 2.656
• Correlation > 0.5 => a1j > 0.5/2.656 = 0.188
• Similarly for other Principal Components
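A short sketch of the threshold calculation above, using λ1 = 7.055 from the output; only the two coefficients printed on the previous slide are included, the remaining ones are omitted here:

```python
import numpy as np

lambda1 = 7.055
sqrt_lambda1 = np.sqrt(lambda1)      # ~2.656

# Correlation(C1, Xj) = a1j * sqrt(lambda1), so a correlation above 0.5 requires:
threshold = 0.5 / sqrt_lambda1       # ~0.188
print(threshold)

# First two coefficients of C1 from the output above (standardized CESD items)
a1 = np.array([0.2774, 0.3132])
print(a1 * sqrt_lambda1)             # correlations of C1 with X1 and X2
print(a1 > threshold)                # both exceed the 0.188 cutoff
```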