
BIA 652
Class 11 – Principal Component Analysis
Practical Multivariate Analysis, Afifi et al., Chapter 14

Agenda
• Principal Component Analysis – Chapter 14
• Review of some Matrix Analysis
2

Goals
• Restructure interrelated variables
• Simplify description
• Reduce Dimensionality
• Avoid multi-collinearity problems in regression
3

Basic Idea
• X1 , X2 – correlated
• Transform them into:
• C1 , C2 – uncorrelated
4

Geometric Concept (p 359)
5

2 Principal Components: C1 , C2 (p 361)
6

Plot of Principal Components: C1 , C2 (p 362)
7

Equations for Principal Components: C1 , C2
• C1 = 0.85X1 + 0.53X2
• C2 = -0.53X1 + 0.85X2
• Var C1 = 135.0
• Var C2 = 22.5
• Note: the dot product of the coefficient vectors of C1 , C2 = 0
• Hence they are orthogonal, therefore uncorrelated, and Var C1 > Var C2
8
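As a quick numeric check of the claims above, here is a minimal Python/NumPy sketch; the coefficients and variances are the rounded values from this slide.

```python
import numpy as np

# Rounded coefficient vectors of C1 and C2 from the slide
a1 = np.array([0.85, 0.53])
a2 = np.array([-0.53, 0.85])

print(np.dot(a1, a2))        # 0.0  -> coefficient vectors are orthogonal
print(np.linalg.norm(a1))    # ~1.0 -> coefficients are scaled to unit length
print(135.0 + 22.5)          # 157.5 -> Var C1 + Var C2 (the total variance)
```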

Principal Component Model
9

Principal Components: C1 ,…, Cp
• From P original variables X1 ,…, Xp derive P principal components C1 ,…, Cp
• Each Cj is a linear combination of the Xi 's
• Cj = aj1 X1 + aj2 X2 + … + ajp Xp
10

Matrix Equation

[ C1 ]   [ a11 a12 . . . a1p ] [ X1 ]
[ C2 ] = [ a21 a22 . . . a2p ] [ X2 ]
[ ⋮  ]   [  ⋮   ⋮         ⋮  ] [ ⋮  ]
[ Cp ]   [ ap1 ap2 . . . app ] [ Xp ]
11
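In code this matrix equation is a single matrix product. A minimal NumPy sketch, assuming the data sit in an n × p array X and the coefficients in a p × p array A whose row j holds aj1 , …, ajp (the function name is illustrative):

```python
import numpy as np

def component_scores(X, A):
    """Principal component scores C = A x for every row x of X.

    X : (n, p) data matrix, one observation per row
    A : (p, p) coefficient matrix, row j holding a_j1 ... a_jp
    """
    Xc = X - X.mean(axis=0)   # scores are conventionally computed on centered data
    return Xc @ A.T           # row i contains (C1, ..., Cp) for observation i
```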

Properties of Principal Components:
• Coefficients are chosen to satisfy: Var C1 ≥ Var C2 ≥ . . . ≥ Var Cp
• Variance is a measure of information:
• For Example, In Prostate Cancer:
• Gender has 0 variance, no information
• Size of tumor has variance > 0, useful information
• Any two principal components are orthogonal, hence uncorrelated
12

Calculation of Principal Components
• Let S be the Covariance Matrix of the X variables.
• Then the aij ‘s are the solutions to the equation:
(S – λI)a = 0 (Hotelling, 1933)
    [ s11 s12 . . . s1p ]
S = [ s21 s22 . . . s2p ]
    [  ⋮   ⋮         ⋮  ]
    [ sp1 sp2 . . . spp ]
13
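A minimal sketch of the calculation, assuming the raw data are available as an n × p NumPy array: np.linalg.eigh solves (S – λI)a = 0 for the symmetric matrix S, and the eigenpairs are then sorted so that Var C1 ≥ Var C2 ≥ . . . ≥ Var Cp.

```python
import numpy as np

def principal_components(X):
    """Eigenvalues (component variances) and eigenvectors of the covariance matrix of X."""
    S = np.cov(X, rowvar=False)               # p x p covariance matrix of the columns of X
    eigvals, eigvecs = np.linalg.eigh(S)      # solves (S - lambda*I)a = 0 for symmetric S
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue (= Var C1) first
    return eigvals[order], eigvecs[:, order]  # column j of eigvecs is the j-th eigenvector
```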

Recall Some Terminology
• Solutions to (S – λI)a = 0 are:
• λ a scalar known as the eigenvalue
• a a vector known as the eigenvector
• a is not unique. There are an infinite number of possibilities, so:
• Choose a such that the sum of the squares of the coefficients of each eigenvector equals 1.
• This yields: P unique eigenvalues and P corresponding eigenvectors.
14

Then
• The eigenvectors are the Principal Components
• λ a scalar known as the eigenvalue
• a a vector known as the eigenvector
• a is not unique. There are an infinite number of possibilities, so:
• Choose a such that the sum of the squares of the coefficients of each eigenvector equals 1.
• This yields: P unique eigenvalues and P corresponding eigenvectors.
15

Review
• Principal Components are the eigenvectors, and their variances are the eigenvalues, of the covariance matrix S of the X’s
• Variances of the Cj ‘s add to the sum of the variances of the original variables (the total variance)
16

Example Revisited (p 361)
• Var X1 = 104.0, Var X2 = 53.5, sum = 157.5
• Var C1 = 135.0, Var C2 = 22.5, sum = 157.5
• Total Variance is Preserved
17

Work thru one of these by hand
18
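One way to work this example through is to check the 2 × 2 case in code. The slide does not print the covariance of X1 and X2, but it is implied by the eigenvalues: det S = λ1 λ2, so s12² = 104.0 × 53.5 − 135.0 × 22.5 ≈ 2526.5 and s12 ≈ 50.3 (a derived value, not one quoted from the text). With that value the eigenproblem reproduces the components shown earlier.

```python
import numpy as np

s11, s22 = 104.0, 53.5                   # variances of X1, X2 from the slide
s12 = np.sqrt(s11 * s22 - 135.0 * 22.5)  # ~50.3, inferred from det S = lambda1 * lambda2

S = np.array([[s11, s12],
              [s12, s22]])

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
print(eigvals[order])       # ~[135.0, 22.5]  -> Var C1, Var C2
print(eigvecs[:, order])    # columns ~(0.85, 0.53) and (-0.53, 0.85), up to sign
```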

Analyzing Correlation Matrix
• Usually standardize:
• Transform X → Z = X/SD(X)
• Var (Zi ) = 1, Covar (Zi , Zj ) = rij
• Covariance matrix (S) => correlation matrix (R)
• S and R give different Principal Components
19
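A minimal sketch contrasting the two analyses, assuming the data are in a NumPy array X. Dividing each column by its standard deviation makes the covariance of the transformed data equal to the correlation matrix R, so the same eigen-decomposition serves both cases.

```python
import numpy as np

def pca_eigenvalues(X, use_correlation=True):
    """Eigenvalues of the correlation matrix R (default) or of the covariance matrix S."""
    if use_correlation:
        X = X / X.std(axis=0, ddof=1)   # Z = X / SD(X): Var(Z_i) = 1, Cov(Z_i, Z_j) = r_ij
    M = np.cov(X, rowvar=False)         # equals R when the columns have been standardized
    return np.sort(np.linalg.eigvalsh(M))[::-1]
```

Running this twice on the same data (use_correlation True and False) generally gives different eigenvalues and different percentages of variance explained, as the table on the next slide illustrates.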

Example Revisited
Eigenvalue   Covariance Matrix S   Correlation Matrix R
λ1           135.0 (85.7%)         1.68 (83.8%)
λ2           22.5 (14.3%)          0.32 (16.2%)
Total        157.5 (100%)          2.00 (100%)
NOTE: Different % variance explained by Principal Components
20

Meaning of aij
• Analyzing S:
• Correlation (Ci , Xj ) = aij * λi^(1/2) / SD(Xj )
• Analyzing R:
• Correlation (Ci , Xj ) = aij * λi^(1/2)
• Correlations are a guide to interpreting Principal Components
21
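The two formulas translate directly into code. A sketch, assuming eigvals and eigvecs come from the decomposition of S (as in the earlier sketch) and sds is an array of the standard deviations of the X's; pass sds=None when the analysis was done on R:

```python
import numpy as np

def component_variable_correlations(eigvals, eigvecs, sds=None):
    """Correlation(C_i, X_j) = a_ij * sqrt(lambda_i) / SD(X_j); omit sds when analyzing R."""
    # Column i of eigvecs is the coefficient vector of C_i, so a_ij = eigvecs[j, i].
    corr = eigvecs * np.sqrt(eigvals)   # scales column i by sqrt(lambda_i)
    if sds is not None:                 # covariance-matrix case: divide row j by SD(X_j)
        corr = corr / sds[:, None]
    return corr                         # entry [j, i] is Correlation(C_i, X_j)
```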

Dimension Reduction
• Retain the first m principal components as representatives of the original P variables
• Example: keep PC1 as a summary of X1 , X2 (m = 1)
• Choose m large enough to explain a “large” percentage of the original total variance
22

Choosing m
• Rely on existing theory
• Kaiser’s Rule:
• S: choose λi > (sum of variances of the X’s)/P
• R: choose λi > 1
• Choose m to explain a given % of the total variance
• Elbow Rule
23
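A sketch of the first two criteria, assuming eigvals holds the eigenvalues of S or of R. Kaiser's cutoff is the average variance, which is exactly 1 when the correlation matrix is analyzed; the percentage criterion counts components until a target share of the total variance is reached.

```python
import numpy as np

def choose_m(eigvals, target_pct=None):
    """m by Kaiser's rule, or by a target % of total variance when target_pct is given."""
    eigvals = np.sort(eigvals)[::-1]
    if target_pct is not None:                    # "explain a given %" criterion
        cum_pct = 100 * np.cumsum(eigvals) / eigvals.sum()
        return int(np.searchsorted(cum_pct, target_pct)) + 1
    return int(np.sum(eigvals > eigvals.mean()))  # Kaiser: > mean variance (= 1 for R)
```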

Cumulative Percentages of Total Variance for Depression Data (p 368)
24

Eigenvalues for Depression Data – Scree Plot (p 368)
A plot, in descending order of magnitude, of the eigenvalues of a correlation matrix
25

Depression, CESD, Example
• Using R (the correlation matrix, not the language):
• Choose eigenvalues > 1
• Hence choose 5 Principal Components
26

Elbow Rule for Choosing m
• Start with the scree plot
• Choose a cutoff point where:
• Lines joining consecutive points are “steep” left of the cutoff point, and
• “flat” right of the cutoff point
• The point where the two slopes meet is the cutoff point
27
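A minimal matplotlib sketch of the scree plot that the elbow rule is read from, assuming eigvals is the vector of eigenvalues:

```python
import numpy as np
import matplotlib.pyplot as plt

def scree_plot(eigvals):
    """Plot the eigenvalues in descending order of magnitude; look for the elbow by eye."""
    eigvals = np.sort(eigvals)[::-1]
    ks = np.arange(1, len(eigvals) + 1)
    plt.plot(ks, eigvals, marker="o")
    plt.xlabel("Principal component number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()
```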

Eigenvalues for Depression Data – Scree Plot (p 368)
m = 2
28

Principal Components for standardized CESD scale items (p 369)
29

Reading the Output
• Here: X1 = “I felt that I could not shake …”, X2 = “I felt depressed, …”
• The Principal Components are:
C1 = 0.2774X1 + 0.3132X2 + …
C2 = 0.1450X1 + 0.0271X2 + … etc.
30

Coefficients as Correlations
• Recall: Correlation (Ci , Xj ) = aij * λi^(1/2)
• Choose X’s where Correlation > 0.5
• Example:
• For C1 , λ1 = 7.055; λ1^(1/2) = 2.656
• Correlation > 0.5 => a1j > 0.5/2.656 = 0.188
• Similarly for other Principal Components
31
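The same threshold arithmetic can be automated. A sketch that, for each component, flags the variables whose loading exceeds 0.5 / √λi (analysis of R, so SD(Xj) = 1); the names are illustrative.

```python
import numpy as np

def salient_variables(eigvals, eigvecs, cutoff=0.5):
    """For each component i, the indices j with a_ij > cutoff / sqrt(lambda_i)."""
    thresholds = cutoff / np.sqrt(eigvals)      # e.g. 0.5 / 2.656 = 0.188 for C1
    return [np.where(eigvecs[:, i] > thresholds[i])[0]
            for i in range(len(eigvals))]
```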

Interpretations of Principal Components
• Loadings with r > 0.5 are underlined
• C1 : a weighted average of most items.
• High C1 => respondent had many symptoms of depression. Note sign of loadings.
• C2 : lethargy (high => energetic)
• C3 : friendliness of others
• C4 : self-worth / appetite
• C5 : hopefulness
32

Use in Multiple Regression
• Discard the last few Principal Components, and perform the regression on the remaining ones. This leads to more stable regression estimates.
• Alternative to variable selection
• For example: several measures of behavior
• Use PC1 , or PC1 and PC2 , as summary measures of all.
33
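A minimal sketch of the idea (not a full principal component regression routine), assuming X holds the predictors, y the response, and m components are retained:

```python
import numpy as np

def pc_regression(X, y, m):
    """Regress y on the first m principal components of X."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    keep = np.argsort(eigvals)[::-1][:m]             # m largest-variance components
    C = Xc @ eigvecs[:, keep]                        # n x m matrix of component scores
    design = np.column_stack([np.ones(len(y)), C])   # intercept + PC scores
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs                                     # coefficients for 1, C1, ..., Cm
```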

Multicollinearity
• The size of the variance of the last few Principal Components is useful as an indicator of multicollinearity among the original variables.
• For example:
• If PC10 = 0.5X – 0.2Y – 0.25Z, with λ10 = 0.01 ≈ 0, then
• X ≈ 0.4Y + 0.5Z
• Therefore discard X
34
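A sketch of the diagnostic, assuming eigvals and eigvecs come from the covariance (or correlation) matrix; the tolerance for “near zero” is an illustrative choice and is scale-dependent.

```python
import numpy as np

def near_collinearities(eigvals, eigvecs, tol=0.01):
    """Coefficient vectors of components with variance ~ 0; each one defines an
    approximate linear relation a_i1*X1 + ... + a_ip*Xp ~ constant among the X's."""
    small = np.where(eigvals < tol)[0]
    return [eigvecs[:, i] for i in small]
```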

A more realistic example: Depression
35

Summary of Computer Output
36

Caveats
• Principal Components derived from standardized variables differ from those derived from original variables.
• Interpretation is easier if data arise from, or are transformed to have, a symmetric distribution
• It is important that measurements are accurate, especially for the detection of collinearity
37

A Bit of Review
• Preparation of Data: Outliers, Null Values, etc.
• Regression – Simple and Multiple
• Discriminant Analysis
• Logistic Regression
• Dimension Reduction
• Principal Component Analysis
• Factor Analysis
• Cluster Analysis – Hierarchical, K-Means, Density
38