
BIA 652 Factor Analysis
Practical Multivariate Analysis, Afifi et al., Chapter 15

Overview of Class 12
• Recall Dimension Reduction and PCA – Chapter 14
• Factor Analysis – Chapter 15
• Ethics Cases
• HW (hand in): 15.1, 15.2
• Class 12 – Further discussion of Project & Test
• Project outline on CANVAS
• Dates: Test (in class) – 4/27; Oral Project – 5/4 & 5/11; Written Project – Monday following Oral.
• http://psych.colorado.edu/~carey/Courses/PSYC7291/ClassDataSets.htm
• Wolves, Twins, UScrime
• Midges on CANVAS

Factor Analysis
(A special case of Structural Equation Modeling)

Goals
• Generalization of Principal Components Analysis
• Explain interrelationships among a set of
variables.
• Select a small number of factors to convey
essential information
• Perform additional analysis to improve
interpretation
• EFA vs CFA: http://www2.sas.com/proceedings/sugi31/200-31.pdf

Factor Model
• Start with P standardized variables
• Express each variable as a linear combination of
m common factors plus a unique factor
• m << P. Ideally m is known in advance

Examples
• Fifty test scores
• Each is a function of m = 3 factors
• Verbal, quantitative, and analytical skills
• CESD items
• Each response is a function of some factors of depression

Model Equations
X1 = l11 F1 + l12 F2 + ... + l1m Fm + e1
X2 = l21 F1 + l22 F2 + ... + l2m Fm + e2
...
Xp = lp1 F1 + lp2 F2 + ... + lpm Fm + ep

Terms
Xi = Σj lij Fj + ei
• Fj = common or latent factors
• ei = unique factors
• lij = coefficients of the common factors = factor loadings

Implications
• The variance of any original (X) variable is composed of:
• Communality: the part due to the common factors, and
• Specificity: the part due to the unique factor
• Variance of Xi = V(Xi) = communality + specificity = hi² + ui²
• = 1 when the X's are standardized

Assumptions
• Each V(Fj) = 1
• The Fj's are uncorrelated
• The Fj's and ei's are uncorrelated

Steps in Factor Analysis
• Initial factor extraction: estimate the loadings and communalities
• Factor "rotations" to improve interpretation

Example
100 data points generated from five variables with a multivariate normal distribution.
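The five-variable example can be reproduced with a short simulation. The sketch below (Python with NumPy; not part of the original slides) draws 100 observations from the known data model stated in the example (X1 = F1 + e1, X2 = F1 + e2, X3 = 0.5·F2 + e3, X4 = 1.5·F2 + e4, X5 = 2·F2 + e5) and checks the implied correlation pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Two independent standard-normal common factors and five unique factors.
F1, F2 = rng.standard_normal(n), rng.standard_normal(n)
e = rng.standard_normal((5, n))

# The known data model from the example (rows = variables, columns = observations).
X = np.vstack([
    1.0 * F1 + 0.0 * F2 + e[0],
    1.0 * F1 + 0.0 * F2 + e[1],
    0.0 * F1 + 0.5 * F2 + e[2],
    0.0 * F1 + 1.5 * F2 + e[3],
    0.0 * F1 + 2.0 * F2 + e[4],
])

# Sample correlation matrix of the five X's: the first two variables should be
# inter-correlated, the last three inter-correlated, and the two blocks roughly
# uncorrelated with each other.
R = np.corrcoef(X)
print(np.round(R, 3))
```

The exact numbers differ from the slide's matrix because the random draws differ, but the block structure is the same.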
Example Data Model (known)
X1 = 1 · F1 + 0 · F2 + e1
X2 = 1 · F1 + 0 · F2 + e2
X3 = 0 · F1 + 0.5 · F2 + e3
X4 = 0 · F1 + 1.5 · F2 + e4
X5 = 0 · F1 + 2 · F2 + e5

Example – Implications
• F1, F2, and all the ei's are independent normal variables
• Therefore: the first 2 X's are inter-correlated, and the last 3 X's are inter-correlated
• And: the first 2 X's are not correlated with the last 3 X's

Means and Correlations
Means: 0.163, 0.142, 0.098, -0.039, -0.013
Correlation matrix (lower triangle):
X1  1.0
X2  0.757  1.0
X3  0.047  0.054  1.0
X4  0.115  0.176  0.531  1.0
X5  0.279  0.322  0.521  0.942  1.0

Steps in Factor Analysis
• Initial factor extraction: estimate the loadings and communalities
• Factor "rotations" to improve interpretation

Initial Factor Extraction
• Principal Component Factor Model
• Iterated Principal Component Factor Model
• Maximum Likelihood Model

Principal Component Factor Model
• Recall the Principal Component Model: C = AX, the PCs are functions of the X's
• We want: the X's as functions of the F's

Basic Idea
• X1, X2 – correlated
• Transform them into C1, C2 – uncorrelated

Inverse Model
• If C = 5X, then X = (1/5)C, or X = 5⁻¹C
• The inverse of the Principal Components Model is X = A⁻¹C
• In this case A is an orthogonal matrix, therefore A⁻¹ = Aᵀ and
X1 = a11 C1 + a21 C2 + ... + ap1 Cp
...
Xp = a1p C1 + a2p C2 + ... + app Cp

PC Factor Model Derivation
Xi = Σ(j = 1..p) aji Cj
   = Σ(j = 1..m) aji Cj + Σ(j = m+1..p) aji Cj
   = Σ(j = 1..m) lij Fj + ei
• Fj = common or latent factors
• ei = unique factors
• lij = coefficients of the common factors = factor loadings

Interpretation
• Var(Cj) = λj, NOT 1
• Transform: Fj = Cj / λj½
• Therefore: Var(Fj) = 1
• And the loadings are: lij = (λj½)(aji)

Interpretation
lij is the correlation coefficient between variable i and factor j

In the Previous Example
• The variances of the principal components are: 2.578, 1.567, 0.571, 0.241, 0.043
• Select m = 2 factors

Previous Example, PC Method (p 386)

Reading the Output (p 385)
• The factor model:
X1 = 0.511 F1 + 0.782 F2 + e1
X2 = 0.553 F1 + 0.754 F2 + e2
...
• Communality: for X1: h1² = 0.873; for X2: h2² = 0.875
• Specificity = 1 − communality

Implications
• 1st row: h1² = 0.87 = 0.51² + 0.78², etc.
• 1st column: variance explained = 2.58 = 0.51² + 0.55² + ..., etc.
• Variance explained = eigenvalue
• Σ variance explained = Σ hi² = total variance explained by the common factors = 4.145 = 83% of the total variance

Initial Factor Extraction, Method 2: Iterated Principal Factors (IPF)
• Select common factors to maximize the total communality
• Use an iterative procedure:
1. Get initial communality estimates
2. Use these (instead of the original variances) to get principal components and factor loadings
3. Get new communality estimates
4. If there is appreciable change, go to step 2
5. Else, stop

Example: IPF Method (p 388)

Comparison: PCF vs IPF

Steps in Factor Analysis
• Initial factor extraction: estimate the loadings and communalities
• Factor "rotations" to improve interpretation

Factor Rotations
• Find new factors that are easier to interpret:
• For each X, some high loadings and some low
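The principal-component extraction above can be sketched directly from the sample correlation matrix of the five-variable example. This is a Python/NumPy sketch, not the book's SAS output: eigendecompose R, form the loadings lij = λj½ · aji, keep m = 2 factors, and read off communalities and variance explained.

```python
import numpy as np

# Sample correlation matrix from the five-variable example (symmetrized from
# the lower triangle shown on the slide).
R = np.array([
    [1.0,   0.757, 0.047, 0.115, 0.279],
    [0.757, 1.0,   0.054, 0.176, 0.322],
    [0.047, 0.054, 1.0,   0.531, 0.521],
    [0.115, 0.176, 0.531, 1.0,   0.942],
    [0.279, 0.322, 0.521, 0.942, 1.0],
])

# eigh returns eigenvalues in ascending order, so sort them descending.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

m = 2  # number of factors to retain
# Loadings: l_ij = sqrt(lambda_j) * a_ji (scale each eigenvector column).
L = eigvecs[:, :m] * np.sqrt(eigvals[:m])

communality = (L ** 2).sum(axis=1)         # h_i^2, row sums of squared loadings
specificity = 1 - communality              # u_i^2 for standardized X's
variance_explained = (L ** 2).sum(axis=0)  # column sums = the eigenvalues

print(np.round(eigvals, 3))       # ~ 2.578, 1.567, 0.571, 0.241, 0.043
print(np.round(communality, 3))   # h_1^2 ~ 0.873, h_2^2 ~ 0.875, ...
```

The signs of the loading columns are arbitrary (eigenvectors are defined up to sign), which is why communalities and variance explained are computed from squared loadings.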
• Varimax orthogonal rotation: maximize the variance of the lij² within each factor, i.e., vary the lij within each factor
• Quartimax orthogonal rotation: maximize the variance of all the lij², i.e., vary all the lij

Factor Diagram for the Principal Component Extraction Method (p 387)

Orthogonal Rotation for the Principal Component Factor Diagram (p 391)

Factor Diagram for the IPF Extraction Method (p 361)

Orthogonal Rotation for the IPF Factor Diagram (p 392)

Comparison

Varimax for Principal Components (p 393)

Varimax for IPF (p 393)

Oblique Rotations
• No longer require orthogonality
• Most common: the direct quartimin method

Direct Quartimin Oblique Rotation for the Principal Component Factor Diagram (p 394)

Direct Quartimin Oblique Rotation for the IPF Factor Diagram (p 395)

Comparison

Comparison: Orthogonal vs Oblique Rotations
              Advantages                    Disadvantages
Orthogonal    • Factors independent         • Interpretation slightly less clear
              • Communalities preserved
Oblique       • Better interpretation       • Factors are correlated
                                            • Communalities change

Example: USCrime – PCA, Factor, Cluster

Example: CESD (see p 398)

Interpretation
Principal Component Analysis identified 5 PCs, but previous literature used 4 factors, not 5:
• Factor 1: loads heavily on items 1–7
• Factor 2: items 12–18 (somatic and retarded activity)
• Factor 3: items 19–20 (interpersonal), plus items 9 and 11 (positive affect)
• Factor 4: no clear pattern; item 8 (positive affect) has the highest loading

Factor Scores
• FA: each X = a function of the F's
• Now express each F = a function of the X's

Computing Factor Scores
• Recall multiple linear regression: Y = A + B1 X1 + B2 X2 + ...
• B = Sxx⁻¹ Syx
• In factor analysis, the target is: F = A + B1 X1 + B2 X2 + ...
• B = Sxx⁻¹ SFx
• Here Sxx is the correlation matrix of the X's, and SFx is the column of factor loadings

Uses of Principal Component and Factor Scores
• The 1st Principal Component score can summarize several variables
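A varimax rotation itself is short enough to sketch. The function below is a Python/NumPy version of the standard SVD-based varimax iteration (not the book's SAS procedure), and the 4×2 loading matrix is a made-up illustration, not an example from the slides. An orthogonal rotation leaves the communalities unchanged, which the code makes easy to verify:

```python
import numpy as np

def varimax(L, tol=1e-6, max_iter=1000):
    """Varimax rotation of a (variables x factors) loading matrix.

    Returns (rotated loadings, orthogonal rotation matrix T), where L @ T
    is the rotated solution.
    """
    p, m = L.shape
    T = np.eye(m)
    d_old = 0.0
    for _ in range(max_iter):
        z = L @ T
        # Gradient of the varimax criterion (variance of squared loadings
        # within each factor), projected back to an orthogonal matrix via SVD.
        B = L.T @ (z ** 3 - z * (z ** 2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(B)
        T = u @ vt
        d = s.sum()
        if d_old != 0 and d < d_old * (1 + tol):
            break
        d_old = d
    return L @ T, T

# Hypothetical unrotated loadings: every variable loads somewhat on both factors.
L = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.9],
              [0.3, 0.8]])
Lr, T = varimax(L)
print(np.round(Lr, 3))  # high loadings concentrated on one factor per variable
```

After rotation each variable should load high on one factor and low on the other, which is exactly the "for each X, some high loadings and some low" goal stated above.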
• It can be used as a dependent or independent variable in other analyses
• Factor scores can likewise be used as dependent or independent variables in other analyses

Caveats
• The number of factors should be chosen with care – check the default options
• There should be at least two variables with non-zero weights per factor
• If the factors are correlated, try oblique factor analysis
• Results are usually evaluated by "reasonableness to the investigator" rather than by formal tests
• Factor analysis should motivate theory, not replace it

A Bit of Review
• Preparation of data: outliers, null values, etc.
• Regression – simple and multiple
• Discriminant analysis
• Logistic regression
• Cluster analysis – hierarchical, K-means
• Dimension reduction
• Principal Component Analysis
• Factor Analysis

A More Realistic Example
Depression – using SAS
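The regression method for factor scores (B = Sxx⁻¹ SFx, with Sxx the correlation matrix of the X's and SFx the factor loadings) can be sketched as follows. This is a Python/NumPy illustration on simulated stand-in data, not the book's example; the loading matrix here is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 100, 5, 2

# Simulated stand-ins: true factors F and a hypothetical p x m loading matrix.
F = rng.standard_normal((n, m))
L = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.1, 0.6],
              [0.0, 0.8],
              [0.1, 0.9]])
X = F @ L.T + 0.5 * rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize, as the factor model assumes

# Regression-method score coefficients: B = Sxx^-1 * SFx.
R = np.corrcoef(X, rowvar=False)   # Sxx: correlation matrix of the X's
B = np.linalg.solve(R, L)          # p x m matrix of score coefficients
scores = X @ B                     # n x m estimated factor scores
```

The estimated scores can then serve as dependent or independent variables in a follow-up analysis, as the slide suggests; with standardized X's the intercept A is zero, so it is omitted here.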