DIMENSION REDUCTION
Applied Analytics: Frameworks and Methods 2
Outline
■ Motivation for Dimension Reduction
■ Types and Relationship to other Techniques
■ Exploratory Factor Analysis
■ Principal Components Analysis
Why Reduce Dimensions?
■ When faced with a large set of correlated variables, dimension reduction allows us to summarize this set with a smaller number of representative variables that collectively explain most of the variability in the original set
■ Parsimony is a desirable property for models
– Predictions from simple (versus complex) models are more stable across samples
– Simple models are easier to interpret and communicate to stakeholders.
■ As the number of predictors increases, the chance of finding correlations among predictors or sets of predictors increases. Such correlation among predictors, called multicollinearity, inflates the standard errors of coefficients, potentially leading to erroneous conclusions about the relevance of a predictor.
■ In cases where the number of predictors p exceeds the number of observations n, traditional estimation techniques such as ordinary least squares will not work.
Dimension Reduction vs. Clustering
■ Like clustering, dimension reduction techniques are unsupervised learning approaches
■ In contrast to clustering, dimension reduction seeks to group variables rather than observations.
Process
■ Dimension reduction works by grouping similar variables together.
– To better understand the process of dimension reduction, think of spinning a test tube of muddy water in a centrifuge. Similar particles are grouped together in layers, and each layer represents a factor or component of the original data.
■ Dimension reduction involves loss of information
– Representing a large number of variables using a small number of factors or components often results in loss of information. However, this is acceptable so long as the information loss is not substantial. To better understand this idea, think of the .jpg or .gif image files used on websites. These are compact image files that look just like the original .bmp or .tiff files they are derived from. It is only at high resolution that the pixelation of a .gif or .jpg becomes apparent.
Approaches based on Purpose
■ Determine underlying themes
– Identify underlying dimensions or factors that explain the correlations among a set of variables. This is frequently used in the social sciences to identify latent variables or constructs underlying measured variables.
– Technique: Factor Analysis
■ Represent data with fewer components
– To identify a set of uncorrelated variables to replace the original set of correlated variables. This may be done to use the variables in a supervised learning technique that is sensitive to multicollinearity.
– Technique: Principal Components Analysis
■ Note: The distinction is not as cut-and-dried as the above classification suggests. Factor analysis may also be used to reduce data, while principal components analysis may shed light on underlying meaning.
EXPLORATORY FACTOR ANALYSIS
Purpose of Factor Analysis
■ A fundamental assumption of factor analysis is that measured data represents deeper latent factors.
■ It aims to find deeper latent factors underlying observed data
■ Or, to confirm relationships between observed data and latent factors.
– For example, to check whether the questions included in a survey represent the constructs they were designed to measure. We will now pursue this idea using some survey data.
Expected Structure
■ Unlike many supervised learning techniques, which generate a unique solution, factor analysis generates many good solutions.
■ Factor analysis assumes the existence of a theoretical mapping between the factor and variables.
■ Thus, before conducting a factor analysis, it is important to formulate an expectation of the latent factors underlying the data and their relationship with the observed variables.
Toothpaste Survey
Consider a survey on the benefits people seek from toothpaste
Toothpaste Survey
Can you identify any themes (latent factors) underlying these survey questions?
– It is important to buy a toothpaste that prevents cavities
– I like a toothpaste that gives shiny teeth
– A toothpaste should strengthen your gums
– I prefer a toothpaste that freshens breath
– Prevention of tooth decay is not an important benefit offered by a toothpaste
– The most important consideration in buying a toothpaste is attractive teeth
Toothpaste Survey
■ The toothpaste survey was conducted on a small sample (n=30)
■ Responses gathered are saved in a text file for use in factor analysis.
Steps in Factor Analysis
1. Suitability for Factor Analysis
– Correlations
– Bartlett’s Test of Sphericity
– KMO MSA
2. Determine number of factors
– Scree Plot
– Eigenvalue
– Parallel Analysis
– Total variance explained
– Extracted Communalities
3. Mapping Variables to Factors
4. Interpretation
5. Representing the factor
Suitability for Factor Analysis
■ Since factor analysis attempts to group similar variables, a basic requirement is that at least a few variables be related to each other.
■ Correlation Matrix
– A correlation matrix with some large and some small correlations is ideal for factor analysis.
– If all correlations are small, there is no way to group variables. On the other hand, if all correlations are large, then all the variables will load onto the same factor.
■ Bartlett’s Test of Sphericity
– Checks whether there are at least some non-zero correlations by comparing the correlation matrix to an identity matrix. A significant test indicates suitability for factor analysis.
■ KMO’s Measure of Sampling Adequacy (MSA)
– Compares the partial correlation matrix to the pairwise correlation matrix. A partial correlation is the correlation between two variables after partialling out the influence of all other variables. If the variables are strongly related, partial correlations should be small and MSA close to 1. If MSA > 0.5, the data are considered suitable for factor analysis. A sketch of both checks appears below.
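Both checks are easy to sketch in Python. The following is a minimal implementation built directly from the correlation matrix using numpy and scipy, following the standard formulas for Bartlett's statistic and the overall KMO MSA (the function names are our own):

import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    # H0: the correlation matrix is an identity matrix (no correlations to exploit)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi_sq = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi_sq, stats.chi2.sf(chi_sq, df)  # a significant p-value indicates suitability

def kmo_msa(X):
    # Compare squared pairwise correlations to squared partial correlations,
    # where partial correlations come from the inverse (precision) matrix of R
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    partial = -R_inv / np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    np.fill_diagonal(R, 0)
    np.fill_diagonal(partial, 0)
    return (R**2).sum() / ((R**2).sum() + (partial**2).sum())  # MSA > 0.5 is suitable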
Determine Number of Factors
■ It is critical to have an a priori expectation of the number of factors.
■ The following data-driven methods can be used to corroborate an a priori expectation or to select among candidate solutions.
– Scree Plot
■ Line graph of eigenvalues for each factor. The ideal number of factors is indicated by a sudden change in the line graph, known as the elbow.
– Eigenvalue
■ Select all factors with an eigenvalue greater than 1.
– Parallel Analysis
■ Simulate a dataset with the same number of variables and observations as the original dataset, and compute its correlation matrix and eigenvalues. Compare the eigenvalues from the simulated data to those from the original data, and select factors whose eigenvalues in the original data exceed those in the simulated data (a sketch follows below).
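Parallel analysis is simple to sketch with numpy. This version makes the usual assumption of simulating standard-normal data of the same shape and compares observed eigenvalues against the simulated average:

import numpy as np

def parallel_analysis(X, n_sims=100, seed=1):
    # Keep factors whose observed eigenvalue exceeds the average eigenvalue
    # from random data with the same number of rows and columns
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    simulated = np.zeros(p)
    for _ in range(n_sims):
        Z = rng.standard_normal((n, p))
        simulated += np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    simulated /= n_sims
    return int((observed > simulated).sum())  # suggested number of factors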
Determine Number of Factors
– Total Variance Explained
■ To ensure that the factors represent the original variables sufficiently well, the total variance explained by the factors should be greater than 70%.
– Extracted Communalities
■ Communality reflects the amount of variance in a variable that can be explained by the factors.
■ The larger the communality, the more of the variable is captured by the factor solution.
■ On the other hand, a small communality implies that most of the variance in the variable was not captured. Ideally, the communality of each variable should be greater than 0.7, but a communality greater than 0.5 may be seen as acceptable. Both criteria can be computed from the loading matrix, as sketched below.
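A minimal numpy sketch of both criteria, assuming standardized variables and orthogonal factors: a variable's communality is its row sum of squared loadings, and a factor's share of total variance is its column sum of squared loadings divided by p. The loading values below are illustrative, not the actual toothpaste solution:

import numpy as np

# Hypothetical 6-variable, 2-factor loading matrix
L = np.array([[0.85, 0.10], [0.12, 0.81], [0.79, 0.05],
              [0.08, 0.76], [0.82, 0.15], [0.11, 0.80]])

communalities = (L**2).sum(axis=1)                 # ideally each > 0.7 (at least 0.5)
variance_share = (L**2).sum(axis=0) / L.shape[0]   # proportion of variance per factor
total_explained = variance_share.sum()             # ideally > 0.70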
Mapping Variables to Factors
■ Each variable is represented as a linear combination of factors
■ An ideal factor solution is one where each variable loads on (i.e., relates to) only one factor. Such a result is easy to interpret.
■ In practice, each variable may load on many factors. This may still be acceptable so long as the loading on one factor is large and on all other factors is small.
■ When the pattern of loadings does not show a clear preference of a variable for a factor, rotating the axes may help generate a clear mapping (a rotation sketch follows this list). There are two broad types of axis rotation:
– Orthogonal: Axes are rotated while constraining them to be at right angles, e.g., varimax, quartimax, equimax
– Oblique: Axes are allowed to have any angle between them, e.g., oblimin, promax
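One way to obtain rotated loadings in Python is scikit-learn's FactorAnalysis, which supports the orthogonal "varimax" and "quartimax" rotations (oblique rotations such as oblimin or promax require another package, e.g. factor_analyzer). The survey data below is a random placeholder standing in for the standardized toothpaste responses:

import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# Placeholder for the standardized survey responses (30 respondents, 6 items)
rng = np.random.default_rng(0)
survey = pd.DataFrame(rng.standard_normal((30, 6)),
                      columns=[f"q{i+1}" for i in range(6)])

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(survey)
loadings = pd.DataFrame(fa.components_.T, index=survey.columns,
                        columns=["Factor1", "Factor2"])
print(loadings.round(2))  # look for one large loading per row, small loadings elsewhere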
Interpretation
■ Review pattern of factor loadings from the rotated matrix to interpret the factor analysis solution
■ The meaning of each factor is derived from the variables loading on it.
■ Review the variables to describe each factor.
Representing the Factor
■ If the goal is to use the factors in further analysis, then they may be represented in one of three ways (each sketched below)
– Average scores of the variables reflecting the factor
– Weighted average of the variables reflecting the factor, where the weights are the factor loadings
– Pick a variable as a representative of the factor
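Continuing the hypothetical example above, each representation is a one-liner; here "q1", "q3", and "q5" are assumed to be the items loading on the first factor:

health_items = ["q1", "q3", "q5"]                    # hypothetical variable-to-factor mapping
w = loadings.loc[health_items, "Factor1"].to_numpy()

survey["f1_avg"] = survey[health_items].mean(axis=1)       # simple average of the items
survey["f1_wavg"] = survey[health_items] @ (w / w.sum())   # average weighted by loadings
survey["f1_rep"] = survey["q1"]                            # a single representative item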
PRINCIPAL COMPONENTS ANALYSIS
Principal Components Analysis
■ Used to reduce the dimensionality of the data by representing a large number of variables with a fewer number of components. Similar variables get grouped into the same component while dissimilar variables are placed in different components.
■ Like factor analysis, it reduces the data based on similarity (typically inferred from correlation).
■ It is used when the primary concern is to determine the minimum number of factors that will account for maximum variance in the data for use in subsequent multivariate analysis. The factors are called principal components.
■ This reduced number of components can be used for further analysis instead of the original set of variables.
Principal Components Analysis
■ Curse of dimensionality: as the number of variables grows, observations become increasingly sparse in the high-dimensional space, making patterns harder to detect. Reducing the number of dimensions mitigates this problem.
Principal Components Analysis vs. Factor Analysis
■ Factor analysis derives a mathematical model from which factors are estimated, whereas principal components analysis merely decomposes the original data into a set of linear variates.
■ Factor analysis decomposes shared variance while principal components analysis decomposes total variance
■ In theory, only Factor Analysis can estimate the underlying factors
■ In practice, the solutions generated from principal components analysis and factor analysis differ little.
Steps in Principal Components Analysis
1. Prepare Data
– Impute missing values
– Standardize Variables
– Split into Train and Test
– Exclude variables not relevant to analysis
2. Suitability for Principal Components Analysis
– Correlations
– Bartlett’s Test of Sphericity
– KMO MSA
3. Determine number of components
– Scree Plot
– Eigenvalue
– Parallel Analysis
– Total variance explained
4. Describing Components
5. Apply Component Structure to Test Set
Prepare Data
■ Impute missing values
– Principal Components Analysis uses data on all variables. A missing observation on one variable will cause the entire row of data to be ignored for analysis. Therefore, it is important to impute missing values.
■ Standardize Variables
– To ensure all variables receive the same weight in the analysis.
■ Split into Train and Test
– Determine the best component structure using the train set
– Apply the component structure from the train set to generate components in the test set
■ Exclude variables not relevant to analysis
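A sketch of these steps with pandas and scikit-learn; the DataFrame df and the excluded column name are hypothetical. Note that the imputer and scaler are fit on the train set only and then applied to the test set:

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop(columns=["id"])          # df: hypothetical data; drop irrelevant identifier
X_train, X_test = train_test_split(X, test_size=0.3, random_state=42)

imputer = SimpleImputer(strategy="median")   # impute missing values
scaler = StandardScaler()                    # give every variable equal weight

X_train_p = scaler.fit_transform(imputer.fit_transform(X_train))
X_test_p = scaler.transform(imputer.transform(X_test))   # train structure applied to test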
Suitability for Principal Components Analysis
■ Since Principal Components Analysis attempts to group similar variables, a basic requirement is that at least a few variables be related to each other.
■ Correlation Matrix
– A correlation matrix with some large and some small correlations is ideal for principal components analysis.
■ Bartlett’s Test of Sphericity
– Checks whether there are at least some non-zero correlations by comparing the correlation matrix to an identity matrix. A significant test indicates suitability for principal components analysis.
■ KMO’s Measure of Sampling Adequacy (MSA)
– Compares the partial correlation matrix to the pairwise correlation matrix. A partial correlation is the correlation between two variables after partialling out the influence of all other variables. If the variables are strongly related, partial correlations should be small and MSA close to 1. If MSA > 0.5, the data are considered suitable for principal components analysis.
Determine number of components
A dataset with p variables will generate p components. The goal is to pick out the top few components that capture most of the variance in the original data. This is done based on the following criteria.
■ Scree Plot
– Line graph of eigenvalues for each component. The ideal number of components is indicated by a sudden change in the line graph, known as the elbow.
■ Eigenvalue
– Select all components with an eigenvalue greater than 1.
■ Parallel Analysis
– Simulate a dataset with the same number of variables and observations as the original dataset, and compute its correlation matrix and eigenvalues. Compare the eigenvalues from the simulated data to those from the original data, and select components whose eigenvalues in the original data exceed those in the simulated data.
■ Total Variance Explained
– To ensure that the components represent the original variables sufficiently well, the total variance explained by the components should be greater than 70%. A sketch applying these criteria follows below.
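A sketch applying the eigenvalue and total-variance criteria with scikit-learn's PCA, continuing from the prepared train set above. On standardized data, explained_variance_ approximates the eigenvalues of the correlation matrix:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA().fit(X_train_p)                # fit all p components on the train set
eigenvalues = pca.explained_variance_     # plot against 1..p for a scree plot
n_kaiser = int((eigenvalues > 1).sum())   # eigenvalue-greater-than-1 rule
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_var70 = int(np.searchsorted(cum_var, 0.70) + 1)   # smallest k explaining >= 70%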
Describe Components
■ Examine the variables comprising each component.
■ Examine relationships among variables and between variables and components.
Apply Component Structure
■ In order to use the components for downstream analysis (sketched below):
– apply the component structure to the test set
– extract the components
– combine the components with other variables in the original dataset
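Continuing the sketch: refit with the chosen number of components on the train set, transform the test set, and join the component scores back onto the remaining test variables:

import pandas as pd
from sklearn.decomposition import PCA

pca_k = PCA(n_components=n_var70).fit(X_train_p)   # component structure from train set
scores = pca_k.transform(X_test_p)                 # extract components for test rows
score_cols = [f"PC{i+1}" for i in range(scores.shape[1])]
test_out = pd.concat([X_test.reset_index(drop=True),
                      pd.DataFrame(scores, columns=score_cols)], axis=1)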
Conclusion
■ In this session, we
– discussed the motivation for conducting dimension reduction
– reviewed dimension reduction techniques
– examined exploratory factor analysis
– examined principal components analysis