
How To: Use the psych package for Factor Analysis and data reduction

William Revelle
Department of Psychology
Northwestern University

November 20, 2016

Contents

1 Overview of this and related documents
1.1 Jump starting the psych package–a guide for the impatient

2 Overview of this and related documents

3 Getting started

4 Basic data analysis
4.1 Data input from the clipboard
4.2 Basic descriptive statistics
4.3 Simple descriptive graphics
4.3.1 Scatter Plot Matrices
4.3.2 Correlational structure
4.3.3 Heatmap displays of correlational structure
4.4 Polychoric, tetrachoric, polyserial, and biserial correlations

5 Item and scale analysis
5.1 Dimension reduction through factor analysis and cluster analysis
5.1.1 Minimum Residual Factor Analysis
5.1.2 Principal Axis Factor Analysis
5.1.3 Weighted Least Squares Factor Analysis
5.1.4 Principal Components analysis (PCA)
5.1.5 Hierarchical and bi-factor solutions
5.1.6 Item Cluster Analysis: iclust
5.2 Confidence intervals using bootstrapping techniques
5.3 Comparing factor/component/cluster solutions
5.4 Determining the number of dimensions to extract
5.4.1 Very Simple Structure
5.4.2 Parallel Analysis
5.5 Factor extension

6 Classical Test Theory and Reliability
6.1 Reliability of a single scale
6.2 Using omega to find the reliability of a single scale
6.3 Estimating ωh using Confirmatory Factor Analysis
6.3.1 Other estimates of reliability
6.4 Reliability and correlations of multiple scales within an inventory
6.4.1 Scoring from raw data
6.4.2 Forming scales from a correlation matrix
6.5 Scoring Multiple Choice Items
6.6 Item analysis

7 Item Response Theory analysis
7.1 Factor analysis and Item Response Theory
7.2 Speeding up analyses
7.3 IRT based scoring

8 Multilevel modeling
8.1 Decomposing data into within and between level correlations using statsBy
8.2 Generating and displaying multilevel data

9 Set Correlation and Multiple Regression from the correlation matrix

10 Simulation functions

11 Graphical Displays

12 Miscellaneous functions

13 Data sets

14 Development version and a users guide

15 Psychometric Theory

16 SessionInfo

1 Overview of this and related documents

To do basic and advanced personality and psychological research using R is not as complicated
as some think. This is one of a set of "How To" guides for doing various things using R (R Core
Team, 2016), particularly using the psych (Revelle, 2016) package.

The current list of How To's includes:

1. Installing R and some useful packages (http://personality-project.org/r/psych/HowTo/getting_started.pdf)

2. Using R and the psych package to find ωh and ωt (http://personality-project.org/r/psych/HowTo/omega.pdf)

3. Using R and the psych package for factor analysis and principal components analysis (this document; http://personality-project.org/r/psych/HowTo/factor.pdf)

4. Using the score.items function to find scale scores and scale statistics (http://personality-project.org/r/psych/HowTo/scoring.pdf)

5. An overview (vignette) of the psych package (http://personality-project.org/r/psych/overview.pdf)

1.1 Jump starting the psych package–a guide for the impatient

You have installed psych (section 3) and you want to use it without reading much more.
What should you do?

1. Activate the psych package:
R code

library(psych)

2. Input your data (section 4.1). Go to your friendly text editor or data manipulation
program (e.g., Excel) and copy the data to the clipboard. Include a first line that has
the variable labels. Paste it into psych using the read.clipboard.tab command:

R code
myData <- read.clipboard.tab()

3. Make sure that what you just read is right. Describe it (section 4.2) and perhaps look at the first and last few lines:

R code

describe(myData)
headTail(myData)

4. Look at the patterns in the data. If you have fewer than about 10 variables, look at the SPLOM (Scatter Plot Matrix) of the data using pairs.panels (section 4.3.1).

R code

pairs.panels(myData)

5. Find the correlations of all of your data.

• Descriptively (just the values) (section 4.3.2)

R code

r <- lowerCor(myData)

• Graphically (section 4.3.3)

R code

corPlot(r)

6. Test for the number of factors in your data using parallel analysis (fa.parallel, section 5.4.2) or Very Simple Structure (vss, section 5.4.1).

R code

fa.parallel(myData)
vss(myData)

7. Factor analyze the data (see section 5.1) with a specified number of factors (the default is 1); the default method is minimum residual, and the default rotation for more than one factor is oblimin. There are many more possibilities (see sections 5.1.1-5.1.3). Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979) (see section 5.1.6). Also consider a hierarchical factor solution to find coefficient ω (see section 5.1.5).

R code

fa(myData)
iclust(myData)
omega(myData)

8. Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function (see section 6.1). Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function (see section 6.4) and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

R code

alpha(myData) #score all of the items as part of one scale.
myKeys <- make.keys(nvar=20,list(first = c(1,-3,5,-7,8:10),second=c(2,4,-6,11:15,-16)))
my.scores <- scoreItems(myKeys,myData) #form several scales
my.scores #show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading the entire overview vignette (http://personality-project.org/r/psych/overview.pdf) helpful to get a broader understanding of what can be done in R using psych. Remember that the help command (?) is available for every function. Try running the examples for each help page.

2 Overview of this and related documents

The psych package (Revelle, 2016) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at http://personality-project.org/r/book/.

Some of the functions (e.g., read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses. Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares, and maximum likelihood factor analysis).
Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP) or parallel analysis (fa.parallel) criteria. Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa. Bifactor and hierarchical factor structures may be estimated by using Schmid Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937).

Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust) and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (cor.plot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, and structure.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

3 Getting started

Some of the functions described in this overview require other packages. Particularly useful for rotating the results of factor analyses (from e.g., fa or principal) or of hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

4 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics. Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

R code

library(psych)

The other packages, once installed, will be called automatically by psych. It is possible to automatically load psych and other functions by creating and then saving a ".First" function: e.g.,

R code

.First <- function(x) {library(psych)}

4.1 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.
read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

R code

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

R code

my.data <- read.clipboard(sep="\t") #define the tab option, or
my.tab.data <- read.clipboard.tab() #just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns wide, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns each).

R code

my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))

4.2 Basic descriptive statistics

Once the data are read in, then describe will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures. describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

It is very important to describe your data before you continue on doing more complicated multivariate statistics. The problem of outliers and bad data cannot be overemphasized.

> library(psych)
> data(sat.act)
> describe(sat.act) #basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41


4.3 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data and for
communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels
function are useful ways to look for strange effects involving outliers and non-linearities.
error.bars.by will show group means with 95% confidence boundaries.
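
For example, a minimal sketch (not one of the worked examples in this document) of calling error.bars.by on the sat.act data set that comes with psych:

R code

library(psych)
data(sat.act)
#plot SATV and SATQ means by gender, with confidence boundaries around each mean
error.bars.by(sat.act[c("SATV","SATQ")],group=sat.act$gender)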

4.3.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels
function, adapted from the help menu for the pairs function, produces xy scatter plots of
each pair of variables below the diagonal, shows the histogram of each variable on the
diagonal, and shows the lowess locally fit regression line as well. An ellipse around the
mean with the axis length reflecting one standard deviation of the x and y variables is also
drawn. The x axis in each scatter plot represents the column variable, the y axis the row
variable (Figure 1). When plotting many subjects, it is both faster and cleaner to set the
plot character (pch) to be '.'. (See Figure 1 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms,
locally smoothed regressions, and the Pearson correlation. When plotting
many data points (as in the case of the sat.act data), it is possible to specify that the
plot character is a period to get a somewhat cleaner graphic.

4.3.2 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most
common. The output from the cor function in core R is a full square matrix. lowerMat
will round this to two decimal places (by default) and then display it as a lower off diagonal matrix. lowerCor
calls cor with use="pairwise", method="pearson" as default values and returns (invisibly)
the full correlation matrix while displaying the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one
matrix, with the results from one group below the diagonal, and the other group above the
diagonal. Use lowerUpper to do this:


> png('pairspanels.png')
> pairs.panels(sat.act,pch='.')
> dev.off()
null device

1

Figure 1: Using the pairs.panels function to graphically show relationships. The x axis
in each scatter plot represents the column variable, the y axis the row variable. Note the
extreme outlier for the ACT. The plot character was set to a period (pch='.') in order to
make a cleaner graph.


> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one
(below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

4.3.3 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map
of the correlations. This is just a matrix color coded to represent the magnitude of the
correlation. This is useful when considering the number of factors in a data set. Consider
the Thurstone data set which has a clear 3 factor solution (Figure 2) or a simulated data
set of 24 variables with a circumplex structure (Figure 3). The color coding represents a
“heat map” of the correlations, with darker shades of red representing stronger negative
and darker shades of blue stronger positive correlations. As an option, the value of the
correlation can be shown.


> png('corplot.png')
> cor.plot(Thurstone,numbers=TRUE,main="9 cognitive variables from Thurstone")
> dev.off()
null device

1

Figure 2: The structure of a correlation matrix can be seen more clearly if the variables are
grouped by factor and then the correlations are shown by color. By using the 'numbers'
option, the values are displayed as well.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> cor.plot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device

1

Figure 3: Using the cor.plot function to show the correlations in a circumplex. Correlations
are highest near the diagonal, diminish to zero further from the diagonal, and then increase
again towards the corners of the matrix. Circumplex structures are common in the study
of affect.


4.4 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the
data, e.g., ability items, are thought to represent an underlying continuous although latent
variable, the φ will underestimate the value of the Pearson applied to these latent variables.
One solution to this problem is to use the tetrachoric correlation which is based upon
the assumption of a bivariate normal distribution that has been cut at certain points. The
draw.tetra function demonstrates the process. A simple generalization of this to the case
of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut
points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor
function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial,
and polyserial correlations.

A correlation matrix based upon a number of tetrachoric or polychoric correlations will
sometimes not be positive semi-definite. This will also happen if the correlation
matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust
the smallest eigen values of the correlation matrix to make them positive, rescale all of
them to sum to the number of variables, and produce a "smoothed" correlation matrix. An
example of this problem is the burt data set, which probably had a typo in the original
correlation matrix. Smoothing the matrix corrects this problem.
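
A minimal sketch (not from the original worked examples) of how these functions might be called; the ability and bfi data sets used here are ones distributed with psych, and exact arguments may differ across package versions:

R code

library(psych)
r.tet <- tetrachoric(ability)  #tetrachoric correlations of dichotomous ability items
r.poly <- polychoric(bfi[1:10]) #polychoric correlations of polytomous personality items
r.tet$rho                       #the estimated correlations; r.tet$tau holds the thresholds
r.smooth <- cor.smooth(r.poly$rho) #smooth a matrix that is not positive semi-definite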

5 Item and scale analysis

The main functions in the psych package are for analyzing the structure of items and of
scales and for finding various estimates of scale reliability. These may be considered as
problems of dimension reduction (e.g., factor analysis, cluster analysis, principal components
analysis) and of forming and estimating the reliability of the resulting composite
scales.

5.1 Dimension reduction through factor analysis and cluster analysis

Parsimony of description has been a goal of science since at least the famous dictum
commonly attributed to William of Ockham to not multiply entities beyond necessity1. The
goal for parsimony is seen in psychometrics as an attempt either to describe (components)
or to explain (factors) the relationships between many observed variables in terms of a
more limited set of components or latent factors.

1Although probably neither original with Ockham nor directly stated by him (Thorburn, 1918), Ockham's
razor remains a fundamental principle of science.

The typical data matrix represents multiple items or scales usually thought to reflect fewer
underlying constructs2. At the most simple, a set of items can be thought to represent
a random sample from one underlying domain or perhaps a small set of domains. The
question for the psychometrician is how many domains are represented and how well does
each item represent the domains. Solutions to this problem are examples of factor analysis
(FA), principal components analysis (PCA), and cluster analysis (CA). All of these procedures
aim to reduce the complexity of the observed data. In the case of FA, the goal is
to identify fewer underlying constructs to explain the observed data. In the case of PCA,
the goal can be mere data reduction, but the interpretation of components is frequently
done in terms similar to those used when describing the latent variables estimated by FA.
Cluster analytic techniques, although usually used to partition the subject space rather
than the variable space, can also be used to group variables to reduce the complexity of
the data by forming fewer and more homogeneous sets of tests or items.

At the data level the data reduction problem may be solved as a Singular Value Decomposition
of the original matrix, although the more typical solution is to find either the
principal components or factors of the covariance or correlation matrices. Given the pattern
of regression weights from the variables to the components or from the factors to the
variables, it is then possible to find (for components) individual component or cluster scores
or estimate (for factors) factor scores.
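
As a rough sketch (not from the original text) of this equivalence, the first principal component of a small data set can be found either from the singular value decomposition of the standardized data matrix or from the principal function; the fa function adds estimated factor scores. The variable names here are illustrative:

R code

library(psych)
data(sat.act)
X <- scale(na.omit(sat.act[4:6]))  #standardize the three ability measures (ACT, SATV, SATQ)
sv <- svd(X)                       #singular value decomposition of the data matrix
pc <- principal(X,nfactors=1,scores=TRUE)       #the first principal component
f1 <- fa(X,nfactors=1,scores="regression")      #a one factor solution with factor scores
cor(sv$u[,1],pc$scores) #component scores equal the first singular vector up to scale and sign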

Several of the functions in psych address the problem of data reduction; a brief sketch of typical calls follows this list.

fa incorporates five alternative algorithms: minres factor analysis, principal axis factor
analysis, weighted least squares factor analysis, generalized least squares factor analysis,
and maximum likelihood factor analysis. That is, it includes the functionality of
three other functions that will eventually be phased out.

principal Principal Components Analysis reports the largest n eigen vectors rescaled by
the square root of their eigen values.

factor.congruence The congruence between two factors is the cosine of the angle between
them. This is the sum of the cross products of the loadings divided by the square root
of the product of the sums of the squared loadings. It differs from the correlation
coefficient in that the mean loading is not subtracted before taking the products.
factor.congruence will find the cosines between two (or more) sets of factor loadings.

vss Very Simple Structure (Revelle and Rocklin, 1979) applies a goodness of fit test to
determine the optimal number of factors to extract. It can be thought of as a quasi-
confirmatory model, in that it fits the very simple structure (all except the biggest c
loadings per item are set to zero, where c is the level of complexity of the item) of a
factor pattern matrix to the original correlation matrix. For items where the model is
usually of complexity one, this is equivalent to making all except the largest loading
for each item 0. This is typically the solution that the user wants to interpret. The
analysis includes the MAP criterion of Velicer (1976) and a χ2 estimate.

2Cattell (1978) as well as MacCallum et al. (2007) argue that the data are the result of many more
factors than observed variables, but are willing to estimate the major underlying factors.

fa.parallel The parallel factors technique compares the observed eigen values of a correlation
matrix with those from random data.

fa.plot will plot the loadings from a factor, principal components, or cluster analysis
(just a call to plot will suffice). If there are more than two factors, then a SPLOM
of the loadings is generated.

nfactors A number of different tests for the number of factors problem are run.

fa.diagram replaces fa.graph and will draw a path diagram representing the factor structure.
It does not require Rgraphviz and thus is probably preferred.

fa.graph requires Rgraphviz and will draw a graphic representation of the factor structure.
If factors are correlated, this will be represented as well.

iclust is meant to do item cluster analysis using a hierarchical clustering algorithm
specifically asking questions about the reliability of the clusters (Revelle, 1979). Clusters
are formed until either coefficient α (Cronbach, 1951) or β (Revelle, 1979) fails to
increase.
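
A brief sketch of how several of these functions might be combined (using the bfi data set distributed with psych; the argument values are illustrative, not prescriptive):

R code

library(psych)
fa.parallel(bfi[1:25])           #compare observed eigen values with those from random data
nfactors(bfi[1:25])              #several additional tests of the number of factors
f5 <- fa(bfi[1:25],nfactors=5)   #a five factor minres solution, oblimin rotated by default
fa.diagram(f5)                   #path diagram of the factor structure
p5 <- principal(bfi[1:25],nfactors=5)  #a five component solution for comparison
factor.congruence(f5,p5)         #cosines between the factor and component loadings
iclust(bfi[1:25])                #hierarchical item cluster analysis of the same items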

5.1.1 Minimum Residual Factor Analysis

The factor model is an approximation of a correlation matrix by a matrix of lower rank.
That is, can the n × n correlation matrix R be approximated by the product of an n × k factor
matrix F and its transpose plus a diagonal matrix of uniquenesses U²?

R = FF′ + U²  (1)

The maximum likelihood solution to this equation is found by factanal in the stats package.
Five alternatives are provided in psych; all of them are included in the fa function
and are called by specifying the factor method (e.g., fm="minres", fm="pa", fm="wls",
fm="gls" and fm="ml"). In the discussion of the other algorithms, the calls shown are to
the fa function specifying the appropriate method.
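
To make Equation 1 concrete, the identity can be checked numerically for an unrotated solution. This sketch (not from the original text) uses the Thurstone correlation matrix supplied with psych; the residuals are small but not zero because the model is only an approximation:

R code

library(psych)
f <- fa(Thurstone,nfactors=3,rotate="none",fm="minres") #an unrotated minimum residual solution
model <- f$loadings %*% t(f$loadings) + diag(f$uniquenesses) #FF' + U^2
round(Thurstone - model,2) #residuals: the off diagonal elements should be close to zero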

factor.minres attempts to minimize the off diagonal residual correlation matrix by adjusting
the eigen values of the original correlation matrix. This is similar to what is done
in factanal, but uses an ordinary least squares fit function instead of a maximum likelihood
one. The solutions tend to be more similar to the MLE solutions than are the factor.pa
solutions. minres is the default method for the fa function.

A classic data set, collected by Thurstone and Thurstone (1941) and then reanalyzed by
Bechtoldt (1961) and discussed by McDonald (1999), is a set of 9 cognitive variables with
a clear bi-factor structure (Holzinger and Swineford, 1937). The minimum residual solution
was transformed into an oblique solution using the default option on rotate, which uses
an oblimin transformation (Table 1). Alternative rotations and transformations include
"none", "varimax", "quartimax", "bentlerT", and "geominT" (all of which are orthogonal
rotations), as well as "promax", "oblimin", "simplimax", "bentlerQ", "geominQ", and
"cluster", which are possible oblique transformations of the solution. The default is to do
an oblimin transformation, although prior versions defaulted to varimax. The measures of
factor adequacy reflect the multiple correlations of the factors with the best fitting linear
regression estimates of the factor scores (Grice, 2001).
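
A sketch of the kind of call that produces such a solution follows; the value of n.obs (213, Bechtoldt's sample size) is supplied here as an assumption rather than taken from the text above:

R code

f3 <- fa(Thurstone,nfactors=3,n.obs=213) #minres extraction with the default oblimin rotation
print(f3,cut=0.3) #display the loadings, suppressing values below .3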

5.1.2 Principal Axis Factor Analysis

An alternative least squares algorithm, factor.pa (incorporated into fa as the option fm
= "pa"), does a Principal Axis factor analysis by iteratively doing an eigen value decomposition
of the correlation matrix with the diagonal replaced by the values estimated by the
factors of the previous iteration. This OLS solution is not as sensitive to improper matrices
as is the maximum likelihood method, and will sometimes produce more interpretable
results. It seems as if the SAS example for PA uses only one iteration. Setting the max.iter
parameter to 1 produces the SAS solution.
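
The corresponding calls might look as follows (again a sketch, using the Thurstone correlation matrix; the single iteration call is intended only to mimic the SAS behavior described above):

R code

f3.pa <- fa(Thurstone,nfactors=3,n.obs=213,fm="pa")             #iterated principal axis factoring
f3.sas <- fa(Thurstone,nfactors=3,n.obs=213,fm="pa",max.iter=1) #a single iteration, as in SAS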

The solutions from fa, factor.minres, and factor.pa, as well as from the principal
function, can be rotated or transformed with a number of options. Some of these call
the GPArotation package. Orthogonal rotations are varimax and quartimax. Oblique
transformations include oblimin and quartimin, as well as two targeted rotation functions,
Promax and target.rot. The latter of these will transform a loadings matrix towards
an arbitrary target matrix. The default is to transform towards an independent cluster
solution.
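
A sketch of applying these transformations to a set of loadings (the calls are illustrative; target.rot defaults to an independent cluster target when no target matrix is given):

R code

f3 <- fa(Thurstone,nfactors=3,n.obs=213,rotate="varimax") #an orthogonal starting point
target.rot(f3$loadings) #transform the loadings towards an independent cluster solution
Promax(f3$loadings)     #psych's Promax, which also reports the factor intercorrelations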

Using the Thurstone data set, three factors were requested and then transformed into an
independent clusters solution using targ