I'm conducting a factor analysis of several variables in R using factanal() (but am open to using other packages). I want to determine each case's factor score, but I want the factor scores to be unstandardized and on the original metric of the input variables. When I run the factor analysis and obtain the factor scores, they are standardized with a normal distribution of mean=0, SD=1, and are not on the original metric of the input variables. How can I obtain unstandardized factor scores that have the same metric as the input variables? Ideally, this would mean a similar mean, sd, range, and distribution.
I asked a similar question previously, but the respondent's answer involved rescaling standardized (i.e., normally distributed) factor scores. Note that I don't want to transform standardized factor scores to unstandardized ones because the distributions of my indicators are non-normal (i.e., the normal distribution of standardized factor scores cannot be easily transformed to the raw metric of my indicators). In other words, I'd like to estimate unstandardized factor scores on the raw metric of the indicators without first estimating them on a standardized metric.
Also, there are some missing data. How can I obtain (unstandardized) factor scores for all cases, even those who don't have data on all items?
Here's a small example:
library(psych)
v1 <- c(1,1,1,NA,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,NA,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,NA,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)
m1FactorScores <- factanal(~v1+v2+v3+v4+v5+v6, factors = 1, scores = "Bartlett", na.action="na.exclude")$scores
describe(m1) #means~2.3, sds~1.5
describe(m1FactorScores) #mean=0, sd=1
The data above are just a small example. My actual data are not likert/ordinal data. They are forecasts of football players' passing yards from various sources. My hope is that a "latent average" would more accurately forecast players' passing yards than an average because it would discard the unique biases of each source. The data are highly positively skewed, however, and forcing the latent variable and its factor scores to be normally distributed results in implausibly high values for many players (e.g., over 6,000 yards passing next season).
The short answer: the answer to your previous question is still correct. Whether you fix the scale of the latent variable in advance or rescale the standardized scores afterwards is irrelevant, because the resulting scores will be the same.
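To see why, here is a short sketch of the algebra for a one-factor model. The notation is my own and not taken from either post: ν are the intercepts, λ the loadings, Θ the residual covariance matrix, and α and ψ the mean and variance chosen for the latent variable. I am also assuming regression-type factor scores; other scoring methods would need their own check.

```latex
\begin{align*}
% One-factor model with latent mean \alpha and latent variance \psi
y &= \nu + \lambda\eta + \varepsilon,
  \qquad \eta \sim N(\alpha, \psi), \quad \varepsilon \sim N(0, \Theta) \\
% Substitute the standardized factor \eta^* = (\eta - \alpha)/\sqrt{\psi} \sim N(0, 1)
y &= (\nu + \lambda\alpha) + (\lambda\sqrt{\psi})\,\eta^* + \varepsilon \\
% Both parameterizations imply the same moments, hence the same likelihood
\mu &= \nu + \lambda\alpha,
  \qquad \Sigma = \psi\,\lambda\lambda^{\top} + \Theta \\
% Regression factor scores under each scaling
\hat\eta &= \alpha + \psi\,\lambda^{\top}\Sigma^{-1}(y - \mu),
  \qquad \hat\eta^* = \sqrt{\psi}\,\lambda^{\top}\Sigma^{-1}(y - \mu) \\
% Hence the unstandardized scores are a linear rescaling of the standardized ones
\hat\eta &= \alpha + \sqrt{\psi}\,\hat\eta^*
\end{align*}
```

So the two parameterizations describe the same fitted model, and the scores differ only by a shift of α and a stretch of √ψ. Nothing in this argument depends on how the indicators are actually distributed.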
Here is an illustration using lavaan, including both options. Fixing model parameters (here, the mean and variance of the latent variable) isn't supported in factanal as far as I know:
library(lavaan)
v1 <- c(1,1,1,2,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,NA,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,NA,2,1,6,5,4)
m1 <- data.frame(v1,v2,v3,v4,v5,v6)
# Option 1: fixing the scale according to v1
mean(v1) # 2.278
var(v1) # 2.448
fix.model <- "f1 =~ v1 + v2 + v3 + v4 + v5 + v6
f1 ~ 2.278*1
f1 ~~ 2.448*f1"
fix.fit <- lavaan(fix.model, data = m1, meanstructure = TRUE, missing = "fiml",
                  int.ov.free = TRUE, int.lv.free = TRUE, auto.var = TRUE)
# Option 2: fixing the scale to standardize the latent variable
std.model <- "f1 =~ v1 + v2 + v3 + v4 + v5 + v6
f1 ~ 0*1
f1 ~~ 1*f1"
std.fit <- lavaan(std.model, data = m1, meanstructure = TRUE, missing = "fiml",
                  int.ov.free = TRUE, int.lv.free = TRUE, auto.var = TRUE)
# extract scores
fix.scores <- predict(fix.fit)[,1]
std.scores <- predict(std.fit)[,1]
rescaled <- std.scores * sd(v1) + mean(v1)
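Notice the striking similarity between the fix.scores and the rescaled scores. The remaining differences in the fourth decimal place are consistent with the mean and variance having been rounded to three decimals in the model syntax.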
cbind(std.scores, rescaled, fix.scores)
# std.scores rescaled fix.scores
# [1,] -0.8220827 0.9916157 0.9917591
# [2,] -0.8113431 1.0084179 1.0085627
# [3,] -0.8098929 1.0106869 1.0108318
# [4,] -0.5844884 1.3633359 1.3635066
For the purposes of model fitting, the scale chosen for the latent variable is completely arbitrary. The distributional assumptions for the latent variable (i.e., normal) and the indicator variables (i.e., conditionally normal) are the same regardless of your choice and regardless of the actual distribution of your indicators.
If your indicators violate the distributional assumptions of the model, then this will be reflected in poor model fit or slow convergence, but not in the two approaches yielding different results.
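As a rough numerical check of that last point, here is a minimal base-R sketch. It does not fit any model: the loadings, intercepts, and residual variances are hypothetical values made up for illustration, the data are simulated to be strongly right-skewed, and the scores are computed with the regression formula by hand. Only the target mean and variance (2.278 and 2.448) are borrowed from Option 1.

```r
# Hypothetical one-factor parameters on the standardized scale (made up for
# illustration, not estimated from the data above)
lambda <- c(0.9, 0.8, 0.7, 0.6)        # loadings when var(f1) is fixed to 1
nu     <- c(2.0, 2.5, 3.0, 1.5)        # intercepts when mean(f1) is fixed to 0
theta  <- diag(c(0.3, 0.4, 0.5, 0.6))  # residual variances

# Target scale for the latent variable (values borrowed from Option 1)
alpha <- 2.278
psi   <- 2.448

# Equivalent parameters when the factor has mean alpha and variance psi
lambda_fix <- lambda / sqrt(psi)
nu_fix     <- nu - lambda_fix * alpha

# The model-implied mean vector and covariance matrix match under both scalings
mu    <- nu
Sigma <- tcrossprod(lambda) + theta
stopifnot(all.equal(mu, nu_fix + lambda_fix * alpha),
          all.equal(Sigma, psi * tcrossprod(lambda_fix) + theta))

# Strongly right-skewed simulated "indicator" data (10 cases, 4 indicators)
set.seed(1)
y <- matrix(rexp(10 * 4, rate = 0.5), nrow = 10)
centered <- sweep(y, 2, mu)  # subtract the implied mean from each column

# Regression factor scores under each scaling
std_scores <- drop(centered %*% solve(Sigma, lambda))
fix_scores <- alpha + psi * drop(centered %*% solve(Sigma, lambda_fix))

# Rescaling the standardized scores reproduces the unstandardized ones
rescaled <- alpha + sqrt(psi) * std_scores
all.equal(rescaled, fix_scores)
```

The final comparison should return TRUE even though the simulated data are far from normal, because both sets of scores are the same linear function of the data. Skewness can still hurt model fit and produce implausible scores, but it does so equally under both scalings.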
Source: https://stackoverflow.com/questions/30858930/obtain-unstandardized-factor-scores-from-factor-analysis-in-r