问题
I tried to run anova on different sets of data and didn't quite know how to do it. I goolged and found this to be useful: https://stats.idre.ucla.edu/r/codefragments/looping_strings/
hsb2 <- read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
names(hsb2)
varlist <- names(hsb2)[8:11]
models <- lapply(varlist, function(x) {
lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})
My understanding of what the above codes does is it creates a function lm() and apply it to each variable in varlist and it does linear regression on each of them.
So I thought use aov instead of lm would work for me like this:
aov(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
However, I got this error:
Error in terms.default(formula, "Error", data = data) :
no terms component nor attribute
I have no idea of where the error comes from. Please help!
回答1:
The problem is that substitute()
returns an expression, not a formula. I think @thelatemail's suggestion of
lm(as.formula(paste("read ~",x)), data = hsb2)
is a good work around. Alternatively you could evaluate the expression to get the formula with
models <- lapply(varlist, function(x) {
aov(eval(substitute(read ~ i, list(i = as.name(x)))), data = hsb2)
})
I guess it depends on what you want to do with the list of models afterward. Doing
models <- lapply(varlist, function(x) {
eval(bquote(aov(read ~ .(as.name(x)), data = hsb2)))
})
gives a "cleaner" call
property for each of the result.
回答2:
This should do it. The varlist vector is going to be passed item by item to the function and the column will be delivered. The lm function will only see a two column dataframe and the "read" column will be the dependent variable each time. No need for fancy substitution:
models <- sapply(varlist, function(x) {
lm(read ~ ., data = hsb2[, c("read", x) ])
}, simplify=FALSE)
> summary(models[[1]]) # The first model. Note the use of "[["
Call:
lm(formula = read ~ ., data = hsb2[, c("read", x)])
Residuals:
Min 1Q Median 3Q Max
-19.8565 -5.8976 -0.8565 5.5801 24.2703
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.16215 3.30716 5.492 1.21e-07 ***
write 0.64553 0.06168 10.465 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.248 on 198 degrees of freedom
Multiple R-squared: 0.3561, Adjusted R-squared: 0.3529
F-statistic: 109.5 on 1 and 198 DF, p-value: < 2.2e-16
For all the models::
lapply(models, summary)
回答3:
akrun borrowed my answer the other night, now I'm (partially) borrowing his.
do.call
puts the variables into the call
output so it reads properly. Here's a general function for simple regression.
doModel <- function(col1, col2, data = hsb2, FUNC = "lm")
{
form <- as.formula(paste(col1, "~", col2))
do.call(FUNC, list(form, substitute(data)))
}
lapply(varlist, doModel, col1 = "read")
# [[1]]
#
# Call:
# lm(formula = read ~ write, data = hsb2)
#
# Coefficients:
# (Intercept) write
# 18.1622 0.6455
#
#
# [[2]]
#
# Call:
# lm(formula = read ~ math, data = hsb2)
#
# Coefficients:
# (Intercept) math
# 14.0725 0.7248
#
# ...
# ...
# ...
Note: As thelatemail mentions in his comment
sapply(varlist, doModel, col1 = "read", simplify = FALSE)
will keep the names in the list and also allow list$name
subsetting.
来源:https://stackoverflow.com/questions/25987367/substitute-in-r-together-with-anova