substitute() in R together with anova

Submitted by 余生长醉 on 2019-12-19 08:13:15

Question


I tried to run anova on different sets of data and didn't quite know how to do it. I googled and found this to be useful: https://stats.idre.ucla.edu/r/codefragments/looping_strings/

hsb2 <- read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
names(hsb2)
varlist <- names(hsb2)[8:11]
models <- lapply(varlist, function(x) {
    lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})

My understanding is that the code above applies a function to each variable name in varlist, and that the function runs a linear regression with lm() on each of them.

So I thought using aov instead of lm would work for me, like this:

aov(substitute(read ~ i, list(i = as.name(x))), data = hsb2)

However, I got this error:

Error in terms.default(formula, "Error", data = data) : 
no terms component nor attribute

I have no idea where the error comes from. Please help!


Answer 1:


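The problem is that substitute() returns an unevaluated call, not a formula object. aov() passes that object straight to terms(), which does not know how to handle it, hence the terms.default error. I think @thelatemail's suggestion of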

lm(as.formula(paste("read ~",x)), data = hsb2)

is a good workaround. Alternatively, you could evaluate the expression to get the formula with

models <- lapply(varlist, function(x) {
    aov(eval(substitute(read ~ i, list(i = as.name(x)))), data = hsb2)
})
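To see the difference concretely, the class of each object can be inspected directly. The following is only a sketch: `toy` is a small made-up data frame standing in for hsb2 so that it runs without a network connection, and its values are invented for illustration.

```r
# Hypothetical stand-in for hsb2 (made-up values) so the example runs offline.
toy <- data.frame(
  read  = c(57, 68, 44, 63, 47, 44, 50, 34),
  write = c(52, 59, 33, 44, 52, 52, 59, 46)
)

x <- "write"

# substitute() builds the unevaluated call `read ~ write` ...
expr <- substitute(read ~ i, list(i = as.name(x)))
class(expr)   # "call"

# ... and evaluating that call runs `~`, which returns a formula.
form <- eval(expr)
class(form)   # "formula"

# aov() accepts the formula. Passing `expr` itself fails, as in the question;
# the error message is captured here instead of stopping the script.
fit <- aov(form, data = toy)
err <- tryCatch(aov(expr, data = toy), error = function(e) conditionMessage(e))
```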

I guess it depends on what you want to do with the list of models afterward. Doing

models <- lapply(varlist, function(x) {
    eval(bquote(aov(read ~ .(as.name(x)), data = hsb2)))
})

gives a "cleaner" call component for each result.
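As a rough illustration of what "cleaner" means here, the two variants can be compared side by side. This sketch uses the built-in mtcars data in place of hsb2, with `mpg`, `wt` and `hp` as stand-in columns; those substitutions are assumptions made only so the example is self-contained.

```r
# Built-in mtcars stands in for hsb2 (an assumption for illustration).
varlist <- c("wt", "hp")

# Variant 1: the formula is evaluated inside the call to aov().
m_eval <- lapply(varlist, function(x) {
  aov(eval(substitute(mpg ~ i, list(i = as.name(x)))), data = mtcars)
})

# Variant 2: bquote() splices the variable name in before aov() is called.
m_bquote <- lapply(varlist, function(x) {
  eval(bquote(aov(mpg ~ .(as.name(x)), data = mtcars)))
})

# Same fit, different stored call.
m_eval[[1]]$call     # formula argument is still the eval(substitute(...)) expression
m_bquote[[1]]$call   # aov(formula = mpg ~ wt, data = mtcars)
```

The readable call matters mostly for printed output and for functions such as update() that re-evaluate the stored call.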




Answer 2:


This should do it. Each element of varlist is passed to the function in turn and used to select the matching column. The lm function will only see a two-column data frame, and the "read" column will be the dependent variable each time. No need for fancy substitution:

models <- sapply(varlist, function(x) {
    lm(read ~ ., data = hsb2[, c("read", x)])
}, simplify=FALSE)

> summary(models[[1]])  # The first model. Note the use of "[["

Call:
lm(formula = read ~ ., data = hsb2[, c("read", x)])

Residuals:
     Min       1Q   Median       3Q      Max 
-19.8565  -5.8976  -0.8565   5.5801  24.2703 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 18.16215    3.30716   5.492 1.21e-07 ***
write        0.64553    0.06168  10.465  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 8.248 on 198 degrees of freedom
Multiple R-squared: 0.3561, Adjusted R-squared: 0.3529 
F-statistic: 109.5 on 1 and 198 DF,  p-value: < 2.2e-16 

For all the models:

lapply(models, summary)



Answer 3:


akrun borrowed my answer the other night; now I'm (partially) borrowing his.

do.call puts the variables into the call output so it reads properly. Here's a general function for simple regression.

doModel <- function(col1, col2, data = hsb2, FUNC = "lm") 
{
    form <- as.formula(paste(col1, "~", col2))
    do.call(FUNC, list(form, substitute(data)))
}     

lapply(varlist, doModel, col1 = "read")
# [[1]]
#
# Call:
# lm(formula = read ~ write, data = hsb2)
#
# Coefficients:
# (Intercept)        write  
#     18.1622       0.6455  
#
#
# [[2]]
#
# Call:
# lm(formula = read ~ math, data = hsb2)
#
# Coefficients:
# (Intercept)         math  
#     14.0725       0.7248  
#
# ...
# ...
# ...

Note: As thelatemail mentions in his comment,

sapply(varlist, doModel, col1 = "read", simplify = FALSE)

will keep the names in the list and also allow list$name subsetting.
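Since the original question was about aov, the same helper can be pointed at it through the FUNC argument. The sketch below redefines doModel with the built-in mtcars data as the default instead of hsb2 (an assumption, so that it runs without downloading anything) and shows the named list that simplify = FALSE produces.

```r
# Same helper as above, but defaulting to mtcars as a stand-in for hsb2.
doModel <- function(col1, col2, data = mtcars, FUNC = "lm") {
  form <- as.formula(paste(col1, "~", col2))
  do.call(FUNC, list(form, substitute(data)))
}

varlist <- c("wt", "hp")

# simplify = FALSE keeps a list; the character input supplies the names.
fits <- sapply(varlist, doModel, col1 = "mpg", FUNC = "aov", simplify = FALSE)

names(fits)        # "wt" "hp"
fits$wt$call       # aov(formula = mpg ~ wt, data = mtcars)
summary(fits$hp)   # ANOVA table for mpg ~ hp
```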



Source: https://stackoverflow.com/questions/25987367/substitute-in-r-together-with-anova
