Dynamic variable names in R regressions

Being aware of the danger of using dynamic variable names, I am trying to loop over various regression models in which different variable specifications are chosen. Usually

3 Answers

不要未来只要你来

2021-01-05 04:59

Personally, I like to do this with some computing on the language. For me, a combination of bquote with eval is easiest (to remember).

```
var <- as.symbol(var)
eval(bquote(summary(lm(y ~ .(var) + x2, data = df2))))
#Call:
#lm(formula = y ~ x1 + x2, data = df2)
#
#Residuals:
#     Min       1Q   Median       3Q      Max
#-0.49298 -0.26248 -0.00046  0.24111  0.51988
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  0.50244    0.02480  20.258   <2e-16 ***
#x1          -0.01468    0.03161  -0.464    0.643
#x2          -0.01635    0.03227  -0.507    0.612
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 0.2878 on 997 degrees of freedom
#Multiple R-squared:  0.0004708,    Adjusted R-squared:  -0.001534
#F-statistic: 0.2348 on 2 and 997 DF,  p-value: 0.7908
```

I find this superior to any approach that doesn't show the same call as summary(lm(y ~ x1+x2, data=df2)).
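Since the question is about looping over several specifications, here is a minimal sketch of the same bquote/eval idea inside lapply. The data frame below is a made-up stand-in for the question's df2 (which is not shown), and the names vars, v_sym and fits are only illustrative.

```r
# Hypothetical data standing in for the question's df2 (not shown in the post).
set.seed(1)
df2 <- data.frame(y = runif(100), x1 = runif(100),
                  x2 = runif(100), x3 = runif(100))

# Candidate regressors to swap into the model one at a time (illustrative).
vars <- c("x1", "x3")

# Build each call with bquote(), splicing the symbol in via .(), then evaluate it.
fits <- lapply(vars, function(v) {
  v_sym <- as.symbol(v)
  eval(bquote(lm(y ~ .(v_sym) + x2, data = df2)))
})
names(fits) <- vars

# Each stored call shows the substituted variable name.
fits$x3$call
# lm(formula = y ~ x3 + x2, data = df2)
```

Because the symbol is spliced in before evaluation, every fitted model records a call that looks as if it had been typed by hand.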

既然无缘

2021-01-05 05:13

The bang-bang operator !! only works with "tidy" functions. It's not a part of the core R language. A base R function like lm() has no idea how to expand such operators. Instead, you need to wrap the expression in a function that can do the expansion. rlang::expr is one such function:

```
rlang::expr(summary(lm(y ~ !!rlang::sym(var) + x2, data=df2)))
# summary(lm(y ~ x1 + x2, data = df2))
```

Then you need to use rlang::eval_tidy to actually evaluate it:

```
rlang::eval_tidy(rlang::expr(summary(lm(y ~ !!rlang::sym(var) + x2, data=df2))))

# Call:
# lm(formula = y ~ x1 + x2, data = df2)
#
# Residuals:
#     Min       1Q   Median       3Q      Max
# -0.49178 -0.25482  0.00027  0.24566  0.50730
#
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.4953683  0.0242949  20.390   <2e-16 ***
# x1          -0.0006298  0.0314389  -0.020    0.984
# x2          -0.0052848  0.0318073  -0.166    0.868
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.2882 on 997 degrees of freedom
# Multiple R-squared:  2.796e-05,   Adjusted R-squared:  -0.001978
# F-statistic: 0.01394 on 2 and 997 DF,  p-value: 0.9862
```

You can see this version preserves the expanded formula in the model object.
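To illustrate why !! means nothing special to base R functions, here is a small base-R sketch (no rlang needed) of how the parser treats it. The symbol x is never evaluated, so it does not need to exist; e is just an illustrative name.

```r
# In base R, `!!` is not a single operator: the parser reads it as `!` applied to `!x`.
e <- quote(!!x)

# The outermost function of the expression is plain logical negation.
identical(e[[1]], as.symbol("!"))
# TRUE

# Its argument is itself a negation of x.
identical(e[[2]], quote(!x))
# TRUE

# Evaluated directly, double negation just returns the logical value.
!!TRUE
# TRUE
```

Functions such as rlang::expr inspect the captured expression and give this double-negation pattern its unquoting meaning, which is why the formula has to pass through one of them first.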

不知归路

2021-01-05 05:16
1) Just use lm(df2), or if df2 has additional columns beyond what is shown in the question but we just want to regress on x1 and x2, then:
```
df3 <- df2[c("y", var, "x2")]
lm(df3)
```
The following are optional and only apply if it is important that the formula appear in the output as if it had been explicitly given. Compute the formula fo using the first line below and then run lm as in the second line:
```
fo <- formula(model.frame(df3))
fm <- do.call("lm", list(fo, quote(df3)))
```
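or just run lm as in the first line below and then write the formula and the data name into its call as in the remaining lines:
```
fm <- lm(df3)
# overwrite the recorded call arguments so the call prints with an explicit formula
fm$call$formula <- formula(model.frame(df3))
fm$call$data <- quote(df3)
```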
Either one gives this:
```
> fm
Call:
lm(formula = y ~ x1 + x2, data = df3)

Coefficients:
(Intercept)           x1           x2  
    0.44752      0.04278      0.05011  
```
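As a rough sketch of how approach 1 could be used in a loop over candidate variables: the data frame below is a hypothetical stand-in for df2, and vars and fits are illustrative names that do not appear in the question.

```r
# Hypothetical data standing in for the question's df2 (not shown in the post).
set.seed(1)
df2 <- data.frame(y = runif(100), x1 = runif(100),
                  x2 = runif(100), x3 = runif(100))

# Candidate regressors to try in turn (illustrative).
vars <- c("x1", "x3")

fits <- lapply(vars, function(v) {
  df3 <- df2[c("y", v, "x2")]          # the response must be the first column
  fo <- formula(model.frame(df3))      # formula built from the column names
  do.call("lm", list(fo, quote(df3)))  # the call records the explicit formula
})
names(fits) <- vars

# Inspect the formula used for the second candidate.
formula(fits$x3)
```

Selecting columns by name avoids building any expression by hand; the formula is derived from the column order of df3.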
2) Character string. lm accepts a character string for the formula, so this also works. The fn$ prefix causes substitution to occur in the character arguments.
```
library(gsubfn)

fn$lm("y ~ $var + x2", quote(df2))
```
or at the expense of more involved code, without gsubfn:
```
do.call("lm", list(sprintf("y ~ %s + x2", var), quote(df2)))
```
or if you don't care that the formula displays without var substituted then just:
```
lm(sprintf("y ~ %s + x2", var), df2)
```
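The difference between the last two variants can be checked directly. This is a minimal sketch on made-up data (df2 here is a hypothetical stand-in, and fm_plain and fm_docall are illustrative names):

```r
# Hypothetical data standing in for the question's df2 (not shown in the post).
set.seed(1)
df2 <- data.frame(y = runif(100), x1 = runif(100), x2 = runif(100))
var <- "x1"

# Passing the string directly: the stored call keeps the unevaluated sprintf() expression.
fm_plain <- lm(sprintf("y ~ %s + x2", var), df2)

# Going through do.call(): arguments are evaluated first, so the call holds the expanded string.
fm_docall <- do.call("lm", list(sprintf("y ~ %s + x2", var), quote(df2)))

# Both fits are the same model; only the recorded call differs.
all.equal(coef(fm_plain), coef(fm_docall))
fm_plain$call
fm_docall$call
```

If the fitted objects are only used for their coefficients, the plain version is enough; the do.call version matters when the printed call should name the actual variable.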