Dynamic variable names in R regressions

不思量自难忘° 2021-01-05 04:48

Being aware of the danger of using dynamic variable names, I am trying to loop over various regression models where different variable specifications are chosen. Usually

3 answers
  • Personally, I like to do this with some computing on the language. For me, a combination of bquote with eval is easiest (to remember).

    var <- as.symbol(var)
    eval(bquote(summary(lm(y ~ .(var) + x2, data = df2))))
    #Call:
    #lm(formula = y ~ x1 + x2, data = df2)
    #
    #Residuals:
    #     Min       1Q   Median       3Q      Max 
    #-0.49298 -0.26248 -0.00046  0.24111  0.51988 
    #
    #Coefficients:
    #            Estimate Std. Error t value Pr(>|t|)    
    #(Intercept)  0.50244    0.02480  20.258   <2e-16 ***
    #x1          -0.01468    0.03161  -0.464    0.643    
    #x2          -0.01635    0.03227  -0.507    0.612    
    #---
    #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    #
    #Residual standard error: 0.2878 on 997 degrees of freedom
    #Multiple R-squared:  0.0004708,    Adjusted R-squared:  -0.001534 
    #F-statistic: 0.2348 on 2 and 997 DF,  p-value: 0.7908
    

    I find this superior to any approach that doesn't show the same call as summary(lm(y ~ x1+x2, data=df2)).
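
    Since the question is about looping over several specifications, the same bquote/eval idea can be wrapped in lapply. This is only a minimal sketch: the simulated df2, the extra column x3, and the vector vars are assumptions made for illustration and are not taken from the question.

    ```r
    # Simulated data standing in for the question's df2 (assumption).
    set.seed(1)
    df2 <- data.frame(y = runif(100), x1 = runif(100),
                      x2 = runif(100), x3 = runif(100))

    # Hypothetical set of regressors to swap into the model one at a time.
    vars <- c("x1", "x3")

    fits <- lapply(vars, function(v) {
      # bquote() replaces .(...) with the symbol before eval() runs lm(),
      # so the stored call contains the real variable name.
      eval(bquote(lm(y ~ .(as.symbol(v)) + x2, data = df2)))
    })
    names(fits) <- vars

    # Inspect the recorded call of one of the fits.
    fits$x3$call
    ```

    Because every fit records its fully expanded call, functions such as update() or summary() see the same model as if the formula had been typed by hand.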

  • 2021-01-05 05:13

    The bang-bang operator !! only works with "tidy" functions; it is not part of the core R language. A base R function like lm() has no idea how to expand such operators. Instead, you need to wrap the expression in a function that can do the expansion. rlang::expr is one such function:

    rlang::expr(summary(lm(y ~ !!rlang::sym(var) + x2, data=df2)))
    # summary(lm(y ~ x1 + x2, data = df2))
    

    Then you need to use rlang::eval_tidy to actually evaluate it

    rlang::eval_tidy(rlang::expr(summary(lm(y ~ !!rlang::sym(var) + x2, data=df2))))
    
    # Call:
    # lm(formula = y ~ x1 + x2, data = df2)
    # 
    # Residuals:
    #     Min       1Q   Median       3Q      Max 
    # -0.49178 -0.25482  0.00027  0.24566  0.50730 
    # 
    # Coefficients:
    #               Estimate Std. Error t value Pr(>|t|)    
    # (Intercept)  0.4953683  0.0242949  20.390   <2e-16 ***
    # x1          -0.0006298  0.0314389  -0.020    0.984    
    # x2          -0.0052848  0.0318073  -0.166    0.868    
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    #
    # Residual standard error: 0.2882 on 997 degrees of freedom
    # Multiple R-squared:  2.796e-05,   Adjusted R-squared:  -0.001978 
    # F-statistic: 0.01394 on 2 and 997 DF,  p-value: 0.9862
    

    You can see this version preserves the expanded formula in the model object.
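
    It can help to keep the two steps apart so the expanded call can be inspected before it is run. A small sketch, assuming rlang is installed and using simulated data in place of the question's df2; the name model_call is made up for this example.

    ```r
    library(rlang)

    # Simulated data standing in for the question's df2 (assumption).
    set.seed(1)
    df2 <- data.frame(y = runif(100), x1 = runif(100), x2 = runif(100))
    var <- "x1"

    # Step 1: expr() expands !!, turning the string into a symbol in the call.
    model_call <- expr(lm(y ~ !!sym(var) + x2, data = df2))
    print(model_call)

    # Step 2: eval_tidy() evaluates the finished call.
    fit <- eval_tidy(model_call)
    formula(fit)
    ```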

  • 2021-01-05 05:16

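    1) Just use lm(df2), which treats the first column as the response and the remaining columns as predictors. If df2 has additional columns beyond what is shown in the question but we just want to regress on x1 and x2, then subset it first: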

    df3 <- df2[c("y", var, "x2")]
    lm(df3)
    

    The following are optional and only apply if it is important that the formula appear in the output as if it had been explicitly given. Compute the formula fo using the first line below and then run lm as in the second line:

    fo <- formula(model.frame(df3))
    fm <- do.call("lm", list(fo, quote(df3)))
    

    or just run lm as in the first line below and then write the formula into it as in the second line:

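    fm <- lm(df3)
    # overwrite the recorded call so it shows the explicit formula and data
    fm$call <- call("lm", formula = formula(model.frame(df3)), data = quote(df3))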
    

    Either one gives this:

    > fm
    Call:
    lm(formula = y ~ x1 + x2, data = df3)
    
    Coefficients:
    (Intercept)           x1           x2  
        0.44752      0.04278      0.05011  
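
    A quick way to convince yourself that the data-frame shortcut fits the intended model is to compare it with an explicitly written formula. This is a sketch on simulated data; the extra column z and the names fit_df and fit_ref are assumptions for illustration.

    ```r
    # Simulated data with one extra column that should NOT enter the model.
    set.seed(1)
    df2 <- data.frame(y = runif(100), x1 = runif(100),
                      x2 = runif(100), z = runif(100))
    var <- "x1"

    # Column order matters: the first column becomes the response.
    df3 <- df2[c("y", var, "x2")]
    formula(df3)   # the formula implied by the column order

    fit_df  <- lm(df3)                       # data-frame shortcut
    fit_ref <- lm(y ~ x1 + x2, data = df2)   # explicit formula for comparison
    all.equal(coef(fit_df), coef(fit_ref))
    ```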
    

    2) Character string. lm accepts a character string for the formula, so this also works. The fn$ prefix from gsubfn causes $var to be substituted in the character arguments.

    library(gsubfn)
    
    fn$lm("y ~ $var + x2", quote(df2))
    

    or at the expense of more involved code, without gsubfn:

    do.call("lm", list(sprintf("y ~ %s + x2", var), quote(df2)))
    

    or if you don't care that the formula displays without var substituted then just:

    lm(sprintf("y ~ %s + x2", var), df2)
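
    For the loop the question asks about, the do.call variant can be applied to each candidate variable in turn. A minimal sketch on simulated data: the column x3 and the vector vars are assumptions, and converting the string with as.formula() first is a small variation on the code above so that the stored call shows the formula without quotes.

    ```r
    # Simulated data standing in for the question's df2 (assumption).
    set.seed(1)
    df2 <- data.frame(y = runif(100), x1 = runif(100),
                      x2 = runif(100), x3 = runif(100))

    # Hypothetical set of regressors to swap into the model one at a time.
    vars <- c("x1", "x3")

    fits <- lapply(vars, function(v) {
      # Build the formula from a string, then let do.call() construct the
      # lm() call so the substituted formula is what gets recorded.
      fo <- as.formula(sprintf("y ~ %s + x2", v))
      do.call("lm", list(fo, quote(df2)))
    })
    names(fits) <- vars

    # Formula actually used by each fit.
    lapply(fits, formula)
    ```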
    