Dynamic variable names in R regressions

不思量自难忘° 2021-01-05 04:48

Being aware of the danger of using dynamic variable names, I am trying to loop over various regression models in which different variable specifications are chosen. Usually

3 answers
  •  不知归路
    2021-01-05 05:16

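    1) Just use lm(df2), which regresses the first column on all the others. If df2 has additional columns beyond those shown in the question and we only want to regress y on x1 and x2, subset it first (var holds the name of the chosen variable, here "x1"):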

    df3 <- df2[c("y", var, "x2")]
    lm(df3)
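
    As a minimal sketch of the subsetting approach, assuming a made-up df2 (the question's data is not shown) with an extra column z that should be left out, and var set to "x1":

    ```r
    set.seed(42)  # hypothetical data; the question's df2 is not shown
    df2 <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), z = rnorm(30))
    var <- "x1"   # name of the regressor chosen at run time

    # Keep the response first: lm() on a data frame regresses column 1 on the rest.
    df3 <- df2[c("y", var, "x2")]
    fm <- lm(df3)

    # Should match the fit obtained by writing the formula out by hand.
    same_fit <- all.equal(coef(fm), coef(lm(y ~ x1 + x2, data = df2)))
    same_fit
    ```

    The column order in df3 matters here, since the first column is taken as the response.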
    

    The following are optional and only apply if it is important that the formula appear in the output as if it had been explicitly given. Compute the formula fo using the first line below and then run lm as in the second line:

    fo <- formula(model.frame(df3))
    fm <- do.call("lm", list(fo, quote(df3)))
    

    or just run lm as in the first line below and then write the formula and data into its stored call as in the remaining lines:

    fm <- lm(df3)
    fm$call$formula <- formula(model.frame(df3))
    fm$call$data <- quote(df3)
    

    Either one gives this:

    > fm
    Call:
    lm(formula = y ~ x1 + x2, data = df3)
    
    Coefficients:
    (Intercept)           x1           x2  
        0.44752      0.04278      0.05011  
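
    To check that building the call with do.call and patching the stored call afterwards lead to the same printed call, here is a small sketch on made-up data (df3 below is hypothetical, and the second variant sets both the formula and data entries of the call):

    ```r
    set.seed(42)  # hypothetical data
    df3 <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30))

    fo <- formula(model.frame(df3))  # formula recovered from the column order

    # Variant 1: build the call with the formula already substituted.
    fm1 <- do.call("lm", list(fo, quote(df3)))

    # Variant 2: fit first, then overwrite entries of the stored call.
    fm2 <- lm(df3)
    fm2$call$formula <- fo
    fm2$call$data <- quote(df3)

    same_call <- identical(deparse(fm1$call), deparse(fm2$call))
    same_coef <- all.equal(coef(fm1), coef(fm2))
    ```

    Only the stored call differs between fitting approaches; the coefficients come from the same data either way.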
    

    2) Character string. lm accepts a character string for the formula, so this also works. The fn$ prefix from gsubfn causes $var to be substituted within the character arguments.

    library(gsubfn)
    
    fn$lm("y ~ $var + x2", quote(df2))
    

    or at the expense of more involved code, without gsubfn:

    do.call("lm", list(sprintf("y ~ %s + x2", var), quote(df2)))
    

    or, if you don't care that the Call in the output shows the sprintf expression rather than the formula with var substituted, then just:

    lm(sprintf("y ~ %s + x2", var), df2)
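
    Since the question is about looping over several specifications, the do.call variant can be wrapped in lapply. This is only a sketch: the data frame, the extra column x3 and the vector vars are made up for illustration.

    ```r
    set.seed(1)  # hypothetical data, not from the question
    df2 <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

    vars <- c("x1", "x3")  # candidate regressors to swap in

    # Fit one model per candidate; do.call() makes the substituted formula
    # string appear in each model's stored call.
    fits <- lapply(vars, function(var) {
      do.call("lm", list(sprintf("y ~ %s + x2", var), quote(df2)))
    })
    names(fits) <- vars

    lapply(fits, coef)
    ```

    Each element of fits is an ordinary lm object, so summary(fits$x1) and similar calls work as usual.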
    
