Specifying formula in R with glm without explicit declaration of each covariate

遥遥无期 2020-11-30 22:33

I would like to force specific variables into glm regressions without fully specifying each one. My real data set has ~200 variables. I haven't been able to find samples

2 answers
  • 2020-11-30 23:03

    Your creative use of . to build a formula containing all or almost all variables is a good, clean approach. Another option that is sometimes useful is to build the formula programmatically as a string and then convert it to a formula with as.formula:

    vars <- paste0("Var", 1:10)                        # "Var1", ..., "Var10"
    fla <- paste("y ~", paste(vars, collapse = "+"))   # "y ~ Var1+Var2+...+Var10"
    as.formula(fla)                                    # convert the string to a formula
    

    Of course, the fla string can be made as complicated as you need.
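
    As a sketch of another way to get the same kind of formula, base R's reformulate() assembles it from a character vector, so the paste step is not needed. The data frame dat and its simulated values below are made up for illustration; only the response name y and the VarN naming follow the snippet above.

```r
# Hypothetical example data: five covariates and a binary response.
set.seed(1)
dat <- as.data.frame(matrix(rnorm(100 * 5), ncol = 5))
names(dat) <- paste0("Var", 1:5)
dat$y <- rbinom(100, 1, 0.5)

# Use every column except the response as a term.
vars <- setdiff(names(dat), "y")
fla <- reformulate(vars, response = "y")

fit <- glm(fla, family = binomial, data = dat)
```

    Because vars is an ordinary character vector, variables can be forced in or dropped with the usual vector tools (c(), setdiff(), grep()) before the formula is built.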

  • 2020-11-30 23:07

    Aniko answered your question. To extend it a bit:

    You can also exclude variables using -:

    glm(Y~.-W1+A*I(W2^2), family=binomial, data=samp)
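
    To check what a formula like this expands to before fitting, terms() can list the resulting term labels. A minimal sketch, assuming samp has columns W1, W2, A and Y; the simulated values are invented for illustration:

```r
# Hypothetical data with the column names used in the formula above.
set.seed(42)
samp <- data.frame(W1 = rnorm(200), W2 = rnorm(200), A = rbinom(200, 1, 0.5))
samp$Y <- rbinom(200, 1, plogis(0.5 * samp$A + 0.3 * samp$W2))

f <- Y ~ . - W1 + A * I(W2^2)

# Expand "." against the data and list the terms that will be fitted.
term_labels <- attr(terms(f, data = samp), "term.labels")
print(term_labels)

fit <- glm(f, family = binomial, data = samp)
```

    W1 should not appear in the printed labels, while A * I(W2^2) contributes A, I(W2^2) and their interaction.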
    

    For large groups of variables, I often build a data frame that flags which group each variable belongs to, which allows something like:

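    # One row per column of samp (four columns here); each logical
    # column flags membership in a group of variables.
    vars <- data.frame(
        names = names(samp),
        main = c(TRUE, FALSE, TRUE, FALSE),
        quadratic = c(FALSE, TRUE, TRUE, FALSE),
        main2 = c(TRUE, TRUE, FALSE, FALSE),
        stringsAsFactors = FALSE
    )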
    
    
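    # Main effects, plus the first variable interacted with the square
    # of each variable flagged as quadratic.
    regform <- paste(
        "Y ~",
        paste(
          paste(vars[vars$main,1],collapse="+"),
          paste(vars[1,1],paste("*I(",vars[vars$quadratic,1],"^2)"),collapse="+"),
          sep="+"
        )
    )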
    > regform
    [1] "Y ~ W1+A+W1 *I( W2 ^2)+W1 *I( A ^2)"
    
    > glm(as.formula(regform),data=samp,family=binomial)
    

    Filling the data frame by conditions (on name, on structure, whatever) lets me quickly select groups of variables in large datasets.
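
    As an illustration of filling such a data frame by condition rather than by hand, here is a minimal sketch. The simulated samp, the rule on the name "Y" and the "more than two distinct values" rule are assumptions chosen for the example, not part of the approach itself:

```r
# Hypothetical data: two continuous covariates, a binary covariate,
# and a binary response.
set.seed(7)
samp <- data.frame(
    W1 = rnorm(100), W2 = rnorm(100),
    A  = rbinom(100, 1, 0.5),
    Y  = rbinom(100, 1, 0.5)
)

vars <- data.frame(names = names(samp), stringsAsFactors = FALSE)

# Condition on name: every column except the response is a main effect.
vars$main <- vars$names != "Y"

# Condition on structure: covariates with more than two distinct
# values also get a quadratic term.
n_distinct <- vapply(samp, function(x) length(unique(x)), integer(1))
vars$quadratic <- vars$main & n_distinct > 2

rhs <- c(vars$names[vars$main],
         paste0("I(", vars$names[vars$quadratic], "^2)"))
regform <- paste("Y ~", paste(rhs, collapse = " + "))

fit <- glm(as.formula(regform), family = binomial, data = samp)
```

    With ~200 variables, the same two assignments replace two hand-typed logical vectors of length 200.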
