Printing regression coefficients from multiple models to a shared data frame

礼貌的吻别 2021-01-05 20:35

This is a little rudimentary, I know. Basically, I want to save the output of the coef function to a shared data frame, for models that all draw on a limited set of possible variables

2 Answers
  • 2021-01-05 21:00

    I'd approach this by doing something like this:

    library(dplyr)  # provides left_join(), used below
    x1 <- rnorm(10)
    x2 <- rnorm(10)
    x3 <- rnorm(10)
    y <- rnorm(10)
    m1 <- lm(y ~ x1 + x2)
    m2 <- lm(y ~ x1 + x3) 
    m3 <- lm(y ~ x2 + x3)
    
    variables <- data.frame(variable = c("(Intercept)", "x1", "x2", "x3"),
                            model = rep(c("m1", "m2", "m3"), each = 4))
    data <- data.frame(variable = c(names(coef(m1)), names(coef(m2)), 
                                    names(coef(m3))),
                       estimate = c(coef(m1), coef(m2), coef(m3)), 
                       model = c(rep("m1", length(coef(m1))), 
                                 rep("m2", length(coef(m2))),
                                 rep("m3", length(coef(m3)))))
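    # Keep every (variable, model) pair; terms a model lacks come back as NA
    data2 <- left_join(variables, data, by = c("variable", "model"))
    data2$estimate[is.na(data2$estimate)] <- 0  # treat absent terms as 0
    data2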
    reshape(data2, timevar = "variable", v.names = "estimate", 
            idvar = "model", direction = "wide")
    

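    Basically, fit the models, then extract each model's coefficient estimates and their names. Next, build a data frame, variables, that lists every possible term for every model. left_join from dplyr then matches the two, leaving NA (replaced here with 0) wherever a model does not include a term, and reshape turns the result into the wide format you want: one row per model and one column per term.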
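
    The same idea can be sketched in base R without typing out each model by hand. This is only an illustration, not the answer's own code: the names models, all_terms, coef_matrix and coef_table are made up for the example, and set.seed(1) is an arbitrary choice to make the simulated data reproducible.

    ```r
    set.seed(1)  # arbitrary seed, only for reproducibility
    x1 <- rnorm(10); x2 <- rnorm(10); x3 <- rnorm(10); y <- rnorm(10)

    # Collect the fitted models in a named list (names become the model labels)
    models <- list(m1 = lm(y ~ x1 + x2),
                   m2 = lm(y ~ x1 + x3),
                   m3 = lm(y ~ x2 + x3))

    # Every coefficient name that occurs in at least one model
    all_terms <- unique(unlist(lapply(models, function(m) names(coef(m)))))

    # One row per model; indexing by a name the model lacks yields NA
    coef_matrix <- t(sapply(models, function(m) coef(m)[all_terms]))
    colnames(coef_matrix) <- all_terms
    coef_matrix[is.na(coef_matrix)] <- 0  # same choice as above: absent terms become 0

    coef_table <- data.frame(model = names(models), coef_matrix,
                             check.names = FALSE, row.names = NULL)
    coef_table
    ```

    Drop the line that replaces NA with 0 if you would rather keep absent terms distinguishable from coefficients that are genuinely zero.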

  • 2021-01-05 21:02

    The first step is to combine your coefficients into a data frame with one row per combination of model and term. Then you'll be able to spread it into a table with one row per model and one column per term.

    My broom package has a useful function, tidy, for turning a linear fit into a data frame of coefficients:

    fit <- lm(mpg ~ wt + disp + qsec, mtcars)
    library(broom)
    tidy(fit)
    #          term  estimate std.error statistic p.value
    # 1 (Intercept) 19.777558    5.9383    3.3305 0.00244
    # 2          wt -5.034410    1.2241   -4.1127 0.00031
    # 3        disp -0.000128    0.0106   -0.0121 0.99042
    # 4        qsec  0.926649    0.3421    2.7087 0.01139
    

    (Note that unlike coef, which returns a named vector for a fitted model, or a matrix with the terms as row names when applied to summary(fit), this returns a data frame with the terms in a column.) You can apply this function to each of your models and then recombine the results, for example with plyr's ldply. As a stand-in for your "models", we generate a list of 20 copies of the same model:

    models <- replicate(20, lm(mpg ~ wt + disp + qsec, mtcars), simplify = FALSE)
    names(models) <- paste0("MODEL", 1:20)
    

    Then our "tidy and recombine" code will be:

    all_coefs <- plyr::ldply(models, tidy, .id = "model")
    head(all_coefs)
    #    model        term  estimate std.error statistic p.value
    # 1 MODEL1 (Intercept) 19.777558    5.9383    3.3305 0.00244
    # 2 MODEL1          wt -5.034410    1.2241   -4.1127 0.00031
    # 3 MODEL1        disp -0.000128    0.0106   -0.0121 0.99042
    # 4 MODEL1        qsec  0.926649    0.3421    2.7087 0.01139
    # 5 MODEL2 (Intercept) 19.777558    5.9383    3.3305 0.00244
    # 6 MODEL2          wt -5.034410    1.2241   -4.1127 0.00031
    

    You then need to remove the std.error, statistic, and p.value columns and spread the estimates out into one column per term. This can be done with the dplyr and tidyr packages:

    library(dplyr)
    library(tidyr)
    results <- all_coefs %>% select(-(std.error:p.value)) %>%
        spread(term, estimate)
    

    This produces:

         model (Intercept)      disp  qsec    wt
    1   MODEL1        19.8 -0.000128 0.927 -5.03
    2   MODEL2        19.8 -0.000128 0.927 -5.03
    3   MODEL3        19.8 -0.000128 0.927 -5.03
    4   MODEL4        19.8 -0.000128 0.927 -5.03
    5   MODEL5        19.8 -0.000128 0.927 -5.03
    

    This is your desired output; only the first five of the 20 rows are shown. (The output is boring since all the models were the same, but presumably yours are different.) If some models have coefficients that others don't, the missing values will be filled in with NA.
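
    To see the NA fill in action, here is a minimal sketch with three models whose predictors differ. The models are hypothetical, chosen only for illustration. The sketch also swaps in dplyr's bind_rows for ldply and tidyr's pivot_wider for spread; these are substitutions of mine rather than part of the approach above, and they assume reasonably recent versions of both packages.

    ```r
    library(broom)
    library(dplyr)
    library(tidyr)

    # Hypothetical models that each use a different pair of predictors
    models <- list(
      MODEL1 = lm(mpg ~ wt + disp, mtcars),
      MODEL2 = lm(mpg ~ wt + qsec, mtcars),
      MODEL3 = lm(mpg ~ disp + qsec, mtcars)
    )

    # Tidy each model, stack the results with the list names in a "model"
    # column, then widen to one row per model and one column per term
    results <- bind_rows(lapply(models, tidy), .id = "model") %>%
      select(model, term, estimate) %>%
      pivot_wider(names_from = term, values_from = estimate)

    results
    ```

    Each model's row should hold NA in the column of the predictor it leaves out, for example qsec for MODEL1.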
