This is a little rudimentary, I know. Basically, I want to use the save data from the coef function to a shared data frame for models that all pull limited possible variable
I'd approach this by doing something like this:
x1 <- rnorm(10)
x2 <- rnorm(10)
x3 <- rnorm(10)
y <- rnorm(10)
m1 <- lm(y ~ x1 + x2)
m2 <- lm(y ~ x1 + x3)
m3 <- lm(y ~ x2 + x3)
variables <- data.frame(variable = c("(Intercept)", "x1", "x2", "x3"),
model = rep(c("m1", "m2", "m3"), each = 4))
data <- data.frame(variable = c(names(coef(m1)), names(coef(m2)),
names(coef(m3))),
estimate = c(coef(m1), coef(m2), coef(m3)),
model = c(rep("m1", length(coef(m1))),
rep("m2", length(coef(m2))),
rep("m3", length(coef(m3)))))
data2 <- left_join(variables, data)
data2$estimate[is.na(data2$estimate)] <- 0
data2
reshape(data2, timevar = "variable", v.names = "estimate",
idvar = "model", direction = "wide")
Basically, fit the models and then extract the estimates and the row names. Then make a data frame variables
that includes all possible variable names for each model. Use left_join
from dplyr
to do the join and then reshape it into the format you want.
The first step is to combine your coefficients into a data frame with one row per combination of model and term. Then you'll be able to spread it into a table with one row per model and one column per term.
My broom package has a useful function, tidy
for turning a linear fit into a data frame of coefficients:
fit <- lm(mpg ~ wt + disp + qsec, mtcars)
library(broom)
tidy(fit)
# term estimate std.error statistic p.value
# 1 (Intercept) 19.777558 5.9383 3.3305 0.00244
# 2 wt -5.034410 1.2241 -4.1127 0.00031
# 3 disp -0.000128 0.0106 -0.0121 0.99042
# 4 qsec 0.926649 0.3421 2.7087 0.01139
(Note that unlike coef
, this returns a data frame rather than a matrix, and incorporates the terms as a column rather than rownames). You can apply this function to each of your models and then recombine, for example with plyr's ldply. We generate an example using 20 of the same model as your "models":
models <- replicate(20, lm(mpg ~ wt + disp + qsec, mtcars), simplify = FALSE)
names(models) <- paste0("MODEL", 1:20)
Then our "tidy and recombine" code will be:
all_coefs <- plyr::ldply(models, tidy, .id = "model")
head(all_coefs)
# model term estimate std.error statistic p.value
# 1 MODEL1 (Intercept) 19.777558 5.9383 3.3305 0.00244
# 2 MODEL1 wt -5.034410 1.2241 -4.1127 0.00031
# 3 MODEL1 disp -0.000128 0.0106 -0.0121 0.99042
# 4 MODEL1 qsec 0.926649 0.3421 2.7087 0.01139
# 5 MODEL2 (Intercept) 19.777558 5.9383 3.3305 0.00244
# 6 MODEL2 wt -5.034410 1.2241 -4.1127 0.00031
You then need to remove the std.error, statistic, and p.value columns and spread the estimate
term out. This can be done with the dplyr and tidyr packages:
library(dplyr)
library(tidyr)
results <- all_coefs %>% select(-(std.error:p.value)) %>%
spread(term, estimate)
This produces:
model (Intercept) disp qsec wt
1 MODEL1 19.8 -0.000128 0.927 -5.03
2 MODEL2 19.8 -0.000128 0.927 -5.03
3 MODEL3 19.8 -0.000128 0.927 -5.03
4 MODEL4 19.8 -0.000128 0.927 -5.03
5 MODEL5 19.8 -0.000128 0.927 -5.03
Which is your desired output. (This output is boring since all the models were the same, but presumably yours are different). If some models have coefficients others don't, the missing values will be filled in with NA.