Many regressions using tidyverse and broom: Same dependent variable, different independent variables

Question

This link shows how to answer my question in the case where we have the same independent variables, but potentially many different dependent variables: Use broom and tidyverse to run regressions on different dependent variables.

But my question is: how can I apply the same approach (e.g., tidyverse and broom) to run many regressions in the reverse situation, with the same dependent variable but different independent variables? In line with the code in the previous link, something like:

mod = lm(health ~ cbind(sex,income,happiness) + faculty, ds) %>% tidy()

However, this code does not do exactly what I want, and instead, produces:

Call:
lm(formula = income ~ cbind(sex, health) + faculty, data = ds)

Coefficients:
             (Intercept)     cbind(sex, health)sex  
                 945.049                   -47.911  
cbind(sex, health)health                   faculty  
                   2.342                     1.869

which is equivalent to:

lm(formula = income ~ sex + health + faculty, data = ds)

Answer 1:

Basically, you'll need some way to create all the different formulas you want. Here's one way:

qq <- expression(sex,income,happiness)
formulae <- lapply(qq, function(v) bquote(health~.(v)+faculty))
# [[1]]
# health ~ sex + faculty
# [[2]]
# health ~ income + faculty
# [[3]]
# health ~ happiness + faculty
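If the predictor names are held as character strings rather than as an expression, base R's reformulate() can build the same formulas. This is a minimal sketch, not part of the original answer; the names `predictors` and `formulae_chr` are illustrative.

```r
# Predictor names as plain strings (hypothetical variable name)
predictors <- c("sex", "income", "happiness")

# reformulate() assembles `response ~ term1 + term2` from character vectors
formulae_chr <- lapply(predictors, function(v) {
  reformulate(c(v, "faculty"), response = "health")
})

# Naming the list lets a later `.id = "model"` carry the predictor name
# instead of a numeric index
names(formulae_chr) <- predictors
```

Because the list is named, passing it through map_dfr(..., .id = "model") should label each block of rows by predictor rather than by position.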

Once you have all your formulas, you can map them to lm() and then to tidy():

library(purrr)
library(broom)

formulae %>% map(~lm(.x, ds)) %>% map_dfr(tidy, .id="model")
# A tibble: 9 x 6
#   model term         estimate std.error statistic  p.value
#   <chr> <chr>           <dbl>     <dbl>     <dbl>    <dbl>
# 1 1     (Intercept) 19.5        0.504     38.6    1.13e-60
# 2 1     sex          0.755      0.651      1.16   2.49e- 1
# 3 1     faculty     -0.00360    0.291     -0.0124 9.90e- 1
# 4 2     (Intercept) 19.8        1.70      11.7    3.18e-20
# 5 2     income      -0.000244   0.00162   -0.150  8.81e- 1
# 6 2     faculty      0.143      0.264      0.542  5.89e- 1
# 7 3     (Intercept) 18.4        1.88       9.74   4.79e-16
# 8 3     happiness    0.205      0.299      0.684  4.96e- 1
# 9 3     faculty      0.141      0.262      0.539  5.91e- 1

Using this sample data:

set.seed(11)
ds <- data.frame(income = rnorm(100, mean=1000,sd=200),
             happiness = rnorm(100, mean = 6, sd=1),
             health = rnorm(100, mean=20, sd = 3),
             sex = c(0,1),
             faculty = c(0,1,2,3))
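Often only the coefficient of the predictor that changes between models is of interest. The sketch below repeats the steps of this answer in one self-contained script and then keeps one row per model. It assumes the formula list is named after the varying predictor; the names `fits`, `all_terms` and `focal` are illustrative and not from the original answer.

```r
library(purrr)
library(broom)

set.seed(11)
ds <- data.frame(income = rnorm(100, mean = 1000, sd = 200),
                 happiness = rnorm(100, mean = 6, sd = 1),
                 health = rnorm(100, mean = 20, sd = 3),
                 sex = c(0, 1),
                 faculty = c(0, 1, 2, 3))

# Build one formula per varying predictor, as in the answer
qq <- expression(sex, income, happiness)
formulae <- lapply(qq, function(v) bquote(health ~ .(v) + faculty))

# Name each formula after its varying predictor (an addition to the answer)
names(formulae) <- as.character(qq)

# Fit every model and stack the tidy() output; `model` now holds the name
fits <- map(formulae, ~ lm(.x, data = ds))
all_terms <- map_dfr(fits, tidy, .id = "model")

# Keep only the row whose term is the model's varying predictor
focal <- all_terms[all_terms$term == all_terms$model, ]
focal
```

The result has one row per regression, which makes it easier to compare the estimates for sex, income and happiness side by side while faculty stays in every model as a control.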

Answer 2:

You could use the combn function to get all combinations of n independent variables and then iterate over them. Let's say n=3 here:

library(tidyverse)

ds <- data.frame(income = rnorm(100, mean=1000,sd=200),
                 happiness = rnorm(100, mean = 6, sd=1),
                 health = rnorm(100, mean=20, sd = 3),
                 sex = c(0,1),
                 faculty = c(0,1,2,3))

ivs = combn(names(ds)[names(ds)!="income"], 3, simplify=FALSE)
# Or, to get all models with 1 to 4 variables:
# ivs = map(1:4, ~combn(names(ds)[names(ds)!="income"], .x, simplify=FALSE)) %>% 
#         flatten()

names(ivs) = map(ivs, ~paste(.x, collapse="-"))

models = map(ivs, 
             ~lm(as.formula(paste("income ~", paste(.x, collapse="+"))), data=ds))

map_df(models, broom::tidy, .id="model")

   model                    term        estimate std.error statistic  p.value
 * <chr>                    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
 1 happiness-health-sex     (Intercept)  1086.      201.      5.39   5.00e- 7
 2 happiness-health-sex     happiness     -25.4      21.4    -1.19   2.38e- 1
 3 happiness-health-sex     health          3.58      6.99    0.512  6.10e- 1
 4 happiness-health-sex     sex            11.5      41.5     0.277  7.82e- 1
 5 happiness-health-faculty (Intercept)  1085.      197.      5.50   3.12e- 7
 6 happiness-health-faculty happiness     -25.8      20.9    -1.23   2.21e- 1
 7 happiness-health-faculty health          3.45      6.98    0.494  6.23e- 1
 8 happiness-health-faculty faculty         7.86     18.2     0.432  6.67e- 1
 9 happiness-sex-faculty    (Intercept)  1153.      141.      8.21   1.04e-12
10 happiness-sex-faculty    happiness     -25.9      21.4    -1.21   2.28e- 1
11 happiness-sex-faculty    sex             3.44     46.2     0.0744 9.41e- 1
12 happiness-sex-faculty    faculty         7.40     20.2     0.366  7.15e- 1
13 health-sex-faculty       (Intercept)   911.      143.      6.35   7.06e- 9
14 health-sex-faculty       health          3.90      7.03    0.554  5.81e- 1
15 health-sex-faculty       sex            15.6      45.6     0.343  7.32e- 1
16 health-sex-faculty       faculty         7.02     20.4     0.345  7.31e- 1
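To compare the fitted models as a whole rather than term by term, broom::glance() returns one row of fit statistics per model. The following is a hedged, self-contained sketch: it rebuilds the same four models, but uses reformulate() in place of paste() and adds a seed for reproducibility, neither of which is in the original answer. The name `fit_stats` is illustrative.

```r
library(purrr)
library(broom)

set.seed(11)  # assumption: seed added only so the example is reproducible
ds <- data.frame(income = rnorm(100, mean = 1000, sd = 200),
                 happiness = rnorm(100, mean = 6, sd = 1),
                 health = rnorm(100, mean = 20, sd = 3),
                 sex = c(0, 1),
                 faculty = c(0, 1, 2, 3))

# All 3-variable combinations of the candidate predictors
ivs <- combn(setdiff(names(ds), "income"), 3, simplify = FALSE)
names(ivs) <- map_chr(ivs, paste, collapse = "-")

# One model per combination, with income as the dependent variable
models <- map(ivs, ~ lm(reformulate(.x, response = "income"), data = ds))

# One row per model: R-squared, adjusted R-squared, AIC, and so on
fit_stats <- map_dfr(models, glance, .id = "model")
fit_stats[, c("model", "r.squared", "adj.r.squared", "AIC")]
```

Sorting `fit_stats` by AIC or adjusted R-squared is one simple way to rank the candidate specifications, although which criterion is appropriate depends on the analysis.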

Source: https://stackoverflow.com/questions/61512343/many-regressions-using-tidyverse-and-broom-same-dependent-variable-different-i

Tags

dplyr

tidyverse

broom