Question
This link shows how to answer my question when we have the same independent variables but potentially many different dependent variables: Use broom and tidyverse to run regressions on different dependent variables.
My question is how to apply the same approach (tidyverse and broom) to the reverse situation: the same dependent variable but different independent variables. In line with the code in the previous link, something like:
mod = lm(health ~ cbind(sex,income,happiness) + faculty, ds) %>% tidy()
However, this code does not do what I want. Instead, it produces:
Call:
lm(formula = income ~ cbind(sex, health) + faculty, data = ds)

Coefficients:
             (Intercept)     cbind(sex, health)sex
                 945.049                   -47.911
cbind(sex, health)health                   faculty
                   2.342                     1.869
which is equivalent to:
lm(formula = income ~ sex + health + faculty, data = ds)
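The equivalence above can be checked directly. The following is a minimal sketch, assuming the sample data given in the first answer (the question's own `ds` is not shown); the object names `fit_cbind`, `fit_plain` and `fit_multi` are illustrative. It also shows why the trick from the linked question does not transfer: on the left-hand side `cbind()` creates a matrix response, while on the right-hand side its columns simply become extra predictors.

```r
# Sample data borrowed from the first answer (an assumption about the asker's ds)
set.seed(11)
ds <- data.frame(income    = rnorm(100, mean = 1000, sd = 200),
                 happiness = rnorm(100, mean = 6, sd = 1),
                 health    = rnorm(100, mean = 20, sd = 3),
                 sex       = c(0, 1),
                 faculty   = c(0, 1, 2, 3))

# cbind() on the right-hand side: its columns are just additional predictors
fit_cbind <- lm(income ~ cbind(sex, health) + faculty, data = ds)
fit_plain <- lm(income ~ sex + health + faculty, data = ds)

# The two fits agree; only the coefficient labels differ
same_fit <- isTRUE(all.equal(unname(coef(fit_cbind)), unname(coef(fit_plain))))
print(same_fit)

# cbind() on the left-hand side: a matrix response, one coefficient column per outcome
fit_multi <- lm(cbind(income, health) ~ sex + faculty, data = ds)
print(dim(coef(fit_multi)))
```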
Answer 1:
Basically, you need some way to create all the different formulas you want. Here's one way:
qq <- expression(sex,income,happiness)
formulae <- lapply(qq, function(v) bquote(health~.(v)+faculty))
# [[1]]
# health ~ sex + faculty
# [[2]]
# health ~ income + faculty
# [[3]]
# health ~ happiness + faculty
Once you have all your formulas, you can map them to lm and then to tidy():
library(purrr)
library(broom)
formulae %>% map(~lm(.x, ds)) %>% map_dfr(tidy, .id="model")
# A tibble: 9 x 6
#   model term         estimate std.error statistic  p.value
#   <chr> <chr>           <dbl>     <dbl>     <dbl>    <dbl>
# 1 1     (Intercept) 19.5        0.504     38.6    1.13e-60
# 2 1     sex          0.755      0.651      1.16   2.49e- 1
# 3 1     faculty     -0.00360    0.291     -0.0124 9.90e- 1
# 4 2     (Intercept) 19.8        1.70      11.7    3.18e-20
# 5 2     income      -0.000244   0.00162   -0.150  8.81e- 1
# 6 2     faculty      0.143      0.264      0.542  5.89e- 1
# 7 3     (Intercept) 18.4        1.88       9.74   4.79e-16
# 8 3     happiness    0.205      0.299      0.684  4.96e- 1
# 9 3     faculty      0.141      0.262      0.539  5.91e- 1
Using this sample data:
set.seed(11)
ds <- data.frame(income = rnorm(100, mean=1000,sd=200),
                 happiness = rnorm(100, mean = 6, sd=1),
                 health = rnorm(100, mean=20, sd = 3),
                 sex = c(0,1),
                 faculty = c(0,1,2,3))
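If the predictor names are held as strings rather than as an expression, base R's `reformulate()` can build the same formulas. This is only a sketch of that alternative, not the method used above; the names `predictors`, `fits` and `focal` are illustrative, and it sticks to base R so it runs without purrr or broom.

```r
# Same sample data as above
set.seed(11)
ds <- data.frame(income    = rnorm(100, mean = 1000, sd = 200),
                 happiness = rnorm(100, mean = 6, sd = 1),
                 health    = rnorm(100, mean = 20, sd = 3),
                 sex       = c(0, 1),
                 faculty   = c(0, 1, 2, 3))

# Predictors that vary across models; faculty is included in every model
predictors <- c("sex", "income", "happiness")

# Build health ~ <predictor> + faculty for each predictor
formulae <- lapply(predictors, function(v) {
  reformulate(c(v, "faculty"), response = "health")
})
names(formulae) <- predictors

# Fit one model per formula
fits <- lapply(formulae, function(f) lm(f, data = ds))

# Pull out the coefficient of the varying predictor from each model
focal <- vapply(predictors, function(v) coef(fits[[v]])[[v]], numeric(1))
print(focal)
```

Because the list is named, passing `fits` to `purrr::map_dfr(broom::tidy, .id = "model")` would label each row with the predictor name instead of an index.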
Answer 2:
You could use the combn function to get all combinations of n independent variables and then iterate over them. Let's say n=3 here:
library(tidyverse)
ds <- data.frame(income = rnorm(100, mean=1000,sd=200),
                 happiness = rnorm(100, mean = 6, sd=1),
                 health = rnorm(100, mean=20, sd = 3),
                 sex = c(0,1),
                 faculty = c(0,1,2,3))
ivs = combn(names(ds)[names(ds)!="income"], 3, simplify=FALSE)
# Or, to get all models with 1 to 4 variables:
# ivs = map(1:4, ~combn(names(ds)[names(ds)!="income"], .x, simplify=FALSE)) %>%
# flatten()
names(ivs) = map(ivs, ~paste(.x, collapse="-"))
models = map(ivs,
             ~lm(as.formula(paste("income ~", paste(.x, collapse="+"))), data=ds))
map_df(models, broom::tidy, .id="model")
   model                    term        estimate std.error statistic  p.value
 * <chr>                    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
 1 happiness-health-sex     (Intercept)  1086.      201.       5.39   5.00e- 7
 2 happiness-health-sex     happiness     -25.4      21.4     -1.19   2.38e- 1
 3 happiness-health-sex     health          3.58      6.99     0.512  6.10e- 1
 4 happiness-health-sex     sex            11.5      41.5      0.277  7.82e- 1
 5 happiness-health-faculty (Intercept)  1085.      197.       5.50   3.12e- 7
 6 happiness-health-faculty happiness     -25.8      20.9     -1.23   2.21e- 1
 7 happiness-health-faculty health          3.45      6.98     0.494  6.23e- 1
 8 happiness-health-faculty faculty         7.86     18.2      0.432  6.67e- 1
 9 happiness-sex-faculty    (Intercept)  1153.      141.       8.21   1.04e-12
10 happiness-sex-faculty    happiness     -25.9      21.4     -1.21   2.28e- 1
11 happiness-sex-faculty    sex             3.44     46.2      0.0744 9.41e- 1
12 happiness-sex-faculty    faculty         7.40     20.2      0.366  7.15e- 1
13 health-sex-faculty       (Intercept)   911.      143.       6.35   7.06e- 9
14 health-sex-faculty       health          3.90      7.03     0.554  5.81e- 1
15 health-sex-faculty       sex            15.6      45.6      0.343  7.32e- 1
16 health-sex-faculty       faculty         7.02     20.4      0.345  7.31e- 1
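The commented-out variant (all models with 1 to 4 variables) can also be written in base R. The sketch below is one possible way to do it, using `combn()` with `reformulate()` in place of `paste()` and `as.formula()`; the names `candidates`, `ivs` and `models` are illustrative, and the `set.seed(11)` call is an assumption added for reproducibility, since this answer does not set a seed.

```r
set.seed(11)  # assumption: seed added here only so the sketch is reproducible
ds <- data.frame(income    = rnorm(100, mean = 1000, sd = 200),
                 happiness = rnorm(100, mean = 6, sd = 1),
                 health    = rnorm(100, mean = 20, sd = 3),
                 sex       = c(0, 1),
                 faculty   = c(0, 1, 2, 3))

# Every column except the dependent variable is a candidate predictor
candidates <- setdiff(names(ds), "income")

# All subsets of size 1 to 4, flattened into a single list of character vectors
ivs <- unlist(
  lapply(seq_along(candidates), function(k) combn(candidates, k, simplify = FALSE)),
  recursive = FALSE
)
names(ivs) <- vapply(ivs, paste, character(1), collapse = "-")

# One model per subset: income ~ <subset of predictors>
models <- lapply(ivs, function(vars) {
  lm(reformulate(vars, response = "income"), data = ds)
})

# 4 + 6 + 4 + 1 = 15 models for four candidate predictors
print(length(models))
```

The number of models grows as 2^p - 1 for p candidate predictors, so this is only practical for a small set of candidates.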
Source: https://stackoverflow.com/questions/61512343/many-regressions-using-tidyverse-and-broom-same-dependent-variable-different-i