What is the best way to fit different model formulae, varying by row of a data frame, using the list-column data structure in the tidyverse?
In R for Data Science, H
I find it easier to make a named list of model formulae. Each model is fit only once, for its corresponding continent. I add a new column, formula, to the nested data and look each formula up by continent name, so that formulae and continents stay correctly matched even if they are not in the same order.
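library(gapminder)
library(tidyverse)
library(broom)

# Named list: one formula per continent, looked up by name below
formulae <- list(
  Asia     = lifeExp ~ year,
  Europe   = lifeExp ~ year + pop,
  Africa   = lifeExp ~ year + gdpPercap,
  Americas = lifeExp ~ year - 1,
  Oceania  = lifeExp ~ year + pop + gdpPercap
)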
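df <- gapminder %>%
  group_by(continent) %>%
  nest() %>%
  # subsetting a named list by name returns a list, so formula is a list-column
  mutate(formula = formulae[as.character(continent)]) %>%
  mutate(model = map2(formula, data, ~ lm(.x, data = .y))) %>%
  mutate(glance = map(model, glance)) %>%
  # unnest(.drop = TRUE) is deprecated since tidyr 1.0; keep only the needed columns
  select(continent, glance) %>%
  unnest(glance)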
# # A tibble: 5 × 12
# continent r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
# <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1 Asia 0.4356350 0.4342026 8.9244419 304.1298 6.922751e-51 2 -1427.65947 2861.31893 2873.26317
# 2 Europe 0.4984677 0.4956580 3.8584819 177.4093 3.186760e-54 3 -995.41016 1998.82033 2014.36475
# 3 Africa 0.4160797 0.4141991 7.0033542 221.2506 2.836552e-73 3 -2098.46089 4204.92179 4222.66639
# 4 Americas 0.9812082 0.9811453 8.9703814 15612.1901 4.227928e-260 1 -1083.35918 2170.71836 2178.12593
# 5 Oceania 0.9733268 0.9693258 0.6647653 243.2719 6.662577e-16 4 -22.06696 54.13392 60.02419
# # ... with 2 more variables: deviance <dbl>, df.residual <int>
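The same name-based matching can be sketched without list-columns, using base R's split() and Map(). This is only an illustration on the built-in iris data rather than gapminder; the per-species formulas are hypothetical and deliberately listed in a different order from the groups, to show that the lookup works by name rather than by position.

```r
# Hypothetical per-group formulas, listed out of order on purpose
formulae <- list(
  virginica  = Sepal.Length ~ Petal.Length + Petal.Width,
  setosa     = Sepal.Length ~ Petal.Length,
  versicolor = Sepal.Length ~ Petal.Length - 1
)

# One data frame per species, named "setosa", "versicolor", "virginica"
by_species <- split(iris, iris$Species)

# Index the formula list by the group names so each group gets its own formula;
# Map() keeps the names of its first argument
models <- Map(function(f, d) lm(f, data = d),
              formulae[names(by_species)], by_species)

# Number of estimated coefficients per species
n_coef <- sapply(models, function(m) length(coef(m)))
n_coef
```

Here setosa gets an intercept and a slope, versicolor only a slope, and virginica an intercept plus two slopes.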
I found that purrr::modify_depth() does what I wanted est_model() to do in the original question. This is the solution I am now happy with:
library(gapminder)
library(tidyverse)
library(purrr)
library(broom)
# Each formula cell is a purrr-style lambda in which `.` stands for one continent's data
fmlas <- tibble::tribble(
  ~continent, ~formula,
  "Asia",     ~lm(lifeExp ~ year, data = .),
  "Europe",   ~lm(lifeExp ~ year + pop, data = .),
  "Africa",   ~lm(lifeExp ~ year + gdpPercap, data = .),
  "Americas", ~lm(lifeExp ~ year - 1, data = .),
  "Oceania",  ~lm(lifeExp ~ year + pop + gdpPercap, data = .)
)
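by_continent <- gapminder %>%
  nest(data = -continent) %>%
  left_join(fmlas, by = "continent") %>%
  # with .depth = 0, modify_depth() applies the lambda .y to the whole data frame .x
  mutate(model = map2(data, formula, ~ modify_depth(.x, 0, .y)))

by_continent %>%
  mutate(glance = map(model, glance)) %>%
  select(continent, glance) %>%
  unnest(glance)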
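The modify_depth() call works because, at depth 0, it applies the function to .x itself. As far as I can tell, the same result can be had more directly by converting each stored lambda with purrr::as_mapper() and calling it on the data. The sketch below assumes purrr is installed and uses the built-in mtcars data split by cyl in place of gapminder; the per-group lambdas are hypothetical.

```r
library(purrr)

# Hypothetical per-group model specs stored as purrr lambdas, as in fmlas
fits <- list(
  "4" = ~ lm(mpg ~ wt, data = .),
  "6" = ~ lm(mpg ~ wt + hp, data = .),
  "8" = ~ lm(mpg ~ wt - 1, data = .)
)

# One data frame per cylinder count, named "4", "6", "8"
groups <- split(mtcars, mtcars$cyl)

# as_mapper() turns each one-sided formula into a function whose `.` argument
# is the data, so calling it directly stands in for modify_depth(data, 0, fit)
models <- map2(groups, fits[names(groups)], ~ as_mapper(.y)(.x))

# Number of estimated coefficients per group (names are kept from `groups`)
n_coef <- map_int(models, ~ length(coef(.x)))
n_coef
```

Calling the mapper directly avoids relying on the depth-0 behaviour of modify_depth(), which reads less obviously as "fit this model to this data".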