Fit a different model for each row of a list-columns data frame

误落风尘 2021-01-13 10:25

What is the best way to fit a different model formula to each row of a data frame that uses the tidyverse list-column structure?

In R for Data Science, H

2 answers
  • 2021-01-13 10:44

    I find it easier to make a named list of model formulae, so that each model is fit only once, for its corresponding continent. I add a new column, formula, to the nested data and fill it by looking up the continent name, which keeps each formula matched to its continent even if the two are not in the same order.

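    library(gapminder)
    library(tidyverse)
    library(broom)
    
    # Named list of formulas, one per continent
    formulae <- c(
        Asia = lifeExp ~ year,
        Europe = lifeExp ~ year + pop,
        Africa = lifeExp ~ year + gdpPercap,
        Americas = lifeExp ~ year - 1,
        Oceania = lifeExp ~ year + pop + gdpPercap
    )
    
    df <- gapminder %>%
        group_by(continent) %>%
        nest() %>%
        # Look up each formula by continent name, not by position
        mutate(formula = formulae[as.character(continent)]) %>%
        mutate(model = map2(formula, data, ~ lm(.x, .y))) %>%
        mutate(glance = map(model, glance)) %>%
        # .drop is deprecated since tidyr 1.0.0; there, select() the
        # columns to keep before calling unnest()
        unnest(glance, .drop = TRUE)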
    
    # # A tibble: 5 × 12
    #   continent r.squared adj.r.squared     sigma  statistic       p.value    df      logLik        AIC        BIC
    #      <fctr>     <dbl>         <dbl>     <dbl>      <dbl>         <dbl> <int>       <dbl>      <dbl>      <dbl>
    # 1      Asia 0.4356350     0.4342026 8.9244419   304.1298  6.922751e-51     2 -1427.65947 2861.31893 2873.26317
    # 2    Europe 0.4984677     0.4956580 3.8584819   177.4093  3.186760e-54     3  -995.41016 1998.82033 2014.36475
    # 3    Africa 0.4160797     0.4141991 7.0033542   221.2506  2.836552e-73     3 -2098.46089 4204.92179 4222.66639
    # 4  Americas 0.9812082     0.9811453 8.9703814 15612.1901 4.227928e-260     1 -1083.35918 2170.71836 2178.12593
    # 5   Oceania 0.9733268     0.9693258 0.6647653   243.2719  6.662577e-16     4   -22.06696   54.13392   60.02419
    # # ... with 2 more variables: deviance <dbl>, df.residual <int>
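    
    The `.drop` argument of unnest() is deprecated as of tidyr 1.0.0, so on a current tidyverse the pipeline above may warn and keep the list-columns. A minimal sketch of the same idea in the newer syntax follows, assuming tidyr >= 1.0.0; the names `fits` and `fit_stats` are illustrative and not part of the original answer.

    ```r
    library(gapminder)
    library(dplyr)
    library(tidyr)
    library(purrr)
    library(broom)

    # Named list of formulas, one per continent (same formulas as above).
    formulae <- list(
      Asia     = lifeExp ~ year,
      Europe   = lifeExp ~ year + pop,
      Africa   = lifeExp ~ year + gdpPercap,
      Americas = lifeExp ~ year - 1,
      Oceania  = lifeExp ~ year + pop + gdpPercap
    )

    fits <- gapminder %>%
      nest(data = -continent) %>%                     # one row per continent
      mutate(
        formula   = formulae[as.character(continent)], # match by name, not position
        model     = map2(formula, data, ~ lm(.x, data = .y)),
        fit_stats = map(model, glance)                 # one-row summary per model
      )

    # Drop the list-columns explicitly, then unnest the summaries.
    fits %>%
      select(continent, fit_stats) %>%
      unnest(fit_stats)
    ```

    Selecting the columns before unnest() replaces what `.drop = TRUE` used to do, and `fits` still holds the fitted models if they are needed later.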
    
  • 2021-01-13 11:09

    I found that purrr::modify_depth() does what I wanted est_model() to do in the original question. This is the solution I am now happy with:

    library(gapminder)
    library(tidyverse)
    library(purrr)
    library(broom)
    
    fmlas <- tibble::tribble(
      ~continent, ~formula,
      "Asia", ~lm(lifeExp ~ year, data = .),
      "Europe", ~lm(lifeExp ~ year + pop, data = .),
      "Africa", ~lm(lifeExp ~ year + gdpPercap, data = .),
      "Americas", ~lm(lifeExp ~ year - 1, data = .),
      "Oceania", ~lm(lifeExp ~ year + pop + gdpPercap, data = .)
    )
    
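    by_continent <- gapminder %>% 
      nest(-continent) %>%
      # Attach each continent's model-fitting lambda
      left_join(fmlas, by = "continent") %>%
      # At depth 0, modify_depth() applies the lambda (.y) to the whole data frame (.x)
      mutate(model = map2(data, formula, ~ modify_depth(.x, 0, .y)))
    
    by_continent %>% 
      mutate(glance = map(model, glance)) %>% 
      # .drop is deprecated since tidyr 1.0.0; there, select() the
      # columns to keep before calling unnest()
      unnest(glance, .drop = TRUE)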
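    
    As used here, modify_depth() at depth 0 appears to do nothing more than turn the stored lambda into a function and call it on the whole data frame. Under that assumption, the same step can be sketched with purrr::as_mapper(), which makes the conversion explicit. The names `fit` and `fits2` are illustrative, and the lookup table is built with tibble() and list() rather than tribble() purely for this sketch.

    ```r
    library(gapminder)
    library(dplyr)
    library(tidyr)
    library(purrr)

    # One purrr-style lambda per continent; `.` stands for that continent's data.
    fmlas <- tibble::tibble(
      continent = c("Asia", "Europe", "Africa", "Americas", "Oceania"),
      fit = list(
        ~ lm(lifeExp ~ year, data = .),
        ~ lm(lifeExp ~ year + pop, data = .),
        ~ lm(lifeExp ~ year + gdpPercap, data = .),
        ~ lm(lifeExp ~ year - 1, data = .),
        ~ lm(lifeExp ~ year + pop + gdpPercap, data = .)
      )
    )

    fits2 <- gapminder %>%
      mutate(continent = as.character(continent)) %>%  # match the key type in fmlas
      nest(data = -continent) %>%
      left_join(fmlas, by = "continent") %>%
      # as_mapper() converts the lambda (.y) to a function, which is then
      # called on that continent's data frame (.x)
      mutate(model = map2(data, fit, ~ as_mapper(.y)(.x)))
    ```

    Storing plain functions in the list-column instead of lambdas would avoid the conversion step altogether; the lambda form is kept here only to stay close to the answer.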
    