Add margin row totals in dplyr chain

后端 未结 7 2129
一整个雨季
一整个雨季 2020-12-05 00:25

I would like to add overall summary rows while also calculating summaries by group using dplyr. I have found various questions asking how to do this, e.g. here, here, and he

相关标签:
7条回答
  • 2020-12-05 01:11

    One option is with do

    mtcars %>%
       count(cyl, gear) %>%
       ungroup() %>% 
       mutate(cyl=as.character(cyl)) %>% 
       do(bind_rows(., data.frame(cyl="Total", count(mtcars, gear)))) 
       #or replace the last 'do' step with 
       #bind_rows(cbind(cyl='Total', count(mtcars, gear))) #from  @JonnyPolonsky's comments
    
    #      cyl  gear     n
    #   <chr> <dbl> <int>
    #1      4     3     1
    #2      4     4     8
    #3      4     5     2
    #4      6     3     2
    #5      6     4     4
    #6      6     5     1
    #7      8     3    12
    #8      8     5     2
    #9  Total     3    15
    #10 Total     4    12
    #11 Total     5     5
    
    0 讨论(0)
  • 2020-12-05 01:15

    If you would like to have truly universal solution you could use a combination of purrr::map_df, base::c and base::sum mtcars %>% purrr::map_df(~c(.x, sum(.x, na.rm=TRUE))) %>% tail

    P.S. All columns must be numeric!

    0 讨论(0)
  • 2020-12-05 01:17

    With adorn_totals() from the janitor package:

    library(janitor)
    mtcars %>%
      tabyl(cyl, gear) %>%
      adorn_totals("row") 
    
       cyl  3  4 5
         4  1  8 2
         6  2  4 1
         8 12  0 2
     Total 15 12 5
    

    To get from there to the "long" form in your post, add tidyr::gather() to the pipeline:

    mtcars %>%
      tabyl(cyl, gear) %>%
      adorn_totals("row") %>%
      tidyr::gather(gear, n, 2:ncol(.), convert = TRUE)
    
         cyl gear  n
    1      4    3  1
    2      6    3  2
    3      8    3 12
    4  Total    3 15
    5      4    4  8
    6      6    4  4
    7      8    4  0
    8  Total    4 12
    9      4    5  2
    10     6    5  1
    11     8    5  2
    12 Total    5  5
    

    Self-promotion alert, I authored this package - adding this answer b/c it's a genuinely efficient solution here.

    0 讨论(0)
  • 2020-12-05 01:18

    Here's a take on the accepted answer, using new functions introduced in dplyr 1.0.0 and tidyr 1.0.0.

    We pivot the counts using the new tidyr::pivot_wider. Then use the new dplyr::rowwise and dplyr::c_across to sum the counts for the total column.

    We can also use tidyr::pivot_longer to get in desired long format.

    library(dplyr, warn.conflicts = FALSE)
    library(tidyr)
    
    cyl_gear_sum <- mtcars %>%
      count(cyl, gear) %>%
      pivot_wider(names_from = gear, values_from = n, values_fill = list(n = 0)) %>%
      rowwise(cyl) %>%
      mutate(gear_total = sum(c_across()))
    
    cyl_gear_sum
    #> # A tibble: 3 x 5
    #> # Rowwise:  cyl
    #>     cyl   `3`   `4`   `5` gear_total
    #>   <dbl> <int> <int> <int>      <int>
    #> 1     4     1     8     2         11
    #> 2     6     2     4     1          7
    #> 3     8    12     0     2         14
    
    # total as row
    cyl_gear_sum %>% 
      pivot_longer(-cyl, names_to = "gear", values_to = "n")
    #> # A tibble: 12 x 3
    #>      cyl gear           n
    #>    <dbl> <chr>      <int>
    #>  1     4 3              1
    #>  2     4 4              8
    #>  3     4 5              2
    #>  4     4 gear_total    11
    #>  5     6 3              2
    #>  6     6 4              4
    #>  7     6 5              1
    #>  8     6 gear_total     7
    #>  9     8 3             12
    #> 10     8 4              0
    #> 11     8 5              2
    #> 12     8 gear_total    14
    

    Created on 2020-04-07 by the reprex package (v0.3.0)

    0 讨论(0)
  • 2020-12-05 01:19

    An addition to @arkrun's answer that is not easy to add as a comment:

    Although a little more complex, this format allows for previous modifications in the data frame. Useful when there is a longer chain of verbs before the table is generated. (You want to change names, or select only specific variables)

    mtcars %>%
       count(cyl, gear) %>%
       ungroup() %>% 
       mutate(cyl=as.character(cyl))
    bind_rows(group_by(.,gear) %>%
                  summarise(n=sum(n)) %>%
                  mutate(cyl='Total')) %>%
    spread(cyl)
    
    ## A tibble: 3 x 5
    #   gear   `4`   `6`   `8` Total
    #* <dbl> <dbl> <dbl> <dbl> <dbl>
    #1     3     1     2    12    15
    #2     4     8     4     0    12
    #3     5     2     1     2     5
    

    This can also be doubled up to generate a total row for the spread as well.

    mtcars %>%
      count(cyl, gear) %>%
      ungroup() %>% 
      mutate(cyl=as.character(cyl),
             gear = as.character(gear)) %>%
      bind_rows(group_by(.,gear) %>%
                  summarise(n=sum(n)) %>%
                  mutate(cyl='Total')) %>%
      bind_rows(group_by(.,cyl) %>%
                  summarise(n=sum(n)) %>%
                  mutate(gear='Total')) %>%
      spread(cyl,n,fill=0)
    
    # A tibble: 4 x 5
       gear   `4`   `6`   `8` Total
    * <chr> <dbl> <dbl> <dbl> <dbl>
    1     3     1     2    12    15
    2     4     8     4     0    12
    3     5     2     1     2     5
    4 Total    11     7    14    32
    
    0 讨论(0)
  • 2020-12-05 01:22

    Here is my suggestion.

    1. Find the combination of relevant grouping variables via the powerSet function.
    2. split the data frame into a list, grouped by the powerSet of the grouping variables
    3. summarise the data frame using an appropriate summary function (e.g. mean)
    4. bind_rows the result - summaries are now NA because these columns are dropped in step 3
    5. replace NA values of the grouping variables using appropriate names.

    Note. If grouping variables are numeric, they will not be dropped in step 3 - I therefore mutate them to character variables.

    powerSetList <- function(df, ...) {
      rje::powerSet(x = c(...))[-1] %>% lapply(function(x, tdf = df) group_by(tdf, .dots=x)) %>% c(list(tibble(df)), .)
    } 
    
    mtcars %>% 
      mutate_at(vars("cyl", "gear"), as.character) %>%
      powerSetList("cyl", "gear") %>%
      map(~summarise_if(., is.numeric, .funs = mean)) %>%
      bind_rows() %>%
      replace_na(list(gear = "all gears",
                      cyl = "all cyls"))
    
    0 讨论(0)
提交回复
热议问题