Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?


生来不讨喜 2021-01-11 23:17

My question is very similar to Applying group_by and summarise on data while keeping all the columns' info but I would like to keep columns which get excluded because th

3 Answers

北荒 (original poster)

2021-01-11 23:54

Here's a tidyverse solution that keeps your group_by statement the same. The key is to use mutate_if for each variable type first (i.e., numeric, character), then get distinct rows.

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats

Label <- c("203c", "203c", "204a", "204a", "204a", "204a", "204a", "204a",
  "204a", "204a")
Type <- c("wholefish", "flesh", "flesh", "fleshdelip", "formula", "formuladelip",
  "formula", "formuladelip", "wholefish", "wholefishdelip")
Proportion <- c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285, 0.32285, 0.32285,
  0.67714, 0.67714)
N <- (1:10)
C <- (1:10)
Code <- c("c", "a", "a", "b", "a", "b", "c", "d", "c", "d")

# tibble() replaces the deprecated data_frame()
df <- tibble(Label, Type, Proportion, N, C, Code)
df
#> # A tibble: 10 x 6
#>    Label           Type Proportion     N     C  Code
#>    <chr>          <chr>      <dbl> <int> <int> <chr>
#>  1  203c      wholefish    1.00000     1     1     c
#>  2  203c          flesh    1.00000     2     2     a
#>  3  204a          flesh    0.67714     3     3     a
#>  4  204a     fleshdelip    0.67714     4     4     b
#>  5  204a        formula    0.32285     5     5     a
#>  6  204a   formuladelip    0.32285     6     6     b
#>  7  204a        formula    0.32285     7     7     c
#>  8  204a   formuladelip    0.32285     8     8     d
#>  9  204a      wholefish    0.67714     9     9     c
#> 10  204a wholefishdelip    0.67714    10    10     d

df %>%
  group_by(Label, Code) %>%
  mutate_if(is.numeric, sum) %>%
  mutate_if(is.character, funs(paste(unique(.), collapse = "_"))) %>%
  distinct()
#> # A tibble: 6 x 6
#> # Groups:   Label, Code [6]
#>   Label                        Type Proportion     N     C  Code
#>   <chr>                       <chr>      <dbl> <int> <int> <chr>
#> 1  203c                   wholefish    1.00000     1     1     c
#> 2  203c                       flesh    1.00000     2     2     a
#> 3  204a               flesh_formula    0.99999     8     8     a
#> 4  204a     fleshdelip_formuladelip    0.99999    10    10     b
#> 5  204a           formula_wholefish    0.99999    16    16     c
#> 6  204a formuladelip_wholefishdelip    0.99999    18    18     d
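
For anyone on a newer dplyr: mutate_if() and funs() have since been superseded, so here is a rough sketch of the same idea with across(), assuming dplyr 1.0.0 or later. It collapses each group directly with summarise() instead of mutate plus distinct(), so the column order comes out differently from the output above (grouping columns first, Type last). The name `result` is just a placeholder of mine, not something from the question.

```r
library(dplyr)

# Same example data as above, rebuilt so the snippet runs on its own
df <- tibble(
  Label = c("203c", "203c", rep("204a", 8)),
  Type = c("wholefish", "flesh", "flesh", "fleshdelip", "formula",
           "formuladelip", "formula", "formuladelip", "wholefish",
           "wholefishdelip"),
  Proportion = c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285, 0.32285,
                 0.32285, 0.67714, 0.67714),
  N = 1:10,
  C = 1:10,
  Code = c("c", "a", "a", "b", "a", "b", "c", "d", "c", "d")
)

result <- df %>%
  group_by(Label, Code) %>%
  summarise(
    # sum every numeric column within each Label/Code group
    across(where(is.numeric), sum),
    # keep the conflicting text columns by pasting their unique values
    across(where(is.character), ~ paste(unique(.x), collapse = "_")),
    .groups = "drop"
  )

result
```

The grouping columns (Label, Code) are skipped by across() automatically, so only Type gets pasted together. If you need the original column order back, a select() or relocate() at the end should do it.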
