My question is very similar to Applying group_by and summarise on data while keeping all the columns' info but I would like to keep columns which get excluded because th
Here's a tidyverse solution that keeps your `group_by` statement the same. The key is to call `mutate_if()` once per column type first (summing the numeric columns and pasting together the unique values of the character columns), and then keep only the distinct rows.
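To see what the two `mutate_if()` calls do inside a single group, here is a hand-worked sketch in base R on the two rows with `Label` "204a" and `Code` "a". The vectors are copied from the example data; the names `type_a`, `prop_a`, `n_a` and `group_rows` are only illustrative.

```r
# Values from the two rows with Label "204a" and Code "a"
type_a <- c("flesh", "formula")
prop_a <- c(0.67714, 0.32285)
n_a <- c(3L, 5L)

# Numeric columns: every value in the group is replaced by the group sum
prop_sum <- sum(prop_a)
n_sum <- sum(n_a)

# Character columns: the unique values are pasted together with "_"
type_joined <- paste(unique(type_a), collapse = "_")

# After both steps every row of the group holds the same values,
# so keeping distinct rows leaves exactly one row for the group
group_rows <- data.frame(
  Type = rep(type_joined, length(type_a)),
  Proportion = rep(prop_sum, length(prop_a)),
  N = rep(n_sum, length(n_a))
)
print(unique(group_rows))
```

This is why `distinct()` at the end is enough: once every non-grouping column has been made constant within its group, the duplicate rows are exact copies.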
```r
library(tidyverse)

Label <- c("203c", "203c", "204a", "204a", "204a", "204a", "204a", "204a",
           "204a", "204a")
Type <- c("wholefish", "flesh", "flesh", "fleshdelip", "formula", "formuladelip",
          "formula", "formuladelip", "wholefish", "wholefishdelip")
Proportion <- c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285, 0.32285, 0.32285,
                0.67714, 0.67714)
N <- 1:10
C <- 1:10
Code <- c("c", "a", "a", "b", "a", "b", "c", "d", "c", "d")
df <- tibble(Label, Type, Proportion, N, C, Code)
df
#> # A tibble: 10 x 6
#> Label Type Proportion N C Code
#> <chr> <chr> <dbl> <int> <int> <chr>
#> 1 203c wholefish 1.00000 1 1 c
#> 2 203c flesh 1.00000 2 2 a
#> 3 204a flesh 0.67714 3 3 a
#> 4 204a fleshdelip 0.67714 4 4 b
#> 5 204a formula 0.32285 5 5 a
#> 6 204a formuladelip 0.32285 6 6 b
#> 7 204a formula 0.32285 7 7 c
#> 8 204a formuladelip 0.32285 8 8 d
#> 9 204a wholefish 0.67714 9 9 c
#> 10 204a wholefishdelip 0.67714 10 10 d
```
```r
df %>%
  group_by(Label, Code) %>%
  mutate_if(is.numeric, sum) %>%
  mutate_if(is.character, ~ paste(unique(.), collapse = "_")) %>%
  distinct()
#> # A tibble: 6 x 6
#> # Groups: Label, Code [6]
#> Label Type Proportion N C Code
#> <chr> <chr> <dbl> <int> <int> <chr>
#> 1 203c wholefish 1.00000 1 1 c
#> 2 203c flesh 1.00000 2 2 a
#> 3 204a flesh_formula 0.99999 8 8 a
#> 4 204a fleshdelip_formuladelip 0.99999 10 10 b
#> 5 204a formula_wholefish 0.99999 16 16 c
#> 6 204a formuladelip_wholefishdelip 0.99999 18 18 d
```
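On dplyr 1.0.0 or later, the scoped `mutate_if()` verbs are superseded by `across()`. A minimal sketch of the same aggregation with `summarise()`, assuming dplyr 1.0.0 or newer and the same example data, can drop the `distinct()` step because `summarise()` already returns one row per group. Note that the rows come back sorted by `Label` and `Code`, and the grouping columns are placed first; `res` is just an illustrative name.

```r
library(dplyr)

# Same example data as above
df <- tibble(
  Label = c("203c", "203c", "204a", "204a", "204a", "204a", "204a", "204a",
            "204a", "204a"),
  Type = c("wholefish", "flesh", "flesh", "fleshdelip", "formula", "formuladelip",
           "formula", "formuladelip", "wholefish", "wholefishdelip"),
  Proportion = c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285, 0.32285, 0.32285,
                 0.67714, 0.67714),
  N = 1:10,
  C = 1:10,
  Code = c("c", "a", "a", "b", "a", "b", "c", "d", "c", "d")
)

res <- df %>%
  group_by(Label, Code) %>%
  summarise(
    # Numeric columns: one sum per group
    across(where(is.numeric), sum),
    # Character columns (grouping columns are excluded): unique values joined by "_"
    across(where(is.character), ~ paste(unique(.x), collapse = "_")),
    .groups = "drop"
  )
res
```

Because every non-grouping column is reduced to a single value per group, this version does not rely on the rows becoming identical before de-duplication.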