Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

生来不讨喜 2021-01-11 23:17

My question is very similar to "Applying group_by and summarise on data while keeping all the columns' info", but I would like to keep columns which get excluded because th

3 answers
  •  北荒 (original poster)
    2021-01-11 23:54

    Here's a tidyverse solution that keeps your group_by statement the same. The key is to apply mutate_if once per variable type (numeric, then character) so that every row within a group carries the same aggregated values, and then keep only the distinct rows.


    library(tidyverse)
    #> Loading tidyverse: ggplot2
    #> Loading tidyverse: tibble
    #> Loading tidyverse: tidyr
    #> Loading tidyverse: readr
    #> Loading tidyverse: purrr
    #> Loading tidyverse: dplyr
    #> Conflicts with tidy packages ----------------------------------------------
    #> filter(): dplyr, stats
    #> lag():    dplyr, stats
    
    Label <- c("203c", "203c", "204a", "204a", "204a", "204a", "204a", "204a",
      "204a", "204a")
    Type <- c("wholefish", "flesh", "flesh", "fleshdelip", "formula", "formuladelip",
      "formula", "formuladelip", "wholefish", "wholefishdelip")
    Proportion <- c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285, 0.32285, 0.32285,
      0.67714, 0.67714)
    N <- (1:10)
    C <- (1:10)
    Code <- c("c", "a", "a", "b", "a", "b", "c", "d", "c", "d")
    
    # tibble() replaces data_frame(), which is deprecated in current tibble/dplyr
    df <- tibble(Label, Type, Proportion, N, C, Code)
    df
    #> # A tibble: 10 x 6
    #>    Label           Type Proportion     N     C  Code
    #>    <chr>          <chr>      <dbl> <int> <int> <chr>
    #>  1  203c      wholefish    1.00000     1     1     c
    #>  2  203c          flesh    1.00000     2     2     a
    #>  3  204a          flesh    0.67714     3     3     a
    #>  4  204a     fleshdelip    0.67714     4     4     b
    #>  5  204a        formula    0.32285     5     5     a
    #>  6  204a   formuladelip    0.32285     6     6     b
    #>  7  204a        formula    0.32285     7     7     c
    #>  8  204a   formuladelip    0.32285     8     8     d
    #>  9  204a      wholefish    0.67714     9     9     c
    #> 10  204a wholefishdelip    0.67714    10    10     d
    
    df %>%
      group_by(Label, Code) %>%
      mutate_if(is.numeric, sum) %>%
      # funs() is deprecated; a formula lambda does the same job
      mutate_if(is.character, ~ paste(unique(.), collapse = "_")) %>%
      distinct()
    #> # A tibble: 6 x 6
    #> # Groups:   Label, Code [6]
    #>   Label                        Type Proportion     N     C  Code
    #>   <chr>                       <chr>      <dbl> <int> <int> <chr>
    #> 1  203c                   wholefish    1.00000     1     1     c
    #> 2  203c                       flesh    1.00000     2     2     a
    #> 3  204a               flesh_formula    0.99999     8     8     a
    #> 4  204a     fleshdelip_formuladelip    0.99999    10    10     b
    #> 5  204a           formula_wholefish    0.99999    16    16     c
    #> 6  204a formuladelip_wholefishdelip    0.99999    18    18     d
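
    For readers on a recent dplyr, the same idea can probably be written with summarise() and across() instead of mutate_if() plus distinct(). This is only a sketch, assuming dplyr >= 1.0.0; it is not part of the original answer, and the name result is just for illustration. The numeric columns are summed and the character columns are collapsed to their unique values within each Label/Code group.

    ```r
    library(dplyr)  # assumption: dplyr >= 1.0.0, which provides across()

    # Same example data as in the answer above
    df <- tibble(
      Label = c("203c", "203c", "204a", "204a", "204a", "204a", "204a", "204a",
                "204a", "204a"),
      Type = c("wholefish", "flesh", "flesh", "fleshdelip", "formula",
               "formuladelip", "formula", "formuladelip", "wholefish",
               "wholefishdelip"),
      Proportion = c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285, 0.32285, 0.32285,
                     0.67714, 0.67714),
      N = 1:10,
      C = 1:10,
      Code = c("c", "a", "a", "b", "a", "b", "c", "d", "c", "d")
    )

    result <- df %>%
      group_by(Label, Code) %>%
      summarise(
        # sum every numeric column within each group
        across(where(is.numeric), sum),
        # collapse the unique values of every non-grouping character column
        across(where(is.character), ~ paste(unique(.x), collapse = "_")),
        .groups = "drop"  # return an ungrouped tibble
      )

    result
    ```

    Unlike the mutate_if() version, summarise() returns one row per group directly, so no distinct() step is needed. The rows come back ordered by the grouping keys, and the grouping columns are placed first.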
    
