Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

生来不讨喜 2021-01-11 23:17

My question is very similar to Applying group_by and summarise on data while keeping all the columns' info but I would like to keep columns which get excluded because th

3 answers
  •  北恋 (OP)
    2021-01-12 00:05

    Here's a data.table solution. I'm assuming you want the mean() of Proportion rather than the sum, since these grouped proportions are likely not additive.

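    library(data.table)
    
    setDT(df)  # convert df to a data.table in place
    
    # Collapse Type, average Proportion and sum the counts within each Label/Code group;
    # the [order(Label)] chain must start on the same line, otherwise R parses it as a new statement
    df[, .(Type = paste(Type, collapse = "_"),
           Proportion = mean(Proportion), N = sum(N), C = sum(C)),
       by = .(Label, Code)][order(Label)]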
    
       Label Code                        Type Proportion  N  C
    1:  203c    c                   wholefish   1.000000  1  1
    2:  203c    a                       flesh   1.000000  2  2
    3:  204a    a               flesh_formula   0.499995  8  8
    4:  204a    b     fleshdelip_formuladelip   0.499995 10 10
    5:  204a    c           formula_wholefish   0.499995 16 16
    6:  204a    d formuladelip_wholefishdelip   0.499995 18 18
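    Since the question's data is not reproduced here, the following is a minimal, self-contained sketch of the data.table approach above run on a small hypothetical `df`. The values are invented for illustration, and the column names (Label, Code, Type, Proportion, N, C) are assumed from the answer's code. It uses `as.data.table()` instead of `setDT()` so the original data.frame is left unmodified.

```r
library(data.table)

# Hypothetical toy data standing in for the question's df (values invented for illustration)
df <- data.frame(
  Label      = c("203c", "203c", "204a", "204a", "204a", "204a"),
  Code       = c("c", "a", "a", "a", "b", "b"),
  Type       = c("wholefish", "flesh", "flesh", "formula", "fleshdelip", "formuladelip"),
  Proportion = c(1, 1, 0.5, 0.5, 0.5, 0.5),
  N          = c(1L, 2L, 3L, 5L, 4L, 6L),
  C          = c(1L, 2L, 3L, 5L, 4L, 6L),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)  # makes a copy, so df itself stays a plain data.frame

# One row per Label/Code: conflicting Type values are pasted together,
# Proportion is averaged and the counts are summed
res <- dt[, .(Type       = paste(Type, collapse = "_"),
              Proportion = mean(Proportion),
              N          = sum(N),
              C          = sum(C)),
          by = .(Label, Code)][order(Label)]

print(res)
```

    The 204a/a group, for example, collapses to a single row with Type "flesh_formula", Proportion 0.5 and N 8.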
    

    I'm not sure this is the cleanest dplyr solution, but it works:

    library(dplyr)
    
    df %>%
      group_by(Label, Code) %>%
      mutate(Type = paste(Type, collapse = "_")) %>%  # combine Type within each Label/Code
      group_by(Label, Type, Code) %>%                 # re-group on the combined Type
      summarise(N = sum(N), C = sum(C), Proportion = mean(Proportion))
    

    Note that the key here is to re-group once you have created the combined Type column.

       Label                        Type   Code     N     C Proportion
    1   203c                       flesh      a     2     2   1.000000
    2   203c                   wholefish      c     1     1   1.000000
    3   204a               flesh_formula      a     8     8   0.499995
    4   204a     fleshdelip_formuladelip      b    10    10   0.499995
    5   204a           formula_wholefish      c    16    16   0.499995
    6   204a formuladelip_wholefishdelip      d    18    18   0.499995
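    A possibly simpler dplyr variant skips the mutate/re-group step and builds the combined Type directly inside `summarise()`. This is a sketch on the same kind of hypothetical toy data (values invented for illustration) and assumes dplyr 1.0.0 or later for the `.groups` argument. Unlike the output above, the columns come back as Label, Code, Type, N, C, Proportion, with rows sorted by Label and Code.

```r
library(dplyr)

# Hypothetical toy data standing in for the question's df (values invented for illustration)
df <- data.frame(
  Label      = c("203c", "203c", "204a", "204a", "204a", "204a"),
  Code       = c("c", "a", "a", "a", "b", "b"),
  Type       = c("wholefish", "flesh", "flesh", "formula", "fleshdelip", "formuladelip"),
  Proportion = c(1, 1, 0.5, 0.5, 0.5, 0.5),
  N          = c(1L, 2L, 3L, 5L, 4L, 6L),
  C          = c(1L, 2L, 3L, 5L, 4L, 6L),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Label, Code) %>%
  summarise(
    Type       = paste(Type, collapse = "_"),  # collapse the conflicting Type values
    N          = sum(N),
    C          = sum(C),
    Proportion = mean(Proportion),             # proportions are averaged, not summed
    .groups    = "drop"                        # return an ungrouped tibble
  )

print(res)
```

    Because the combined Type is computed in the same `summarise()` call, no second `group_by()` is needed; the trade-off is that Type is no longer a grouping column in the result.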
    
