My question is very similar to Applying group_by and summarise on data while keeping all the columns' info but I would like to keep columns which get excluded because th
Here's the data.table
solution, I'm assuming you want the mean()
of Proportion, since these grouped proportions are likely not additive.
setDT(df)
df[, .(Type =paste(Type,collapse="_"),
Proportion=mean(Proportion),N= sum(N),C=sum(C)), by=.(Label,Code)]
[order(Label)]
Label Code Type Proportion N C
1: 203c c wholefish 1.000000 1 1
2: 203c a flesh 1.000000 2 2
3: 204a a flesh_formula 0.499995 8 8
4: 204a b fleshdelip_formuladelip 0.499995 10 10
5: 204a c formula_wholefish 0.499995 16 16
6: 204a d formuladelip_wholefishdelip 0.499995 18 18
I'm not sure this is the cleanest dplyr
solution, but it works:
df %>% group_by(Label, Code) %>%
mutate(Type = paste(Type,collapse="_")) %>%
group_by(Label,Type,Code) %>%
summarise(N=sum(N),C=sum(C),Proportion=mean(Proportion))
Note the key here is to re-group once you create the combined Type
column.
Label Type Code N C Proportion
1 203c flesh a 2 2 1.000000
2 203c wholefish c 1 1 1.000000
3 204a flesh_formula a 8 8 0.499995
4 204a fleshdelip_formuladelip b 10 10 0.499995
5 204a formula_wholefish c 16 16 0.499995
6 204a formuladelip_wholefishdelip d 18 18 0.499995