Question
I have the following data.frame (df):
ID1 ID2 Col1 Col2 Col3 Grp
A B 1 3 6 G1
C D 3 5 7 G1
E F 4 5 7 G2
G h 5 6 8 G2
What I would like to achieve is the following: group by Grp (the easy part), then summarize so that, for each group, the numeric columns are summed and two string columns are created that list all ID1s and all ID2s in that group.
It would be something like this:
df %>%
group_by(Grp) %>%
summarize(ID1s=toString(ID1), ID2s=toString(ID2), Col1=sum(Col1), Col2=sum(Col2), Col3=sum(Col3))
Everything is fine when I know the names of the numeric columns (Col1, Col2, Col3). However, I would like this to work for any data frame that always has ID1, ID2 and Grp under those exact names, plus any number of additional numeric columns whose names are not known in advance.
Is there a way to do this in dplyr?
Answer 1:
"I would like this to work for any data frame that always has ID1, ID2 and Grp under those exact names, plus any number of additional numeric columns whose names are not known in advance."
You can overwrite the ID columns first and then group by them as well:
df %>%
  group_by(Grp) %>%
  mutate_each(funs(. %>% unique %>% sort %>% toString), ID1, ID2) %>%
  group_by(ID1, ID2, add = TRUE) %>%
  summarise_each(funs(sum))
# Source: local data frame [2 x 6]
# Groups: Grp, ID1 [?]
#
# Grp ID1 ID2 Col1 Col2 Col3
# (chr) (chr) (chr) (int) (int) (int)
# 1 G1 A, C B, D 4 8 13
# 2 G2 E, G F, h 9 11 15
I think you'll want to uniqify and sort before collapsing to a string, so I've added those steps.
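For readers on a newer dplyr, the same idea can be sketched with across(), which is not part of the answer above. This assumes dplyr 1.0.0 or later, rebuilds the question's example data by hand, and selects the unknown columns by type with where(is.numeric) rather than by name. Unlike the code above, it does not deduplicate or sort the IDs before collapsing them.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides across() and where()

# Example data copied from the question
df <- data.frame(
  ID1  = c("A", "C", "E", "G"),
  ID2  = c("B", "D", "F", "h"),
  Col1 = c(1, 3, 4, 5),
  Col2 = c(3, 5, 5, 6),
  Col3 = c(6, 7, 7, 8),
  Grp  = c("G1", "G1", "G2", "G2")
)

result <- df %>%
  group_by(Grp) %>%
  summarise(
    across(c(ID1, ID2), toString),   # collapse the two known ID columns
    across(where(is.numeric), sum),  # sum every numeric column, whatever its name
    .groups = "drop"
  )

print(result)
```

Because the numeric columns are picked by type, adding a Col4 or renaming Col1 needs no change to the pipeline.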
Answer 2:
Using the data.table package, you could try the following:
library(data.table)
setDT(df)
sd_cols <- 3:(ncol(df) - 1)  # numeric columns sit between ID1/ID2 and the trailing Grp
merge(df[, .(ID1s = toString(ID1), ID2s = toString(ID2)), by = Grp],
      df[, lapply(.SD, sum), by = Grp, .SDcols = sd_cols],
      by = "Grp")
Source: https://stackoverflow.com/questions/38722829/summarizing-unknown-number-of-column-in-r-using-dplyr