Aggregate rows by shared values in a variable

Posted by 天大地大妈咪最大 on 2020-02-20 06:04:09

Question


I have a somewhat dumb R question. If I have a matrix (or dataframe, whichever is easier to work with) like:

Year  Match
2008   1808
2008 137088
2008      1
2008  56846
2007   2704
2007 169876
2007  75750
2006   2639
2006 193990
2006      2

I want to sum the match counts for each year (so, e.g., the 2008 row would be 2008 195743). How would I go about doing this? I have a few solutions in mind, but they are all needlessly complicated, and R tends to have a much easier solution tucked away somewhere.

You can generate the same matrix above with:

structure(c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L, 
2006L, 2006L, 2006L, 1808L, 137088L, 1L, 56846L, 2704L, 169876L, 
75750L, 2639L, 193990L, 2L), .Dim = c(10L, 2L), .Dimnames = list(
NULL, c("Year", "Match")))

Thanks for any help you can offer.


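Answer 1:


Assuming df is your data above as a data frame (the structure() call in the question builds a matrix, so convert it with as.data.frame() first):

aggregate(x = df$Match, by = list(df$Year), FUN = sum)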
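Since the question's structure() call yields a matrix and the $ operator needs a data frame, here is a minimal sketch of the full round trip. The names m, df and totals are illustrative, and naming the list element and renaming the second column are optional tidy-ups rather than part of the original answer.

```r
# Rebuild the matrix from the question.
m <- structure(c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L,
                 2006L, 2006L, 2006L, 1808L, 137088L, 1L, 56846L, 2704L,
                 169876L, 75750L, 2639L, 193990L, 2L),
               .Dim = c(10L, 2L),
               .Dimnames = list(NULL, c("Year", "Match")))

# Convert to a data frame so that df$Match and df$Year work.
df <- as.data.frame(m)

# Sum Match within each Year; naming the list element labels the group column.
totals <- aggregate(x = df$Match, by = list(Year = df$Year), FUN = sum)

# The summed column is called "x" by default; rename it for readability.
names(totals)[2] <- "Match"
print(totals)
```

Without the Year = name inside list(), the grouping column comes back as Group.1, which works but is harder to read.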




Answer 2:


You may also want to use the ddply function from the plyr package.

# install plyr package
install.packages('plyr')
library(plyr)
# creating your data.frame
foo <- as.data.frame(structure(c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L, 
            2006L, 2006L, 2006L, 1808L, 137088L, 1L, 56846L, 2704L, 169876L, 
            75750L, 2639L, 193990L, 2L), .Dim = c(10L, 2L), .Dimnames = list(
              NULL, c("Year", "Match"))))

# here's what you're looking for
ddply(foo,.(Year),numcolwise(sum))

  Year  Match
1 2006 196631
2 2007 248330
3 2008 195743

By the way, the total for 2008 should be 195743 (1808 + 137088 + 1 + 56846), not 138897, which leaves out 56846.
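The arithmetic behind that remark can be checked directly in R. This is only a quick sketch, with match_2008 as an illustrative name for the four 2008 values from the question:

```r
# The four Match values recorded for 2008 in the question.
match_2008 <- c(1808L, 137088L, 1L, 56846L)

# Total over all four rows.
full_total <- sum(match_2008)
print(full_total)     # 195743

# Total when the last value (56846) is left out.
partial_total <- sum(match_2008[1:3])
print(partial_total)  # 138897
```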




Answer 3:


As explained above, you can use aggregate, but its formula interface makes the call much simpler:

aggregate(. ~ Year, df, sum)
#  Year  Match
#1 2006 196631
#2 2007 248330
#3 2008 195743

You can also use dplyr to solve this:

library(dplyr)
df %>% group_by(Year) %>% summarise(Match = sum(Match))
#  Year  Match
#  (int)  (int)
#1  2008 195743
#2  2007 248330
#3  2006 196631
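In the formula call, the dot on the left-hand side stands for every column not named on the right, so with only Year and Match in the data it is equivalent to writing Match ~ Year. A small base R sketch of that equivalence, assuming the question's matrix has been converted to a data frame (by_dot and by_name are illustrative names):

```r
# Rebuild the question's data as a data frame.
df <- as.data.frame(structure(
  c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L,
    2006L, 2006L, 2006L, 1808L, 137088L, 1L, 56846L, 2704L,
    169876L, 75750L, 2639L, 193990L, 2L),
  .Dim = c(10L, 2L),
  .Dimnames = list(NULL, c("Year", "Match"))))

# "." expands to all columns other than Year (here, just Match).
by_dot <- aggregate(. ~ Year, data = df, FUN = sum)

# Naming the column explicitly gives the same totals.
by_name <- aggregate(Match ~ Year, data = df, FUN = sum)

print(by_dot)
print(by_name)
```

The explicit form is the safer choice once the data frame gains extra columns, since the dot would then try to sum those as well.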


Source: https://stackoverflow.com/questions/10202480/aggregate-rows-by-shared-values-in-a-variable
