Question
I have a somewhat dumb R question. If I have a matrix (or dataframe, whichever is easier to work with) like:
Year Match
2008 1808
2008 137088
2008 1
2008 56846
2007 2704
2007 169876
2007 75750
2006 2639
2006 193990
2006 2
And I wanted to sum the match counts for each year (so, e.g., the 2008 row would be 2008 195743), how would I go about doing this? I've got a few solutions in my head, but they are all needlessly complicated, and R tends to have a much easier solution tucked away somewhere.
You can generate the same matrix above with:
structure(c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L,
2006L, 2006L, 2006L, 1808L, 137088L, 1L, 56846L, 2704L, 169876L,
75750L, 2639L, 193990L, 2L), .Dim = c(10L, 2L), .Dimnames = list(
NULL, c("Year", "Match")))
Thanks for any help you can offer.
Answer 1:
aggregate(x = df$Match, by = list(df$Year), FUN = sum)
This assumes df is your data frame above.
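The question supplies a matrix, while the call above needs a data frame (`$` does not work on a matrix). The following is a minimal sketch of that conversion step, assuming the question's data; the names `m` and `totals` are illustrative and not from the original answer. Naming the list element in `by` and renaming the value column only tidies the output, which would otherwise be labelled `Group.1` and `x`.

```r
# Rebuild the question's data as a two-column integer matrix (name `m` is an assumption)
m <- cbind(
  Year  = c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L, 2006L, 2006L, 2006L),
  Match = c(1808L, 137088L, 1L, 56846L, 2704L, 169876L, 75750L, 2639L, 193990L, 2L)
)

# aggregate() with df$... needs a data frame, so convert first
df <- as.data.frame(m)

# Naming the grouping list element labels the group column "Year"
totals <- aggregate(x = df$Match, by = list(Year = df$Year), FUN = sum)

# The summed column is called "x" by default; rename it for readability
names(totals)[2] <- "Match"
print(totals)
```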
Answer 2:
You may also want to use the ddply function from the plyr package.
# install plyr package
install.packages('plyr')
library(plyr)
# creating your data.frame
foo <- as.data.frame(structure(c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L,
2006L, 2006L, 2006L, 1808L, 137088L, 1L, 56846L, 2704L, 169876L,
75750L, 2639L, 193990L, 2L), .Dim = c(10L, 2L), .Dimnames = list(
NULL, c("Year", "Match"))))
# here's what you're looking for
ddply(foo, .(Year), numcolwise(sum))
Year Match
1 2006 196631
2 2007 248330
3 2008 195743
By the way, the total for 2008 should be 195743 (1808 + 137088 + 1 + 56846) rather than 138897; you forgot to add 56846.
Answer 3:
As explained above, you can use aggregate, but its formula interface makes the call much simpler:
aggregate(. ~ Year, df, sum)
# Year Match
#1 2006 196631
#2 2007 248330
#3 2008 195743
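In the formula `. ~ Year`, the dot stands for every column other than the grouping variable, so with this two-column data it should behave like `Match ~ Year`. A small sketch to check that, assuming `df` holds the question's data (the names `by_dot` and `by_name` are illustrative):

```r
# Question's data as a data frame
df <- data.frame(
  Year  = c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L, 2006L, 2006L, 2006L),
  Match = c(1808L, 137088L, 1L, 56846L, 2704L, 169876L, 75750L, 2639L, 193990L, 2L)
)

# "." expands to all columns not on the right-hand side (here only Match)
by_dot  <- aggregate(. ~ Year, data = df, FUN = sum)

# Spelling out the column explicitly
by_name <- aggregate(Match ~ Year, data = df, FUN = sum)

# Compare the two results
print(all.equal(by_dot, by_name))
```

The explicit form is the safer choice once the data frame gains further columns that should not be summed.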
You can also use the dplyr package to solve this:
library(dplyr)
df %>% group_by(Year) %>% summarise(Match = sum(Match))
# Year Match
# (int) (int)
#1 2008 195743
#2 2007 248330
#3 2006 196631
Source: https://stackoverflow.com/questions/10202480/aggregate-rows-by-shared-values-in-a-variable