Question
Suppose the data looks like this:
group1 group2 num
A sg 1
A sh 2
A sg 4
B at 3
B al 7
a <- cumsum(data[,"num"]) # 1 3 7 10 17
I need something that accumulates by group. In reality, I have multiple columns as grouping indicators, and I want the cumulative sum within the subgroups I define.
E.g., if I group by group1 only, then the output should be:
group1 sum
A 1
A 3
A 7
B 3
B 10
If I group by the two variables group1 and group2, then the output is:
group1 group2 sum
A sg 1
A sh 2
A sg 5
B at 3
B al 7
Answer 1:
library(data.table)
data <- data.table(group1=c('A','A','A','B','B'), group2=c('sg','sh','sg','at','al'), num=c(1,2,4,3,7))
# cumulative sum of num within each group1
data[, list(cumsum = cumsum(num)), by=list(group1)]
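The call above covers the group1-only case and returns a new summary table. A minimal sketch of the two-variable case from the question, rebuilding the data with the question's column names (group1, group2, num) so it runs on its own, uses := to add the columns by reference, which leaves the rows in their original order:

```r
library(data.table)

# Same rows as in the question
DT <- data.table(
  group1 = c("A", "A", "A", "B", "B"),
  group2 = c("sg", "sh", "sg", "at", "al"),
  num    = c(1, 2, 4, 3, 7)
)

# := adds each column by reference; the row order is not changed
DT[, cumsum1 := cumsum(num), by = group1]
DT[, cumsum2 := cumsum(num), by = .(group1, group2)]

# cumsum1: 1 3 7 3 10; cumsum2: 1 2 5 3 7
print(DT)
```

`by` also accepts a character vector of column names, e.g. `by = c("group1", "group2")`, which is convenient when the grouping columns are chosen at run time.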
Answer 2:
In addition to using data.table, tapply in base R works fine for both of these cases:
dta <- read.table(text="
group1 group2 num
A sg 1
A sh 2
A sg 4
B at 3
B al 7", header=TRUE)
dta$cumsum <- do.call(c, tapply(dta$num, dta$group1, FUN=cumsum))
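Calculating the cumulative sum by two groups requires some reordering first, because tapply returns the per-group results concatenated in the order of the group levels rather than in the original row order (the one-group case above works only because the rows are already sorted by group1):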
dta <- dta[order(dta$group1, dta$group2, dta$num),]
dta$cumsum2 <- do.call(c, tapply(dta$num,
                                 paste0(dta$group1, dta$group2),
                                 FUN=cumsum))
dta
group1 group2 num cumsum cumsum2
1 A sg 1 1 1
3 A sg 4 7 5
2 A sh 2 3 2
5 B al 7 10 7
4 B at 3 3 3
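The sorted rows above (row names 1, 3, 2, 5, 4) are a permutation of the original ones. In R, the inverse of a permutation p is order(p); indexing by p a second time only restores the original order when p happens to be its own inverse, as a swap-only permutation like this one is. A small illustration on a plain vector, where x and p are hypothetical values chosen for the example:

```r
x <- c("a", "b", "c", "d")
p <- c(2, 4, 1, 3)     # hypothetical sorting permutation
y <- x[p]              # "b" "d" "a" "c"

y[p]                   # "d" "c" "b" "a": not the original order
y[order(p)]            # "a" "b" "c" "d": original order restored

# A swap-only permutation is its own inverse, so both approaches agree there
q <- c(1, 3, 2, 5, 4)
all(order(q) == q)     # TRUE
```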
And if you need the original order back:
# order() inverts the sorting permutation recorded in the row names
dta[order(as.numeric(rownames(dta))), ]
group1 group2 num cumsum cumsum2
1 A sg 1 1 1
2 A sh 2 3 2
3 A sg 4 7 5
4 B at 3 3 3
5 B al 7 10 7
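A base R alternative that neither answer uses is ave() from the stats package (attached by default), which applies FUN within groups and returns the result aligned with the original rows. The sketch below wraps it in a small helper for an arbitrary set of grouping columns; cumsum_by is a hypothetical name, not an existing function:

```r
# Same data as in the question
dta <- data.frame(
  group1 = c("A", "A", "A", "B", "B"),
  group2 = c("sg", "sh", "sg", "at", "al"),
  num    = c(1, 2, 4, 3, 7)
)

# Hypothetical helper: cumulative sum of column `value` within the groups
# defined by the columns named in `groups`. The grouping columns are passed
# to ave() as unnamed arguments so they are treated as grouping variables.
cumsum_by <- function(df, value, groups) {
  grouping <- unname(as.list(df[groups]))
  do.call(ave, c(list(df[[value]]), grouping, list(FUN = cumsum)))
}

dta$cumsum  <- cumsum_by(dta, "num", "group1")               # 1 3 7 3 10
dta$cumsum2 <- cumsum_by(dta, "num", c("group1", "group2"))  # 1 2 5 3 7
print(dta)
```

Because ave() writes each group's result back into that group's positions, the rows never need to be sorted or restored afterwards.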
Source: https://stackoverflow.com/questions/30277087/cumsum-by-group