Question
Suppose I have the data:
library(data.table)
dat = data.table(A=c(1,2,3,1,4,5,1,2,3), B=c(1,1,1,NA,1,NA,2,NA,2),
                 C=c(1,12,24.2,251,2,1,2,3,-1), D=c(1,1,1,1,1,2,2,2,2))
Which looks like:
> dat
   A  B     C D
1: 1  1   1.0 1
2: 2  1  12.0 1
3: 3  1  24.2 1
4: 1 NA 251.0 1
5: 4  1   2.0 1
6: 5 NA   1.0 2
7: 1  2   2.0 2
8: 2 NA   3.0 2
9: 3  2  -1.0 2
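Within each group D, I want to track, for each row, the cumulative sum of C over the rows (up to and including the current one) whose A equals the value in B. Rows where B is NA should get NA, and rows with no matching A so far should get 0. So the output should be: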
> dat
   A  B     C D cumsum
1: 1  1   1.0 1      1
2: 2  1  12.0 1      1
3: 3  1  24.2 1      1
4: 1 NA 251.0 1     NA
5: 4  1   2.0 1    252
6: 5 NA   1.0 2     NA
7: 1  2   2.0 2      0
8: 2 NA   3.0 2     NA
9: 3  2  -1.0 2      3
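To make the rule concrete, here is a slow reference computation, meant only as a sketch for checking results on small data and not as the efficient solution being asked for. It loops over the rows of each group D and sums C over the rows seen so far whose A equals the current B. The column name expected is made up for this illustration.

```r
library(data.table)

dat <- data.table(A = c(1, 2, 3, 1, 4, 5, 1, 2, 3),
                  B = c(1, 1, 1, NA, 1, NA, 2, NA, 2),
                  C = c(1, 12, 24.2, 251, 2, 1, 2, 3, -1),
                  D = c(1, 1, 1, 1, 1, 2, 2, 2, 2))

# 'expected' is a hypothetical column name used only for this illustration.
dat[, expected := vapply(seq_len(.N), function(i) {
  # Rows with a missing B get NA.
  if (is.na(B[i])) return(NA_real_)
  # Rows 1..i of the current group D.
  upto <- seq_len(i)
  # Sum C over those rows whose A equals the current B (sum of nothing is 0).
  sum(C[upto][A[upto] == B[i]])
}, numeric(1)), by = D]
```

This is quadratic in the group size, so it is only useful as a correctness check against a faster join-based approach.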
My first attempt adapted an existing join-based solution by adding a grouping variable (I like this approach because I believe it is efficient and should scale to a large number of groups):
dat[, rn := 1:.N,by=D][, cs := cumsum(C), .(A,D)];
dat[, cumsum := 0][ !is.na(B), cumsum := dat[.SD, on=.(A=B, rn,D), allow.cartesian=TRUE, roll=TRUE, x.cs]]
However, this produced an incorrect result:
> dat
   A  B     C D rn    cs cumsum
1: 1  1   1.0 1  1   1.0      1
2: 2  1  12.0 1  2  12.0     NA
3: 3  1  24.2 1  3  24.2     NA
4: 1 NA 251.0 1  4 252.0      0
5: 4  1   2.0 2  1   2.0      1
6: 5 NA   1.0 2  2   1.0      0
7: 1  2   2.0 2  3   2.0     NA
8: 2 NA   3.0 2  4   3.0      0
9: 3  2  -1.0 2  5  -1.0     NA
Can someone show me where I went wrong?
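One possible explanation, offered tentatively: in data.table, roll applies to the last column listed in on. With on=.(A=B, rn, D), the roll happens on D while rn has to match exactly, which would account for the NA values in rows 2 and 3 and for the value carried across groups in row 5. The sketch below is one way this might be fixed, not a definitive answer. It puts rn last so the roll runs over row order within D, leaves rows with a missing B as NA instead of pre-filling 0, and sets the remaining non-matches to 0. The column name res is hypothetical.

```r
library(data.table)

dat <- data.table(A = c(1, 2, 3, 1, 4, 5, 1, 2, 3),
                  B = c(1, 1, 1, NA, 1, NA, 2, NA, 2),
                  C = c(1, 12, 24.2, 251, 2, 1, 2, 3, -1),
                  D = c(1, 1, 1, 1, 1, 2, 2, 2, 2))

# Row order within each group D, and the running sum of C per (A, D).
dat[, rn := seq_len(.N), by = D]
dat[, cs := cumsum(C), by = .(A, D)]

# Self-join: match B against A within the same D, rolling on rn.
# rn is listed last in 'on' because roll applies to the last join column.
# 'res' is a hypothetical column name; rows with NA in B are left as NA.
dat[!is.na(B), res := dat[.SD, on = .(A = B, D, rn), roll = TRUE, x.cs]]

# A non-NA B with no earlier matching A has nothing to roll from; use 0.
dat[!is.na(B) & is.na(res), res := 0]
```

Whether this scales as well as hoped on many groups is an open question here; comparing against a slow row-by-row check on a small sample would be a sensible first step.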
Source: https://stackoverflow.com/questions/57447378/non-equi-self-join-by-group-in-data-table