Non-equi self-join by group in data.table

喜欢而已 submitted on 2019-12-24 06:26:24

Question


Suppose I have the data:

dat = data.table(A=c(1,2,3,1,4,5,1,2,3),B=c(1,1,1,NA,1,NA,2,NA,2),
                 C=c(1,12,24.2,251,2,1,2,3,-1),D=c(1,1,1,1,1,2,2,2,2))

Which looks like:

> dat
   A  B     C D
1: 1  1   1.0 1
2: 2  1  12.0 1
3: 3  1  24.2 1
4: 1 NA 251.0 1
5: 4  1   2.0 1
6: 5 NA   1.0 2
7: 1  2   2.0 2
8: 2 NA   3.0 2
9: 3  2  -1.0 2

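Within each group D, for every row where B is not NA, I want the running sum of C over the rows so far (including the current one) whose A equals that row's B. Rows where B is NA should get NA, and a B value that has not yet appeared in A should give 0. So the output should be: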

> dat
   A  B     C D cumsum
1: 1  1   1.0 1      1
2: 2  1  12.0 1      1
3: 3  1  24.2 1      1
4: 1 NA 251.0 1     NA
5: 4  1   2.0 1    252
6: 5 NA   1.0 2     NA
7: 1  2   2.0 2      0
8: 2 NA   3.0 2     NA
9: 3  2  -1.0 2      3
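
To pin down the rule behind this table, a brute-force reference can be written in R. This is only a sketch for checking candidate solutions against the expected column, not the efficient join being asked about; the helper name `running_match_sum` and the column name `expected` are hypothetical and chosen here for illustration (a separate name also avoids shadowing `base::cumsum`).

```r
library(data.table)

dat <- data.table(
  A = c(1, 2, 3, 1, 4, 5, 1, 2, 3),
  B = c(1, 1, 1, NA, 1, NA, 2, NA, 2),
  C = c(1, 12, 24.2, 251, 2, 1, 2, 3, -1),
  D = c(1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# Hypothetical helper, applied to one group at a time:
# for row i, sum C over rows 1..i whose A equals B[i].
# Returns NA when B[i] is NA and 0 when no row matches yet.
running_match_sum <- function(A, B, C) {
  vapply(seq_along(B), function(i) {
    if (is.na(B[i])) return(NA_real_)
    idx <- seq_len(i)
    sum(C[idx][A[idx] == B[i]])
  }, numeric(1))
}

# Quadratic in group size, so suitable only as a correctness check
dat[, expected := running_match_sum(A, B, C), by = D]
print(dat)
```

On the sample data, `expected` reproduces the column shown above, so it can be compared against the result of any faster join-based approach with `all.equal()`.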

My original attempt adapts this style of solution to work by group (I like it because I believe it is efficient and should scale to a large number of groups):

dat[, rn := 1:.N, by = D][, cs := cumsum(C), .(A, D)]
dat[, cumsum := 0][!is.na(B), cumsum := dat[.SD, on = .(A = B, rn, D), allow.cartesian = TRUE, roll = TRUE, x.cs]]

However, this produced an incorrect result:

> dat
   A  B     C D rn    cs cumsum
1: 1  1   1.0 1  1   1.0      1
2: 2  1  12.0 1  2  12.0     NA
3: 3  1  24.2 1  3  24.2     NA
4: 1 NA 251.0 1  4 252.0      0
5: 4  1   2.0 2  1   2.0      1
6: 5 NA   1.0 2  2   1.0      0
7: 1  2   2.0 2  3   2.0     NA
8: 2 NA   3.0 2  4   3.0      0
9: 3  2  -1.0 2  5  -1.0     NA

Can someone show me where I went wrong?

Source: https://stackoverflow.com/questions/57447378/non-equi-self-join-by-group-in-data-table
