Cumsum with reset when 0 is encountered and by groups

生来不讨喜 2021-01-21 14:55

Below is my dataframe; I'd like to get the "yes" column. I can't seem to get the cumsum of the "value" field, computed by "id", to reset when it hits a 0. Th…
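
The question's dataframe did not survive extraction. A reproducible version, reconstructed from the id, value and yes columns printed in the answers, might look like this. The object name test follows the answers; storing the columns as doubles is an assumption, since the original types are unknown.

```r
# Data reconstructed from the outputs printed in the answers.
# `yes` is the desired running count: it resets at every 0 and at every new id.
test <- data.frame(
  id    = rep(1:4, each = 4),
  value = c(1, 1, 0, 1,  1, 1, 1, 1,  0, 1, 1, 0,  1, 1, 0, 0),
  yes   = c(1, 2, 0, 1,  1, 2, 3, 4,  0, 1, 2, 0,  1, 2, 0, 0)
)
test
```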

2 Answers
  • I think the answer by Imo could be an advertisement for the data.table package (as if such a great package needed yet another one). But I also think base-R solutions should be attempted, so here's mine. It uses ave (which requires the FUN argument to be named, since it comes after ...) and applies cumsum twice: the first application builds a grouping vector that increments at every 0, and the second produces the running count within each group. The second application could also have been seq.int, but that would have been clumsy: it would need to be function(x) seq.int(0, length(x) - 1), because seq.int starts from 1 by default rather than 0, and it would only match cumsum for groups that begin with a 0.

    # Group by id as well as by the run counter, so the sum also restarts at each new id
    test$yes2 <- ave(test$value, test$id, cumsum(test$value == 0), FUN = cumsum)
    
    > test
       id value yes yes2
    1   1     1   1    1
    2   1     1   2    2
    3   1     0   0    0
    4   1     1   1    1
    5   2     1   1    1
    6   2     1   2    2
    7   2     1   3    3
    8   2     1   4    4
    9   3     0   0    0
    10  3     1   1    1
    11  3     1   2    2
    12  3     0   0    0
    13  4     1   1    1
    14  4     1   2    2
    15  4     0   0    0
    16  4     0   0    0
    
  • 2021-01-21 15:30

    You can create the extra grouping variable on the fly inside by, like this:

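    library(data.table)
    setDT(test)  # convert to a data.table in place if test is still a data.frame
    # tempID increments at every 0, so each (id, tempID) pair is one run to sum over
    test[, wrong := cumsum(value), by = .(id, tempID = cumsum(value == 0))]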
    test
        id value correct wrong
     1:  1     1       1     1
     2:  1     1       2     2
     3:  1     0       0     0
     4:  1     1       1     1
     5:  2     1       1     1
     6:  2     1       2     2
     7:  2     1       3     3
     8:  2     1       4     4
     9:  3     0       0     0
    10:  3     1       1     1
    11:  3     1       2     2
    12:  3     0       0     0
    13:  4     1       1     1
    14:  4     1       2     2
    15:  4     0       0     0
    16:  4     0       0     0
    

    Note that assigning the result back with test <- is not necessary here, because := updates the data.table by reference.

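
Both answers rely on the same idea: cumsum(value == 0) goes up by one at every 0, so it labels each run that starts at a 0 (plus the run before the first 0), and adding id to the grouping also restarts the count at each new id. The base-R sketch below, using the data reconstructed from the printed outputs, shows the intermediate run counter and compares grouping by the run counter alone with grouping by id and run together. The names run, by_run and by_id_run are illustrative only.

```r
# Example data reconstructed from the outputs printed in the answers.
test <- data.frame(
  id    = rep(1:4, each = 4),
  value = c(1, 1, 0, 1,  1, 1, 1, 1,  0, 1, 1, 0,  1, 1, 0, 0),
  yes   = c(1, 2, 0, 1,  1, 2, 3, 4,  0, 1, 2, 0,  1, 2, 0, 0)
)

# Run counter: increases by 1 at every 0, so each 0 opens a new run.
test$run <- cumsum(test$value == 0)

# Running sum within each run only: it does not restart when id changes.
by_run <- ave(test$value, test$run, FUN = cumsum)

# Running sum within each (id, run) pair. ave() combines several grouping
# vectors with interaction(), so the sum restarts at every 0 and every new id.
by_id_run <- ave(test$value, test$id, test$run, FUN = cumsum)

print(cbind(test, by_run, by_id_run))
identical(by_id_run, test$yes)  # TRUE
which(by_run != test$yes)       # 5 6 7 8, i.e. every row of id 2
```

Rows 1 to 4 end with a run that starts at the 0 in row 3 and is never closed by another 0, so without id in the grouping that run carries on into id 2.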