Below is my dataframe. I'd like to get the "yes" column, but I can't get the cumsum of the "value" field, computed by "id", to reset whenever it hits a 0.
I think the answer by Imo could be an advertisement for the data.table package (as if such a great package needed yet another one). But I also think base-R solutions should be attempted, so here's mine. It uses `ave` (which requires the `FUN` argument to be named, because it comes after `...`) and applies `cumsum` twice: the first application creates a grouping vector, and the second creates the sequence. The second application could also have been `seq.int`, but that would have been a bit clumsy, since it would need to be `function(x) seq.int(0, length(x) - 1)` (the default call to `seq.int` starts from 1 rather than 0), and it would only give the right answer for groups that begin with a 0.
# group by id and by a run counter that increases at every 0
test$yes2 <- ave(test$value, test$id, cumsum(test$value==0), FUN=cumsum)
> test
id value yes yes2
1 1 1 1 1
2 1 1 2 2
3 1 0 0 0
4 1 1 1 1
5 2 1 1 1
6 2 1 2 2
7 2 1 3 3
8 2 1 4 4
9 3 0 0 0
10 3 1 1 1
11 3 1 2 2
12 3 0 0 0
13 4 1 1 1
14 4 1 2 2
15 4 0 0 0
16 4 0 0 0
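A minimal self-contained sketch for checking this in a fresh R session. The original data frame was not posted, so `test` is rebuilt from the printed values (an assumption), and `run`, `no_id` and `from_zero` are illustrative names only. It prints the intermediate grouping vector, confirms that grouping by `id` plus the run counter reproduces `yes`, and shows where the two shortcuts go wrong: grouping by the run counter alone carries the total across the id 1 / id 2 boundary, and the `seq.int` variant starts every group at 0 even when the group begins with a 1.

```r
# Example data rebuilt from the printed output (assumed; not posted in the question).
test <- data.frame(
  id    = rep(1:4, each = 4),
  value = c(1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0),
  yes   = c(1, 2, 0, 1, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2, 0, 0)
)

# First cumsum: a run counter that goes up by one at every 0,
# so each 0 opens a new group.
run <- cumsum(test$value == 0)
print(run)
# [1] 0 0 1 1 1 1 1 1 2 2 2 3 3 3 4 5

# Second cumsum: running total of value within each (id, run) group.
test$yes2 <- ave(test$value, test$id, run, FUN = cumsum)
identical(test$yes2, test$yes)  # TRUE

# Grouping by the run counter alone does not reset when id changes.
no_id <- ave(test$value, run, FUN = cumsum)
which(no_id != test$yes)  # rows 5 to 8 (start of id 2)

# seq.int variant: correct only for groups that begin with a 0.
from_zero <- ave(test$value, test$id, run,
                 FUN = function(x) seq.int(0, length(x) - 1))
which(from_zero != test$yes)  # rows 1, 2, 5 to 8, 13, 14
```

The rows flagged for `from_zero` are exactly the groups that start at an id boundary with a 1, which is why the `cumsum` form is the safer choice for the second step.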
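You can create a new `by` variable on the fly, grouping by `id` and by a run counter that increases at every 0 (`test` must be a data.table; a data.frame can be converted with `setDT(test)`):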
test[, wrong := cumsum(value), by=.(id, tempID=cumsum(value==0))]
test
id value correct wrong
1: 1 1 1 1
2: 1 1 2 2
3: 1 0 0 0
4: 1 1 1 1
5: 2 1 1 1
6: 2 1 2 2
7: 2 1 3 3
8: 2 1 4 4
9: 3 0 0 0
10: 3 1 1 1
11: 3 1 2 2
12: 3 0 0 0
13: 4 1 1 1
14: 4 1 2 2
15: 4 0 0 0
16: 4 0 0 0
Note that `test <-` is not necessary here, as `:=` updates the data.table by reference.
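A minimal runnable sketch of the data.table approach, assuming the data.table package is installed. The table is rebuilt from the printed values above (an assumption, since the original data was not posted), and `add_wrong2` / `wrong2` are hypothetical names used only to show that `:=` modifies the caller's table by reference, even from inside a function, without assigning back to `test`. It also shows that `tempID` exists only for the grouping and is not added as a column.

```r
library(data.table)

# Example data rebuilt from the printed output (assumed).
test <- data.table(
  id      = rep(1:4, each = 4),
  value   = c(1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0),
  correct = c(1, 2, 0, 1, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2, 0, 0)
)
# For an existing data.frame, setDT(test) converts it in place instead.

# Group by id and by a run counter computed on the fly over the whole column.
test[, wrong := cumsum(value), by = .(id, tempID = cumsum(value == 0))]
identical(test$wrong, test$correct)  # TRUE
"tempID" %in% names(test)            # FALSE: only used for grouping

# Hypothetical helper: := inside a function still updates the caller's table.
add_wrong2 <- function(dt) {
  dt[, wrong2 := cumsum(value), by = .(id, tempID = cumsum(value == 0))]
  invisible(NULL)
}
add_wrong2(test)
"wrong2" %in% names(test)            # TRUE, without `test <-`
```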