Here\'s a sample of booleans I have as part of a data.frame:
atest <- c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
FALSE, TRUE, TRUE,
Here's one way to do it, using handy (but not widely-known/used) base functions:
> sequence(tabulate(cumsum(!atest)))
[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1
To break it down:
> # return/repeat integer for each FALSE
> cumsum(!atest)
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3
> # count the number of occurrences of each integer
> tabulate(cumsum(!atest))
[1] 10 10 1
> # create concatenated seq_len for each integer
> sequence(tabulate(cumsum(!atest)))
[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1
Here is another approach using other familiar functions:
seq_along(atest) - cummax(seq_along(atest) * !atest) + 1L
Because it is all vectorized, it is noticeably faster than @Joshua's solution (if speed is of any concern):
f0 <- function(x) sequence(tabulate(cumsum(!x)))
f1 <- function(x) {i <- seq_along(x); i - cummax(i * !x) + 1L}
x <- rep(atest, 10000)
library(microbenchmark)
microbenchmark(f0(x), f1(x))
# Unit: milliseconds
# expr min lq median uq max neval
# f0(x) 19.386581 21.853194 24.511783 26.703705 57.20482 100
# f1(x) 3.518581 3.976605 5.962534 7.763618 35.95388 100
identical(f0(x), f1(x))
# [1] TRUE
Problems like these tend to work well with Rcpp
. Borrowing @flodel's code as a framework for benchmarking,
boolseq.cpp
-----------
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector boolSeq(LogicalVector x) {
int n = x.length();
IntegerVector output = no_init(n);
int counter = 1;
for (int i=0; i < n; ++i) {
if (!x[i]) {
counter = 1;
}
output[i] = counter;
++counter;
}
return output;
}
/*** R
x <- c(FALSE, sample( c(FALSE, TRUE), 1E5, TRUE ))
f0 <- function(x) sequence(tabulate(cumsum(!x)))
f1 <- function(x) {i <- seq_along(x); i - cummax(i * !x) + 1L}
library(microbenchmark)
microbenchmark(f0(x), f1(x), boolSeq(x), times=100)
stopifnot(identical(f0(x), f1(x)))
stopifnot(identical(f1(x), boolSeq(x)))
*/
sourceCpp
ing it gives me:
Unit: microseconds
expr min lq median uq max neval
f0(x) 18174.348 22163.383 24109.5820 29668.1150 78144.411 100
f1(x) 1498.871 1603.552 2251.3610 2392.1670 2682.078 100
boolSeq(x) 388.288 426.034 518.2875 571.4235 699.710 100
Less elegant, but pretty darn close to what you were writing with R code.