I find it hard to come up with a fast solution to the following problem:
I have a vector of observations, which indicates the time of observation of certain phenomena.
I am quite sure somebody will come up with a better pure-R solution, but my first attempt uses a single loop:
x <- c(0,0,0,1,0,1,1,0,0,0,-1,0,0,-1,-1,0,0,1,0,0)
last <- x[1]
for (i in seq_along(x)) {
   if (x[i] == 0) x[i] <- last
   else last <- x[i]
}
x
## [1] 0 0 0 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 1 1 1
The above translates easily into efficient C++ code:
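Rcpp::cppFunction('
NumericVector elimzeros(NumericVector x) {
   int n = x.size();
   NumericVector y(n);
   double last = x[0];
   for (int i = 0; i < n; ++i) {
      if (x[i] != 0) last = x[i];  // remember the most recent non-zero value
      y[i] = last;                 // zeros inherit that value
   }
   return y;
}
')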
Some benchmarks:
set.seed(123L)
x <- sample(c(-1,0,1), replace=TRUE, 100000)
# ...
microbenchmark::microbenchmark(
gagolews(x),
gagolews_Rcpp(x),
Roland(x),
AndreyShabalin_match(x),
AndreyShabalin_findInterval(x),
AndreyShabalin_cumsum(x),
unit="relative"
)
## Unit: relative
##                           expr        min         lq     median         uq        max neval
##                    gagolews(x) 167.264538 163.172532 162.703810 171.186482 110.604258   100
##               gagolews_Rcpp(x)   1.000000   1.000000   1.000000   1.000000   1.000000   100
##                      Roland(x)  33.817744  34.374521  34.544877  35.633136  52.825091   100
##        AndreyShabalin_match(x)  45.217805  43.819050  44.105279  44.800612  58.375625   100
## AndreyShabalin_findInterval(x)  45.191419  43.832256  44.283284  45.094304  23.819259   100
##       AndreyShabalin_cumsum(x)   8.701682   8.367212   8.413992   9.938748   5.676467   100
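The fastest pure-R entry in the table has cumsum in its name, but its definition is not shown here. As a rough illustration of how a vectorised approach of that kind could work, here is a minimal sketch. It assumes the vector contains no NA values; the name `fill_zeros` is hypothetical, and this is not necessarily the code that was benchmarked.

```r
# Hypothetical helper: replace each 0 with the most recent non-zero value.
# Assumes x is a numeric vector without NAs.
fill_zeros <- function(x) {
  nz  <- x != 0              # TRUE wherever a non-zero value occurs
  idx <- cumsum(nz)          # how many non-zero values have been seen so far
  # Prepend x[1] so that idx == 0 (leading zeros) maps to the first element,
  # mirroring the loop's initial value last <- x[1].
  c(x[1], x[nz])[idx + 1]
}

x <- c(0,0,0,1,0,1,1,0,0,0,-1,0,0,-1,-1,0,0,1,0,0)
fill_zeros(x)
```

On the small example vector this should give the same result as the loop version, since each position simply looks up the last non-zero value counted by `cumsum`.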