I'm using a data.table in R to store a time series. I want to return a subset such that successive rows for the selected times are at least N seconds apart from the last ro
Here are a couple of ways to use rolling joins to find the set of rows, w, in your subset:
t_plus <- 5

# one join per row visited
w <- c()
nxt <- 1L
while (!is.na(nxt)) {
  w <- c(w, nxt)
  # roll = -Inf matches the first row with t at or after t[nxt] + t_plus (NA if none)
  nxt <- x[.(t[nxt] + t_plus), on = .(t), roll = -Inf, which = TRUE]
}
# join once on all rows
w0 <- x[.(t + t_plus), on = .(t), roll = -Inf, which = TRUE]
w <- c()
nxt <- 1L
while (!is.na(nxt)) {
  w <- c(w, nxt)
  nxt <- w0[nxt]  # jump to the precomputed next row
}
Then you can subset like x[w].
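To make the walk concrete, here is a small worked sketch of the "join once" approach. The table x, its columns t and v, and the values in them are made up for illustration; the sketch assumes x is sorted by t with no duplicate times and that t_plus is positive. The join targets are computed outside the join (x$t + t_plus) purely so the example does not depend on how column names are looked up inside i.

```r
library(data.table)

# Hypothetical sample data: t is time in seconds, sorted ascending
x <- data.table(
  t = c(0, 1, 3, 5, 6, 9, 10, 14, 20),
  v = letters[1:9]
)
t_plus <- 5

# For every row, find the first row whose t is at least t + t_plus.
# roll = -Inf rolls the next observation backward; NA means no such row.
targets <- x$t + t_plus
w0 <- x[.(targets), on = .(t), roll = -Inf, which = TRUE]

# Walk the chain of "next" rows starting from row 1
w <- c()
nxt <- 1L
while (!is.na(nxt)) {
  w <- c(w, nxt)
  nxt <- w0[nxt]
}

res <- x[w]
print(res)  # keeps rows 1, 4, 7, 9, i.e. t = 0, 5, 10, 20
```

Each kept time is at least 5 seconds after the previous kept one, and the walk starts from the first row, so a different starting row could give a different valid subset.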
Comments
In principle, there could be other subsets that satisfy the OP's condition "at least 5 seconds apart"; this is just the one found by iterating from the first row forward.
The second way is based on @DavidArenburg's answer to the Q&A Henrik linked above. Although the question seems the same, I couldn't get that approach to work fully here.
Generally, it's a bad idea to grow things in a loop in R (like I'm doing with w here). If you're running into performance problems, that might be a good area to improve in this code.
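One possible way to avoid growing w, sketched here under the same assumptions as the code above (x sorted by t, at least one row, t_plus positive), is to preallocate it at its maximum possible length and trim it after the loop. The sample data and the counter n are illustrative names, not part of the original answer, and whether this matters in practice depends on how many rows end up selected.

```r
library(data.table)

# Hypothetical sample data, sorted by t
x <- data.table(t = c(0, 1, 3, 5, 6, 9, 10, 14, 20))
t_plus <- 5

# Same "join once" lookup as before: next row at least t_plus later, or NA
targets <- x$t + t_plus
w0 <- x[.(targets), on = .(t), roll = -Inf, which = TRUE]

# Preallocate for the worst case (every row kept), fill in place, then trim
w <- integer(nrow(x))
n <- 0L      # number of rows selected so far
nxt <- 1L
while (!is.na(nxt)) {
  n <- n + 1L
  w[n] <- nxt
  nxt <- w0[nxt]
}
w <- w[seq_len(n)]
```

The loop itself is unchanged; only the bookkeeping differs, so x[w] returns the same rows as the version that appends with c().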