问题
From a sequence of TRUEs and falses, I wanted to make a function that returns TRUE whether there is a series of at least n1
TRUEs somewhere in the sequence. Here is that function:
fun_1 = function(TFvec, n1){
nbT = 0
solution = -1
for (i in 1:length(x)){
if (x[i]){
nbT = nbT + 1
if (nbT == n1){
return(T)
break
}
} else {
nbT = 0
}
}
return (F)
}
Test:
x = c(T,F,T,T,F,F,T,T,T,F,F,T,F,F)
fun_1(x,3) # TRUE
fun_1(x,4) # FALSE
Then, I needed a function that returns TRUE if in a given list boolean vector, there is a series of at least n1
TRUEs wrapped by at least two series (one on each side) of n2
falses. Here the function:
fun_2 = function(TFvec, n1, n2){
if (n2 == 0){
fun_1(TFvec, n2)
}
nbFB = 0
nbFA = 0
nbT = 0
solution = -1
last = F
for (i in 1:length(TFvec)){
if(TFvec[i]){
nbT = nbT + 1
if (nbT == n1 & nbFB >= n2){
solution = i-n1+1
}
last = T
} else {
if (last){
nbFB = 0
nbFA = 0
}
nbFB = nbFB + 1
nbFA = nbFA + 1
nbT = 0
if (nbFA == n2 & solution!=-1){
return(T)
}
last = F
}
}
return(F)
}
It is maybe not a very efficient function though! And I haven't tested it 100 times but it looks like it works fine!
Test:
x = c(T,F,T,T,F,F,T,T,T,F,F,T,F,F)
fun_2(x, 3, 2) # TRUE
fun_2(x, 3, 3) # FALSE
Now, believe it or not, I'd like to make a function (fun_3
) that returns TRUE if in the boolean vector there is a (at least) series of at least n1
TRUEs wrapped in between (at least) two (one on each side) series of n2
falses where the whole thing (the three series) are wrapped in between (at least) two (one on each side) series of n3
TRUEs. And as I am afraid to have to bring this problem even further, I am asking here for help to create a function fun_n
in which we enter two arguments TFvec
and list_n
where list_n
is a list of n
of any length.
Can you help me to create the function fun_n
?
回答1:
For convenience, record the length of the number of thresholds
n = length(list_n)
Represent the vector of TRUE and FALSE as a run-length encoding, remembering the length of each run for convenience
r = rle(TFvec); l = r$length
Find possible starting locations
idx = which(l >= list_n[1] & r$value)
Make sure the starting locations are embedded enough to satisfy all tests
idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]
Then check that lengths of successively remote runs are consistent with the condition, keeping only those starting points that are
for (i in seq_len(n - 1)) {
if (length(idx) == 0)
break # no solution
thresh = list_n[i + 1]
test = (l[idx + i] >= thresh) & (l[idx - i] >= thresh)
idx = idx[test]
}
If there are any values left in idx
, then these are the indexes into the rle satisfying the condition; the starting point(s) in the initial vector are cumsum(l)[idx - 1] + 1
.
Combined:
runfun = function(TFvec, list_n) {
## setup
n = length(list_n)
r = rle(TFvec); l = r$length
## initial condition
idx = which(l >= list_n[1] & r$value)
idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]
## adjacent conditions
for (i in seq_len(n - 1)) {
if (length(idx) == 0)
break # no solution
thresh = list_n[i + 1]
test = (l[idx + i] >= thresh) & (l[idx - i] >= thresh)
idx = idx[test]
}
## starts = cumsum(l)[idx - 1] + 1
## any luck?
length(idx) != 0
}
This is fast and allows for runs >= the threshold, as stipulated in the question; for example
x = sample(c(TRUE, FALSE), 1000000, TRUE)
system.time(runfun(x, rep(2, 5)))
completes in less than 1/5th of a second.
A fun generalization allows for flexible condition, e.g., runs of exactly list_n
, as in the rollapply solution
runfun = function(TFvec, list_n, cond=`>=`) {
## setup
n = length(list_n)
r = rle(TFvec); l = r$length
## initial condition
idx = which(cond(l, list_n[1]) & r$value)
idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]
## adjacent conditions
for (i in seq_len(n - 1)) {
if (length(idx) == 0)
break # no solution
thresh = list_n[i + 1]
test = cond(l[idx + i], thresh) & cond(l[idx - i], thresh)
idx = idx[test]
}
## starts = cumsum(l)[idx - 1] + 1
## any luck?
length(idx) != 0
}
回答2:
Create a template, tpl
of zeros and ones, convert it to a regex pattern pat
. Convert x
to a single string of zeros and ones and use grepl
to match pat
to it. No packages are used.
fun_n <- function(x, lens) {
n <- length(lens)
reps <- c(rev(lens), lens[-1])
TF <- if (n == 1) 1 else if (n %% 2) 1:0 else 0:1
tpl <- paste0(rep(TF, length = n), "{", reps, ",}")
pat <- paste(tpl, collapse = "")
grepl(pat, paste(x + 0, collapse = ""))
}
# test
x <- c(F, T, T, F, F, T, T, T, F, F, T, T, T, F)
fun_n(x, 3:1)
## TRUE
fun_n(x, 1:3)
## FALSE
fun_n(x, 100)
## FALSE
fun_n(x, 3)
## TRUE
fun_n(c(F, T, F), c(1, 1))
## [1] TRUE
fun_n(c(F, T, T, F), c(1, 1))
## [1] TRUE
Run time is not as fast as runfun
on the example below but still quite fast running 10,000 instances of the example shown in slightly over 2 seconds on my laptop. Also the code is relatively short in length and loop-free.
> library(rbenchmark)
> benchmark(runfun(x, 1:3), fun_n(x, 1:3), replications = 10000)[1:4]
test replications elapsed relative
2 fun_n(x, 1:3) 10000 2.29 1.205
1 runfun(x, 1:3) 10000 1.90 1.000
来源:https://stackoverflow.com/questions/28756848/finding-n1-trues-wrapped-in-between-two-n2-falses-the-whole-thing-wrapped-i