Select values within/outside of a set of intervals (ranges) R

Question

I've got some sort of index, like:

index <- 1:100

I've also got a list of "exclusion intervals" (ranges):

exclude <- data.frame(start = c(5,50, 90), end = c(10,55, 95))

  start end
1     5  10
2    50  55
3    90  95

I'm looking for an efficient way (in R) to remove all the indexes that fall within the ranges in the exclude data frame.

So the desired output would be:

1,2,3,4,  11,12,...,48,49,  56,57,...,88,89,  96,97,98,99,100

I could do this iteratively: go over every exclusion interval (using ddply) and iteratively remove indexes that fall in each interval. But is there a more efficient way (or function) that does this?

I'm using library(intervals) to calculate my intervals, but I could not find a built-in function that does this.

Answer 1:

We can use Map to build the sequence for each pair of corresponding elements in the 'start' and 'end' columns, unlist the result into a single vector, and then use setdiff to get the values of 'index' that are not in that vector.

setdiff(index,unlist(with(exclude, Map(`:`, start, end))))
#[1]   1   2   3   4  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
#[20]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44
#[39]  45  46  47  48  49  56  57  58  59  60  61  62  63  64  65  66  67  68  69
#[58]  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88
#[77]  89  96  97  98  99 100

Or we can use rep and then use setdiff.

i1 <- with(exclude, end-start) +1L
setdiff(index,with(exclude, rep(start, i1)+ sequence(i1)-1))
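To see what both variants hand to setdiff, here is a small sketch that builds the intermediate vector of excluded values each way and checks that they agree. The names via_map and via_rep are mine, used only for illustration.

```r
exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))

# Variant 1: one start:end sequence per row, flattened into a single vector
via_map <- unlist(with(exclude, Map(`:`, start, end)))

# Variant 2: repeat each start once per element of its interval, then add
# the within-interval offsets 0, 1, ..., (length - 1)
i1 <- with(exclude, end - start) + 1L            # interval lengths: 6 6 6
via_rep <- with(exclude, rep(start, i1) + sequence(i1) - 1)

# Both hold the same values: 5..10, 50..55, 90..95
all(via_map == via_rep)
```

The rep/sequence form avoids creating a list of per-interval vectors, which may matter when there are many intervals, though I have not benchmarked it here.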

NOTE: Both methods build the vector of positions to exclude. In the case above, the original vector ('index') is the sequence 1:100, so its values and positions coincide and setdiff works directly. If it contains arbitrary elements, use the vector as positions instead, i.e.

index[-unlist(with(exclude, Map(`:`, start, end)))]

index[setdiff(seq_along(index), unlist(with(exclude, 
                       Map(`:`, start, end))))]
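The difference between dropping by position and dropping by value is easy to miss, so here is a minimal sketch on a made-up vector. The vector x and the names by_position and by_value are hypothetical and not part of the original question.

```r
exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))
x <- c(7L, 120L, 3L, 52L, 91L, 60L)   # hypothetical non-sequential data

excluded <- unlist(with(exclude, Map(`:`, start, end)))

# Position-based: drops the elements sitting at positions 5..10, 50..55, 90..95
# (only positions 5 and 6 exist here)
by_position <- x[setdiff(seq_along(x), excluded)]

# Value-based: drops the elements whose value is one of the excluded integers
by_value <- x[!x %in% excluded]

by_position
by_value
```

Here by_position keeps the first four elements, while by_value keeps 120, 3 and 60. The %in% test only makes sense for integer-valued data, since it matches exact values.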

Answer 2:

Another approach that looks valid could be:

starts = findInterval(index, exclude[["start"]])
ends = findInterval(index, exclude[["end"]])  ## use exclude[["end"]] + 1L instead to remove
                                              ## the upper bounds from 'index' too
index[starts != (ends + 1L)]  ## a value at or above a lower bound and
                              ## below an upper bound is inside that interval

The main advantage here is that no vector containing every element of every interval needs to be created, and that it handles any values inside an interval, not just integers. It does rely on the intervals being sorted and non-overlapping. For example:

set.seed(101); x = round(runif(15, 1, 100), 3)
x
# [1] 37.848  5.339 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 93.232 46.057
x[findInterval(x, exclude[["start"]]) != (findInterval(x, exclude[["end"]]) + 1L)]
# [1] 37.848 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 46.057
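As written, the findInterval approach treats a value exactly equal to an upper bound as outside the interval. If the intervals should be closed at both ends, as in the question's desired output, one possible variant is to pass left.open = TRUE for the 'end' lookup. This is a sketch, assuming an R version whose findInterval has the left.open argument and assuming sorted, non-overlapping intervals; the name kept is mine.

```r
exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))
index <- 1:100

# Number of lower bounds that are <= each value
starts <- findInterval(index, exclude[["start"]])

# left.open = TRUE counts only the upper bounds strictly below each value,
# so a value equal to an upper bound still counts as inside its interval
ends <- findInterval(index, exclude[["end"]], left.open = TRUE)

# Inside an interval exactly when one more start than end has been passed
kept <- index[starts != (ends + 1L)]
kept
```

Unlike adding 1L to the upper bounds, this does not assume integer data, so it should behave the same way for real-valued vectors such as x above.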

Answer 3:

Another approach:

> index[-do.call(c, lapply(1:nrow(exclude), function(x) exclude$start[x]:exclude$end[x]))]
 [1]   1   2   3   4  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
[25]  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  56  57  58  59  60
[49]  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
[73]  85  86  87  88  89  96  97  98  99 100
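One edge case to watch with negative indexing: if the exclude data frame has no rows, 1:nrow(exclude) evaluates to c(1, 0) rather than an empty sequence, and indexing with an empty negative vector returns nothing instead of everything. A guarded sketch follows; the names no_exclusions, drop and kept are hypothetical.

```r
index <- 1:100
no_exclusions <- data.frame(start = numeric(0), end = numeric(0))

# seq_len(0) is an empty sequence, so lapply() does no work for zero rows
drop <- unlist(lapply(seq_len(nrow(no_exclusions)),
                      function(i) no_exclusions$start[i]:no_exclusions$end[i]))

# Guard the empty case: with nothing to drop, keep the whole vector
kept <- if (length(drop)) index[-drop] else index
length(kept)
```

With a non-empty exclude data frame, the same code reduces to the one-liner above.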

Source: https://stackoverflow.com/questions/34503718/select-values-within-outside-of-a-set-of-intervals-ranges-r

Tags

subset

intervals