Select values within/outside of a set of intervals (ranges) R

Posted by Deadly on 2019-12-23 22:18:38

Question


I've got some sort of index, like:

index <- 1:100

I've also got a list of "exclusion intervals" (ranges):

exclude <- data.frame(start = c(5,50, 90), end = c(10,55, 95))

  start end
1     5  10
2    50  55
3    90  95

I'm looking for an efficient way (in R) to remove all the indexes that fall within any of the ranges in the exclude data frame, bounds included.

So the desired output would be:

1,2,3,4,  11,12,...,48,49,  56,57,...,88,89,  96,97,98,99,100

I could do this iteratively: go over every exclusion interval (using ddply) and iteratively remove indexes that fall in each interval. But is there a more efficient way (or function) that does this?

I'm using library(intervals) to calculate my intervals, but I could not find a built-in function that does this.


Answer 1:


We can use Map to build the sequence for each pair of corresponding elements in the 'start' and 'end' columns, unlist to collapse the result into a single vector, and setdiff to get the values of 'index' that are not in that vector.

setdiff(index,unlist(with(exclude, Map(`:`, start, end))))
#[1]   1   2   3   4  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
#[20]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44
#[39]  45  46  47  48  49  56  57  58  59  60  61  62  63  64  65  66  67  68  69
#[58]  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88
#[77]  89  96  97  98  99 100

Or we can build the same vector with rep and sequence, and then use setdiff.

i1 <- with(exclude, end-start) +1L
setdiff(index,with(exclude, rep(start, i1)+ sequence(i1)-1))
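
To see why the rep/sequence form yields the same vector as the Map version, the intermediate values can be inspected. This is only an illustrative sketch; the names lens, excluded, via_map and result are introduced here and are not part of the original answer.

```r
exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))
index <- 1:100

# length of each closed interval [start, end]; here 6, 6, 6
lens <- with(exclude, end - start) + 1L

# repeat each start once per element of its interval,
# then add the offsets 0, 1, ..., len - 1 within each interval
excluded <- rep(exclude$start, lens) + sequence(lens) - 1

# the same values, obtained by expanding every interval with `:`
via_map <- unlist(with(exclude, Map(`:`, start, end)))

result <- setdiff(index, excluded)
```

Both excluded and via_map hold 5..10, 50..55 and 90..95, so either one can be passed to setdiff.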

NOTE: Both methods build the vector of index positions that need to be excluded. In the case above, the original vector ('index') is the sequence 1:100, so its values coincide with their positions and setdiff works. If it contains arbitrary elements, use that vector as positions instead, i.e.

index[-unlist(with(exclude, Map(`:`, start, end)))]

or

index[setdiff(seq_along(index), unlist(with(exclude, 
                       Map(`:`, start, end))))]
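
The distinction in the NOTE, positions versus values, can be illustrated with a short vector that is not a plain sequence. This is a sketch under assumptions: the vector x and the names inside, kept_by_value and kept_by_position are hypothetical, and the outer comparison is one possible way to test values against closed intervals, not the answer's own method.

```r
exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))
x <- c(3, 7, 52, 60, 95, 120)   # hypothetical values, not 1:n

# value-based: TRUE where a value lies inside any closed interval [start, end]
inside <- rowSums(outer(x, exclude$start, ">=") &
                  outer(x, exclude$end, "<=")) > 0
kept_by_value <- x[!inside]

# position-based: drop elements number 5..10, 50..55, 90..95;
# negative positions beyond length(x) are ignored by R
kept_by_position <- x[-unlist(with(exclude, Map(`:`, start, end)))]
```

Here kept_by_value is 3, 60, 120, whereas kept_by_position is 3, 7, 52, 60, so the two readings of the intervals give different results once the values no longer match their positions.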



Answer 2:


Another approach that looks valid could be:

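starts = findInterval(index, exclude[["start"]])
ends = findInterval(index, exclude[["end"]])
## a value at or above a lower bound and below the matching upper bound
## is inside that interval, i.e. starts == ends + 1L
index[starts != (ends + 1L)]
## as written, values equal to an upper bound (10, 55, 95) are kept;
## for an integer 'index', use findInterval(index, exclude[["end"]] + 1L)
## to remove the upper bounds too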

The main advantage here is that no vector containing every element of every interval needs to be created. It also handles arbitrary values, including non-integers, as long as the intervals are sorted and do not overlap; e.g.:

set.seed(101); x = round(runif(15, 1, 100), 3)
x
# [1] 37.848  5.339 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 93.232 46.057
x[findInterval(x, exclude[["start"]]) != (findInterval(x, exclude[["end"]]) + 1L)]
# [1] 37.848 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 46.057
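
If the upper bounds should be excluded as well, for integer and non-integer values alike, one possible variant is to count only the upper bounds strictly below each value. The sketch below assumes an R version whose findInterval supports the left.open argument and assumes sorted, non-overlapping intervals; the names inside and kept are introduced here for illustration.

```r
exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))
index <- 1:100

# number of lower bounds <= each value
starts <- findInterval(index, exclude[["start"]])
# number of upper bounds strictly below each value
ends <- findInterval(index, exclude[["end"]], left.open = TRUE)

# inside [start, end] when one more interval has been entered than left
inside <- starts == ends + 1L
kept <- index[!inside]
```

With this variant 10, 55 and 95 are dropped, while a value such as 10.5 or 55.037 would still be kept, because no 1L is added to the bounds themselves.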



Answer 3:


Another approach, using lapply to expand each interval and do.call(c, ...) to concatenate the positions to drop:

> index[-do.call(c, lapply(1:nrow(exclude), function(x) exclude$start[x]:exclude$end[x]))]
 [1]   1   2   3   4  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
[25]  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  56  57  58  59  60
[49]  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
[73]  85  86  87  88  89  96  97  98  99 100


Source: https://stackoverflow.com/questions/34503718/select-values-within-outside-of-a-set-of-intervals-ranges-r
