Collapse rows with overlapping ranges

遇见更好的自我 2020-11-30 09:35

I have a data.frame with start and stop times:

ranges<- data.frame(start = c(65.72000,65.72187, 65.94312,73.75625,89.61625),stop = c(79.72187,79.72375,79.9         


        
3 Answers
  • 2020-11-30 09:49

    You can try this:

    library(dplyr)
    ranges %>%
      arrange(start) %>%
      # start a new group whenever start exceeds the highest stop seen in earlier rows
      group_by(g = cumsum(cummax(lag(stop, default = first(stop))) < start)) %>%
      summarise(start = first(start), stop = max(stop))
    
    # A tibble: 2 × 3
    #      g    start      stop
    #  <int>    <dbl>     <dbl>
    #1     0 65.72000  87.75625
    #2     1 89.61625 104.94062
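
    To see why the grouping expression works, it can help to compute each piece separately. The following is a minimal base R sketch, not part of the answer itself: it uses the first five rows of the data, and the names prev_stop, running_max, new_group and steps are made up here for illustration.

    ```r
    ranges <- data.frame(
      start = c(65.72000, 65.72187, 65.94312, 73.75625, 89.61625),
      stop  = c(79.72187, 79.72375, 79.94312, 87.75625, 104.94062)
    )
    ranges <- ranges[order(ranges$start), ]
    n <- nrow(ranges)

    # previous row's stop; the first row falls back to its own stop,
    # mirroring lag(stop, default = first(stop))
    prev_stop <- c(ranges$stop[1], ranges$stop[-n])

    # highest stop seen in any earlier row
    running_max <- cummax(prev_stop)

    # TRUE where a row begins after everything before it has ended
    new_group <- running_max < ranges$start

    # running count of TRUE flags gives the group id (0, 0, 0, 0, 1 here)
    g <- cumsum(new_group)

    steps <- data.frame(ranges, prev_stop, running_max, new_group, g)
    steps
    ```

    Only the fifth row starts after the running maximum (87.75625 < 89.61625), so it alone opens a new group.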
    
  • 2020-11-30 09:50

    With base R plus melt from reshape2 and unstack, let's add a few more ranges to make the problem more interesting and general:

    ranges<- data.frame(start = c(65.72000,65.72187, 65.94312,73.75625,89.61625,105.1,104.99),stop = c(79.72187,79.72375,79.94312,87.75625,104.94062,110.22,108.01))
    ranges
    #      start      stop
    #1  65.72000  79.72187
    #2  65.72187  79.72375
    #3  65.94312  79.94312
    #4  73.75625  87.75625
    #5  89.61625 104.94062
    #6 105.10000 110.22000
    #7 104.99000 108.01000
    
    library(reshape2)
    ranges <- melt(ranges)
    ranges <- ranges[order(ranges$value),]
    ranges
    #   variable     value
    #1     start  65.72000
    #2     start  65.72187
    #3     start  65.94312
    #4     start  73.75625
    #8      stop  79.72187
    #9      stop  79.72375
    #10     stop  79.94312
    #11     stop  87.75625
    #5     start  89.61625
    #12     stop 104.94062
    #7     start 104.99000
    #6     start 105.10000
    #14     stop 108.01000
    #13     stop 110.22000
    

    As the sorted output above shows, the problem reduces to finding every stop that is immediately followed by a start in consecutive rows. In this data, each such pair marks the end of one merged range and the beginning of the next, so together with the first and last rows these are the only points of interest. This relies on one reasonable assumption: the smallest value overall is a start and the largest is a stop. The following code does that:

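    # rows holding a stop whose next row holds a start
    indices <- intersect(which(ranges$variable=='start')-1, which(ranges$variable=='stop'))
    # keep the first row, each stop/start boundary pair, and the last row, then reshape to wide
    unstack(ranges[c(1, sort(c(indices, indices+1)), nrow(ranges)),], value~variable)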
    #      start      stop
    #1  65.72000  87.75625
    #2  89.61625 104.94062
    #3 104.99000 110.22000
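
    One caveat: the stop-followed-by-start pattern can misfire when short ranges are nested inside a longer one (for example 1 to 10 containing 2 to 3 and 4 to 5, where the stop at 3 is followed by the start at 4). A possible alternative, sketched here in base R under the assumption that every start is less than or equal to its stop, is to count how many ranges are open at each sorted point and close a merged range only when that count returns to zero. The function name merge_ranges is hypothetical.

    ```r
    merge_ranges <- function(start, stop) {
      if (length(start) == 0) return(data.frame(start = numeric(0), stop = numeric(0)))
      pos   <- c(start, stop)
      # +1 opens a range, -1 closes one
      delta <- c(rep(1L, length(start)), rep(-1L, length(stop)))
      # sort by position; on ties, starts come before stops so touching ranges merge
      o     <- order(pos, -delta)
      pos   <- pos[o]
      depth <- cumsum(delta[o])          # number of ranges currently open
      ends   <- which(depth == 0L)       # a merged range ends when nothing is open
      begins <- c(1L, head(ends, -1) + 1L)
      data.frame(start = pos[begins], stop = pos[ends])
    }

    # the seven ranges used in this answer
    merged <- merge_ranges(
      c(65.72000, 65.72187, 65.94312, 73.75625, 89.61625, 105.1, 104.99),
      c(79.72187, 79.72375, 79.94312, 87.75625, 104.94062, 110.22, 108.01)
    )

    # nested case: collapses to a single range from 1 to 10
    nested <- merge_ranges(c(1, 2, 4), c(10, 3, 5))
    ```

    On the seven-row data this gives the same three ranges as the output above.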
    
  • 2020-11-30 09:59

    Here is a data.table solution:

    library(data.table)
    setDT(ranges)
    ranges[, .(start=min(start), stop=max(stop)),
           by=.(group=cumsum(c(1, tail(start, -1) > head(stop, -1))))]
    #    group    start      stop
    # 1:     1 65.72000  87.75625
    # 2:     2 89.61625 104.94062
    

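    Here, a new group starts whenever a row's start is greater than the previous row's stop, and cumsum turns those flags into group ids. Within each group, the minimum of start and the maximum of stop are calculated. Because only consecutive rows are compared, this relies on the rows being ordered by start.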
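
    Comparing each start only with the previous row's stop can split a group when a short range sits inside a longer earlier one. A possible variant, shown as a sketch rather than as the answer's own method, sorts first and compares against the running maximum of stop. The toy data and the names dt, prev_max_stop and merged are made up for illustration.

    ```r
    library(data.table)

    # toy data: 2-3 and 4-5 are nested inside 1-10
    dt <- data.table(start = c(1, 2, 4, 12), stop = c(10, 3, 5, 15))
    setorder(dt, start)

    # highest stop among all earlier rows; -Inf for the first row so it opens group 1
    dt[, prev_max_stop := shift(cummax(stop), fill = -Inf)]

    # a new group begins only when start lies beyond every earlier stop
    dt[, group := cumsum(start > prev_max_stop)]

    merged <- dt[, .(start = min(start), stop = max(stop)), by = group]
    merged
    ```

    With this toy data the adjacent-row comparison would flag 4 > 3 as a new group, while the running maximum keeps rows one to three together and yields the ranges 1 to 10 and 12 to 15.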
