How to divide a set of overlapping ranges into non-overlapping ranges?

伪装坚强ぢ 2021-02-13 23:22

Let's say you have a set of ranges:

  • 0 - 100: 'a'
  • 0 - 75: 'b'
  • 95 - 150: 'c'
  • 120 - 130: 'd'

Obviously, these range

5 Answers
  • 2021-02-13 23:33

    An answer similar to Edmund's, tested, including support for single-point intervals like (1,1):

    import pprint


    class MultiSet(object):
        def __init__(self, intervals):
            self.intervals = intervals
            self.events = None
    
        def split_ranges(self):
            self.events = []
            for start, stop, symbol in self.intervals:
                self.events.append((start, True, stop, symbol))
                self.events.append((stop, False, start, symbol))
    
            def event_key(event):
                key_endpoint, key_is_start, key_other, _ = event
                key_order = 0 if key_is_start else 1
                return key_endpoint, key_order, key_other
    
            self.events.sort(key=event_key)
    
            current_set = set()
            ranges = []
            current_start = -1
    
            for endpoint, is_start, other, symbol in self.events:
                if is_start:
                    if current_start != -1 and endpoint != current_start and \
                           endpoint - 1 >= current_start and current_set:
                        ranges.append((current_start, endpoint - 1, current_set.copy()))
                    current_start = endpoint
                    current_set.add(symbol)
                else:
                    if current_start != -1 and endpoint >= current_start and current_set:
                        ranges.append((current_start, endpoint, current_set.copy()))
                    current_set.remove(symbol)
                    current_start = endpoint + 1
    
            return ranges
    
    
    if __name__ == '__main__':
        intervals = [
            (0, 100, 'a'), (0, 75, 'b'), (75, 80, 'd'), (95, 150, 'c'), 
            (120, 130, 'd'), (160, 175, 'e'), (165, 180, 'a')
        ]
        multiset = MultiSet(intervals)
        pprint.pprint(multiset.split_ranges())
    
    
    [(0, 74, {'b', 'a'}),
     (75, 75, {'d', 'b', 'a'}),
     (76, 80, {'d', 'a'}),
     (81, 94, {'a'}),
     (95, 100, {'c', 'a'}),
     (101, 119, {'c'}),
     (120, 130, {'d', 'c'}),
     (131, 150, {'c'}),
     (160, 164, {'e'}),
     (165, 175, {'e', 'a'}),
     (176, 180, {'a'})]
    
  • 2021-02-13 23:40

    Pseudocode:

    unusedRanges = [ (each of your ranges) ]
    rangesInUse = []
    usedRanges = []
    beginningBoundary = nil
    
    boundaries = [ list of all your ranges' start and end values, sorted ]
    resultRanges = []
    
    for (boundary in boundaries) {
        rangesStarting = []
        rangesEnding = []
    
        // determine which ranges begin at this boundary
        for (range in unusedRanges) {
            if (range.begin == boundary) {
                rangesStarting.add(range)
            }
        }
    
        // if there are any new ones, start a new range
        if (rangesStarting isn't empty) {
            if (beginningBoundary isn't nil) {
                // add the range we just passed
                resultRanges.add(beginningBoundary, boundary - 1, [collected values from rangesInUse])
            }
    
            // note that we are starting a new range
            beginningBoundary = boundary
    
            for (range in rangesStarting) {
                rangesInUse.add(range)
                unusedRanges.remove(range)
            }
        }
    
        // determine which ranges end at this boundary
        for (range in rangesInUse) {
            if (range.end == boundary) {
                rangesEnding.add(range)
            }
        }
    
        // if any ranges are ending, stop the current range
        if (rangesEnding isn't empty) {
            // add the range up to this boundary
            resultRanges.add(beginningBoundary, boundary, [collected values from rangesInUse])
    
            for (range in rangesEnding) {
                usedRanges.add(range)
                rangesInUse.remove(range)
            }
    
            if (rangesInUse isn't empty) {
                // some ranges didn't end; note that we are starting a new range
                beginningBoundary = boundary + 1
            }
            else {
                beginningBoundary = nil
            }
        }
    }
    

    Unit test:

    At the end, resultRanges should have the results you're looking for, unusedRanges and rangesInUse should be empty, beginningBoundary should be nil, and usedRanges should contain what unusedRanges used to contain (but sorted by range.end).
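
As a rough illustration, the pseudocode above could be translated to Python along these lines. This is only a sketch: the function name `split_by_boundaries` is made up for this example, the ranges are assumed to be `(begin, end, value)` tuples with inclusive integer endpoints, and the extra `begin <= boundary - 1` guard (not in the pseudocode) is an assumption added to avoid emitting an empty piece when one range starts right after another ends.

```python
def split_by_boundaries(ranges):
    """Split inclusive (begin, end, value) ranges into non-overlapping pieces."""
    unused = list(ranges)
    in_use = []
    result = []
    begin = None
    # Every distinct start and end value, in ascending order
    boundaries = sorted({b for r in ranges for b in r[:2]})

    for boundary in boundaries:
        # Ranges that begin at this boundary
        starting = [r for r in unused if r[0] == boundary]
        if starting:
            # Close the piece we just passed, unless it would be empty
            # (this guard is an addition to the pseudocode)
            if begin is not None and begin <= boundary - 1:
                result.append((begin, boundary - 1, {r[2] for r in in_use}))
            begin = boundary
            for r in starting:
                in_use.append(r)
                unused.remove(r)

        # Ranges that end at this boundary
        ending = [r for r in in_use if r[1] == boundary]
        if ending:
            result.append((begin, boundary, {r[2] for r in in_use}))
            for r in ending:
                in_use.remove(r)
            # If something is still active, the next piece starts just after
            begin = boundary + 1 if in_use else None
    return result


if __name__ == '__main__':
    example = [(0, 100, 'a'), (0, 75, 'b'), (95, 150, 'c'), (120, 130, 'd')]
    for piece in split_by_boundaries(example):
        print(piece)
```

Like the pseudocode, this rescans the lists at every boundary, so it favors readability over speed.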

  • 2021-02-13 23:40

    I had the same question when writing a program to mix (partly overlapping) audio samples.

    What I did was add a "start event" and a "stop event" (for each item) to a list, sort the list by time point, and then process it in order. You could do the same, except using an integer point instead of a time, and instead of mixing sounds you'd be adding symbols to the set corresponding to a range. Whether you generate empty ranges or just omit them is up to you.

    Edit: Perhaps some code...

    # intervals = list of (start, stop, symbol) tuples, both ends inclusive
    intervals = [(0, 100, 'a'), (0, 75, 'b'), (95, 150, 'c'), (120, 130, 'd')]
    points = []  # list of (offset, plus/minus, symbol) tuples
    for start, stop, symbol in intervals:
        points.append((start, '+', symbol))
        points.append((stop, '-', symbol))
    points.sort()  # '+' sorts before '-', so starts precede stops at the same offset

    ranges = []  # output list of (start, stop, symbol_set) tuples
    current_set = set()
    last_start = None
    for offset, pm, symbol in points:
        if pm == '+':
            # Close the range just passed; skip empty sets and empty ranges
            if current_set and last_start < offset:
                ranges.append((last_start, offset - 1, set(current_set)))
            current_set.add(symbol)
            last_start = offset
        elif pm == '-':
            # A minus always follows its own plus, so last_start is set here
            if last_start <= offset:
                # Store a copy; appending current_set itself would alias every entry
                ranges.append((last_start, offset, set(current_set)))
            current_set.remove(symbol)
            last_start = offset + 1
    

    Totally untested, obviously.
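
One way to check a sweep like this on small inputs is to compare it against a brute-force reference that looks at every integer point separately. The sketch below is only an illustration: `brute_force_split` is a name invented here, and it assumes inclusive integer endpoints and a non-empty input. It also merges adjacent pieces that carry the same symbol set, which an event-based sweep may report as separate pieces.

```python
def brute_force_split(intervals):
    """Reference result for small inclusive integer ranges, one point at a time."""
    lo = min(start for start, _, _ in intervals)
    hi = max(stop for _, stop, _ in intervals)
    pieces = []
    for point in range(lo, hi + 1):
        # Symbols of every range that covers this point
        symbols = {sym for start, stop, sym in intervals if start <= point <= stop}
        if not symbols:
            continue  # gap: no range covers this point
        # Extend the previous piece when it is adjacent and has the same symbols
        if pieces and pieces[-1][1] == point - 1 and pieces[-1][2] == symbols:
            pieces[-1] = (pieces[-1][0], point, symbols)
        else:
            pieces.append((point, point, symbols))
    return pieces


if __name__ == '__main__':
    example = [(0, 100, 'a'), (0, 75, 'b'), (95, 150, 'c'), (120, 130, 'd')]
    for piece in brute_force_split(example):
        print(piece)
```

Its running time grows with the width of the covered span rather than the number of ranges, so it is only suitable as a test oracle.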

  • 2021-02-13 23:52

    I'd say create a list of the endpoints and sort it, also index the list of ranges by starting and ending points. Then iterate through the list of sorted endpoints, and for each one, check the ranges to see which ones are starting/stopping at that point.

    This is probably better represented in code... if your ranges are represented by tuples:

    ranges = [(0,100,'a'),(0,75,'b'),(95,150,'c'),(120,130,'d')]
    # every distinct start or end value, in ascending order
    endpoints = sorted({r[0] for r in ranges} | {r[1] for r in ranges})
    start = {}
    end = {}
    for e in endpoints:
        start[e] = set()
        end[e] = set()
    for r in ranges:
        start[r[0]].add(r[2])
        end[r[1]].add(r[2])
    current_ranges = set()
    # note: a range that ends at e1 is dropped before the segment e1 - e2 is
    # printed, so each input range is effectively treated as half-open
    for e1, e2 in zip(endpoints[:-1], endpoints[1:]):
        current_ranges.difference_update(end[e1])
        current_ranges.update(start[e1])
        print('%d - %d: %s' % (e1, e2, ','.join(sorted(current_ranges))))
    

    Although looking at this in retrospect, I'd be surprised if there wasn't a more efficient (or at least cleaner-looking) way to do it.
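
A possibly cleaner variant of the same idea, sketched here under a few assumptions: endpoints are inclusive integers, ranges that share a symbol do not overlap each other, and the function name `split_inclusive` is made up for this example. Each inclusive range `[start, stop]` is treated as the half-open range `[start, stop + 1)`, which removes the special cases around shared endpoints.

```python
from collections import defaultdict


def split_inclusive(ranges):
    """Split inclusive integer (start, stop, symbol) ranges into disjoint pieces."""
    opening = defaultdict(set)  # cut point -> symbols that become active there
    closing = defaultdict(set)  # cut point -> symbols that stop being active there
    for start, stop, symbol in ranges:
        opening[start].add(symbol)
        closing[stop + 1].add(symbol)  # half-open: the range covers [start, stop + 1)

    cuts = sorted(set(opening) | set(closing))
    active = set()
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        active -= closing[lo]
        active |= opening[lo]
        if active:  # skip gaps where no range is active
            pieces.append((lo, hi - 1, set(active)))
    return pieces


if __name__ == '__main__':
    example = [(0, 100, 'a'), (0, 75, 'b'), (95, 150, 'c'), (120, 130, 'd')]
    for lo, hi, symbols in split_inclusive(example):
        print('%d - %d: %s' % (lo, hi, ','.join(sorted(symbols))))
```

Sorting the cut points dominates the cost, so this runs in O(n log n) for n ranges, not counting the cost of copying the active set for each piece.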

  • 2021-02-13 23:55

    What you describe is an example of set theory. For a general algorithm for computing unions, intersections, and differences of sets see:

    www.gvu.gatech.edu/~jarek/graphics/papers/04PolygonBooleansMargalit.pdf

    While the paper is targeted at graphics, it is applicable to general set theory as well. It is not exactly light reading material.
