How to count the presence of a set of numbers in a set of intervals efficiently

前端 未结 3 1011
半阙折子戏
半阙折子戏 2021-01-05 11:50

The input parameters are a list of tuples representing the intervals and a list of integers. The goal is to write a function that counts the number of intervals each integer

相关标签:
3条回答
  • 2021-01-05 12:21

    Put your integers, start points, and end points in a single list of pairs. Make the first element of each pair the value of the integer, start point, or end point, and the second element of each pair be 0, -1, or 1 depending on whether it's an integer, start point, or end point.

    Next, sort the list.

    Now, you can go through the list, maintaining a running sum of the second elements of the pairs. When you see a pair with second element 0, record the running sum (negated) for that integer.

    This runs in O((N+M)log(N+M)) time in the worst case (and in practice I guess it'll be linear if the queries and intervals are mostly sorted, thanks to timsort).

    For example:

    Input intervals: [(1, 3), (5, 6), (6, 9)]
    Input integers: [2, 4, 6, 8]
    
    Unified list (sorted):
    [(1,-1), (2,0), (3,1), (4,0), (5,-1), (6, -1), (6,0), (6,1), (8,0), (9,1)]
    
    Running sum:
    [-1    , -1,    0,     0,      -1,    -2,      0,      -1,    -1,   0]
    
    Values for integers:
    2: 1, 4: 0, 6: 2, 8, 1
    

    Example code:

    def query(qs, intervals):
        xs = [(q, 0) for q in qs] + [(x, -1) for x, _ in intervals] + [(x, 1) for _, x in intervals]
        S, r = 0, dict()
        for v, s in sorted(xs):
            if s == 0:
                r[v] = S
            S -= s
        return r
    
    intervals = [(3, 3), (22, 30), (17, 29), (7, 12), (12, 34), (18, 38), (30, 40), (5, 27), (19, 26), (27, 27), (1, 31), (17, 17), (22, 25), (6, 14), (5, 7), (9, 19), (24, 28), (19, 40), (9, 36), (2, 32)]
    queries = [16, 18, 39, 40, 27, 28, 4, 23, 15, 24, 2, 6, 32, 17, 21, 29, 31, 7, 20, 10]
    print(query(queries, intervals))
    

    Output:

    {2: 2, 4: 2, 6: 5, 7: 6, 10: 7, 15: 6, 16: 6, 17: 8, 18: 8, 20: 9, 21: 9, 23: 11, 24: 12, 27: 11, 28: 9, 29: 8, 31: 7, 32: 6, 39: 2, 40: 2}
    
    0 讨论(0)
  • 2021-01-05 12:42

    You can presort the integers and then use bisect_left on the lower bound. Sorting has O(M*log(M)) complexity while bisect has O(log(M)). So effectively you have O(max(M, N) * log(M)).

    import bisect
    from collections import defaultdict
    
    result = defaultdict(int)
    integers = sorted(integers)
    for low, high in intervals:
        index = bisect.bisect_left(integers, low)
        while index < len(integers) and integers[index] <= high:
            result[integers[index]] += 1
            index += 1
    
    0 讨论(0)
  • 2021-01-05 12:45

    Depending on the use case and context, something simple may be adequate:

    from collections import Counter
    from itertools import chain
    
    counts = Counter(chain.from_iterable(range(f, t+1) for f,t in input_intervals))
    result = {k:counts[k] for k in input_numbers}
    

    O(n*k + m) where n is the number of intervals, k is the average size of an interval, and m is the number of integers.

    0 讨论(0)
提交回复
热议问题