Say I have a list of lists that has indexes [[start, end], [start1, end1], [start2, end2]]
.
Like for example :
[[0, 133], [78, 100], [25,
In general, two intervals are overlapping if:
min([upperBoundOfA, upperBoundOfB]) >= max([lowerBoundOfA, lowerBoundOfB])
If this is the case, the union of those intervals is:
(min([lowerBoundOfA, lowerBoundOfB]), max([upperBoundOfA, upperBoundOfB])
Similarly, the intersection of those intervals will be:
(min([upperBoundOfA, upperBoundOfB]), max([lowerBoundOfA, lowerBoundOfB]))
I think one problem in your code is that it does not handle with the situation where one list contains another. For example, [0,100]
contains [9,10]
. When you loop n in [0,100] and n enters [9,10], the condition statement if n in range(x[i+1][0], x[i+1][1])
is triggered. Then the builtin function max
will compare [0, 100]
and [9, 10]
, and unfortuantely max
will return [9,10]
because it compare the first number in the list. Thus you remove the wrong element.
I'm trying another way to achieve the effect you want. Rather than manipulating the list itself, I create a new list. Conditionally appending new element to it if it meets our requirements.
def cleanup_list(lists):
ranges = []
for l in lists:
to_insert = True
for i in ranges:
r = range(i[0],i[1])
# if l overlaps with i, but l does not contain i
if l[0] in r or l[1] in r:
if (l[1]-l[0]) < len(r):
ranges.remove(i)
else:
to_insert = False
# l contains i
if l[0]<i[0] and l[1]>i[1]:
to_insert = False
if to_insert:
ranges.append(l)
return ranges
This may not be the fastest solution, but really verbose and clear - I think.
a = [[2,100], [4,10], [77,99], [38,39], [44,80], [69,70], [88, 90]]
# build ranges first
def expand(list):
newList = []
for r in list:
newList.append(range(r[0], r[1] + 1))
return newList
def compare(list):
toBeDeleted = []
for index1 in range(len(list)):
for index2 in range(len(list)):
if index1 == index2:
# we dont want to compare ourselfs
continue
matches = [x for x in list[index1] if x in list[index2]]
if len(matches) != 0: # do we have overlap?
## compare lengths and get rid of the longer one
if len(list[index1]) > len(list[index2]):
toBeDeleted.append(index1)
break
elif len(list[index1]) < len(list[index2]):
toBeDeleted.append(index2)
# distinct
toBeDeleted = [ toBeDeleted[i] for i,x in enumerate(toBeDeleted) if x not in toBeDeleted[i+1:]]
print len(list)
# remove items
for i in toBeDeleted[::-1]:
del list[i]
return list
print(compare(expand(a)))
ascending sort all items by length.
add them to a segment tree and ignore overlapped items.
To remove a minimal number of intervals from the list such that the intervals that are left do not overlap, O(n*log n)
algorithm exists:
def maximize_nonoverlapping_count(intervals):
# sort by the end-point
L = sorted(intervals, key=lambda (start, end): (end, (end - start)),
reverse=True) # O(n*logn)
iv = build_interval_tree(intervals) # O(n*log n)
result = []
while L: # until there are intervals left to consider
# pop the interval with the smallest end-point, keep it in the result
result.append(L.pop()) # O(1)
# remove intervals that overlap with the popped interval
overlapping_intervals = iv.pop(result[-1]) # O(log n + m)
remove(overlapping_intervals, from_=L)
return result
It should produce the following results:
f = maximize_nonoverlapping_count
assert f([[0, 133], [78, 100], [25, 30]]) == [[25, 30], [78, 100]]
assert f([[0,100],[9,10],[12,90]]) == [[9,10], [12, 90]]
assert f([[0, 100], [4, 20], [30, 35], [30, 78]]) == [[4, 20], [30, 35]]
assert f([[30, 70], [25, 40]]) == [[25, 40]]
It requires the data structure that can find in O(log n + m)
time all intervals that overlap with the given interval e.g., IntervalTree. There are implementations that can be used from Python e.g., quicksect.py, see Fast interval intersection methodologies for the example usage.
Here's a quicksect
-based O(n**2)
implementation of the above algorithm:
from quicksect import IntervalNode
class Interval(object):
def __init__(self, start, end):
self.start = start
self.end = end
self.removed = False
def maximize_nonoverlapping_count(intervals):
intervals = [Interval(start, end) for start, end in intervals]
# sort by the end-point
intervals.sort(key=lambda x: (x.end, (x.end - x.start))) # O(n*log n)
tree = build_interval_tree(intervals) # O(n*log n)
result = []
for smallest in intervals: # O(n) (without the loop body)
# pop the interval with the smallest end-point, keep it in the result
if smallest.removed:
continue # skip removed nodes
smallest.removed = True
result.append([smallest.start, smallest.end]) # O(1)
# remove (mark) intervals that overlap with the popped interval
tree.intersect(smallest.start, smallest.end, # O(log n + m)
lambda x: setattr(x.other, 'removed', True))
return result
def build_interval_tree(intervals):
root = IntervalNode(intervals[0].start, intervals[0].end,
other=intervals[0])
return reduce(lambda tree, x: tree.insert(x.start, x.end, other=x),
intervals[1:], root)
Note: the time complexity in the worst case is O(n**2)
for this implementation because the intervals are only marked as removed e.g., imagine such input intervals
that len(result) == len(intervals) / 3
and there were len(intervals) / 2
intervals that span the whole range then tree.intersect()
would be called n/3
times and each call would execute x.other.removed = True
at least n/2
times i.e., n*n/6
operations in total:
n = 6
intervals = [[0, 100], [0, 100], [0, 100], [0, 10], [10, 20], [15, 40]])
result = [[0, 10], [10, 20]]
Here's a banyan-based O(n log n)
implementation:
from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan
def maximize_nonoverlapping_count(intervals):
# sort by the end-point O(n log n)
sorted_intervals = SortedSet(intervals,
key=lambda (start, end): (end, (end - start)))
# build "interval" tree O(n log n)
tree = SortedSet(intervals, updator=OverlappingIntervalsUpdator)
result = []
while sorted_intervals: # until there are intervals left to consider
# pop the interval with the smallest end-point, keep it in the result
result.append(sorted_intervals.pop()) # O(log n)
# remove intervals that overlap with the popped interval
overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
tree -= overlapping_intervals # O(m log n)
sorted_intervals -= overlapping_intervals # O(m log n)
return result
Note: this implementation considers [0, 10]
and [10, 20]
intervals to be overlapping:
f = maximize_nonoverlapping_count
assert f([[0, 100], [0, 10], [11, 20], [15, 40]]) == [[0, 10] ,[11, 20]]
assert f([[0, 100], [0, 10], [10, 20], [15, 40]]) == [[0, 10] ,[15, 40]]
sorted_intervals
and tree
can be merged:
from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan
def maximize_nonoverlapping_count(intervals):
# build "interval" tree sorted by the end-point O(n log n)
tree = SortedSet(intervals, key=lambda (start, end): (end, (end - start)),
updator=OverlappingIntervalsUpdator)
result = []
while tree: # until there are intervals left to consider
# pop the interval with the smallest end-point, keep it in the result
result.append(tree.pop()) # O(log n)
# remove intervals that overlap with the popped interval
overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
tree -= overlapping_intervals # O(m log n)
return result