Python find continuous interesctions of intervals

问题

I tired multiple approaches, but failed to do this one job. All of them use only 2 lists or range of lists. The one most promising was:

infile = open('file','r')

for line in infile:
    line = line.split()
    f = range(int(line[0]),int(line[1]))

results_union = set().union(*f)
print results_union

I have a file with start,end positions like this: (sorted)

I would like the output to be:

1 6
9 17

回答1:

Try following:

def group(data):
    data = sorted(data)
    it = iter(data)
    a, b = next(it)
    for c, d in it:
        if b >= c:  # Use `if b > c` if you want (1,2), (2,3) not to be
                    # treated as intersection.
            b = max(b, d)
        else:
            yield a, b
            a, b = c, d
    yield a, b


with open('file') as f:
    data = [map(int, line.split()) for line in f]

for a, b in group(data):
    print a, b

Example:

>>> data = (9,16), (1,5), (1,3), (1,2), (3,6), (9,11), (12,17), (2,4),
>>> list(group(data))
[(1, 6), (9, 17)]

回答2:

This following looks promising. The first part is based on your approach. The second part just looks for contiguous intervals in the union of ranges.

intervals = []
with open('contigous_input.txt', 'r') as infile:
    for line in infile:
        start, stop = sorted(map(int, line.split()))
        intervals.append(range(start, stop+1))

union = list(set().union(*intervals))
print union

results = []
i = start = 0
j = i + 1
while j < len(union):
    if union[j] != union[i]+1:
        results.append( (union[start], union[j-1]) )
        if j == len(union):
            break
        i = start = j
        j = i + 1
    else:
        i, j = j, j + 1

if start != j-1:
    results.append( (union[start], union[j-1]) )

print results

Output:

[1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16, 17]
[(1, 6), (9, 17)]

回答3:

You should use 2-tuples instead of the range function, because range returns a list. Here's a simple function that will combine two of your 2-tuples if possible:

def combine_bounds(x, y):
  a, b = sorted([x, y])
  if a[1]+1 >= b[0]:
    return (a[0], max(a[1],b[1]))

Sample output:

>>> combine_bounds((1,2), (3,4))
(1, 4)
>>> combine_bounds((1,100), (3,4))
(1, 100)
>>> combine_bounds((1,2), (4,10))
>>> combine_bounds((1,3), (4,10))
(1, 10)
>>> combine_bounds((1,6), (4,10))
(1, 10)
>>> combine_bounds((10,600), (4,10))
(4, 600)
>>> combine_bounds((11,600), (4,10))
(4, 600)
>>> combine_bounds((9,600), (4,10))
(4, 600)
>>> combine_bounds((1,600), (4,10))
(1, 600)
>>> combine_bounds((12,600), (4,10))
>>> combine_bounds((12,600), (4,10)) is None
True

None is a falsey value in Python, so you can use the result of combine_bounds in conditionals. If it returns None (similar to False), then there was no intersection. If it returns a 2-tuple, then there was an intersection and the return value is the result.

I didn't do all the work for you (you still need to figure out how to use this on the input to get your desired output), but this should get you going in the right direction!

来源：https://stackoverflow.com/questions/18745693/python-find-continuous-interesctions-of-intervals

标签

python

intersection