Collapse a list of range tuples into the overlapping ranges

≯℡__Kan透↙ 提交于 2021-02-07 03:35:33

问题


I'm looking for the most memory efficient way to solve this problem.

I have a list of tuples representing partial string matches in a sentence:

[(0, 2), (1, 2), (0, 4), (2,6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]

The first value of each tuple is the start position for the match, the second value is the length.

The idea is to collapse the list so that only the longest continue string match is reported. In this case it would be:

[(0,4), (2,6), (22,6)]

I do not want just the longest range, like in algorithm to find longest non-overlapping sequences, but I want all the ranges collapsed by the longest.

In case your wondering, I am using a pure python implementation of the Aho-Corasick for matching terms in a static dictionary to the given text snippet.

EDIT: Due to the nature of these tuple lists, overlapping but not self-contained ranges should be printed out individually. For example, having the words betaz and zeta in the dictionary, the matches for betazeta are [(0,5),(4,8)]. Since these ranges overlap, but none is contained in the other, the answer should be [(0,5),(4,8)]. I have also modified the input dataset above so that this case is covered.

Thanks!


回答1:


import operator
lst = [(0, 2), (1, 2), (0, 4), (2,6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
lst.sort(key=operator.itemgetter(1))
for i in reversed(xrange(len(lst)-1)):
    start, length = lst[i]
    for j in xrange(i+1, len(lst)):
        lstart, llength = lst[j]
        if start >= lstart and start + length <= lstart + llength:
            del lst[i]
            break
print lst
#[(0, 4), (2, 6), (22, 6)]



回答2:


a = [(0, 2), (1, 2), (0, 4), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
b = [set(xrange(i, i + j)) for i, j in a]
c = b.pop().union(*b)
collapsed = sorted(c)
print collapsed
#Maybe this is useful?:
[0, 1, 2, 3, 22, 23, 24, 25, 26, 27]

#But if you want the requested format, then do this:
d = []
start = collapsed[0]
length = 0
for val in collapsed:
    if start + length < val:
        d.append((start,length))
        start = val
        length = 0
    elif val == collapsed[-1]:
        d.append((start,length + 1))
    length += 1
print d
#Output:
[(0,4), (22,6)]



回答3:


So, taking you at your word that your main interest is space efficiency, here's one way to do what you want:

lst = [(0, 2), (1, 2), (0, 4), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
lst.sort()
start, length = lst.pop(0)
i = 0
while i < len(lst):
    x, l = lst[i]
    if start + length < x:
        lst[i] = (start, length)
        i += 1
        start, length = x, l
    else:
        length = max(length, x + l - start)
        lst.pop(i)
lst.append((start, length))

This modifies the list in place, never makes the list longer, only uses a small handful of variables to keep state, and only needs one pass through the list

A much faster algorithm is possible if you don't want to modify the list in place - popping items from the middle of a list can be slow, especially if the list is long.

One reasonable optimization would be to keep a list of which indices you're going to remove, and then come back and rebuild the list in a second pass, that way you could rebuild the whole list in one go and avoid the pop overhead. But that would use more memory!



来源:https://stackoverflow.com/questions/10790415/collapse-a-list-of-range-tuples-into-the-overlapping-ranges

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!