Pythonic way to merge two overlapping lists, preserving order

后端未结

关注

 8  1617

鱼传尺愫 2021-02-03 22:40

Alright, so I have two lists, as such:

They can and will have overlapping items, for example, [1, 2, 3, 4, 5], [4, 5, 6, 7].
The

8条回答

再見小時候 (楼主)

2021-02-03 22:56
This actually isn't too terribly difficult. After all, essentially all you're doing is checking what substring at the end of A lines up with what substring of B.
```
def merge(a, b):
    max_offset = len(b)  # can't overlap with greater size than len(b)
    for i in reversed(range(max_offset+1)):
        # checks for equivalence of decreasing sized slices
        if a[-i:] == b[:i]:
            break
    return a + b[i:]
```
We can test with your test data by doing:
```
test_data = [{'a': [1,3,9,8,3,4,5], 'b': [3,4,5,7,8], 'result': [1,3,9,8,3,4,5,7,8]},
             {'a': [9, 1, 1, 8, 7], 'b': [8, 6, 7], 'result': [9, 1, 1, 8, 7, 8, 6, 7]}]

all(merge(test['a'], test['b']) == test['result'] for test in test_data)
```
This runs through every possible combination of slices that could result in an overlap and remembers the result of the overlap if one is found. If nothing is found, it uses the last result of i which will always be 0. Either way, it returns all of a plus everything past b[i] (in the overlap case, that's the non overlapping portion. In the non-overlap case, it's everything)

Note that we can make a couple optimizations in corner cases. For instance, the worst case here is that it runs through the whole list without finding any solution. You could add a quick check at the beginning that might short circuit that worst case
```
def merge(a, b):
    if a[-1] not in b:
        return a + b
    ...
```
In fact you could take that solution one step further and probably make your algorithm much faster
```
def merge(a, b):
    while True:
        try:
            idx = b.index(a[-1]) + 1  # leftmost occurrence of a[-1] in b
        except ValueError:  # a[-1] not in b
            return a + b
        if a[-idx:] == b[:idx]:
            return a + b[:idx]
```
However this might not find the longest overlap in cases like:
```
a = [1,2,3,4,1,2,3,4]
b = [3,4,1,2,3,4,5,6]
# result should be [1,2,3,4,1,2,3,4,5,6], but
# this algo produces [1,2,3,4,1,2,3,4,1,2,3,4,5,6]
```
You could fix that be using rindex instead of index to match the longest slice instead of the shortest, but I'm not sure what that does to your speed. It's certainly slower, but it might be inconsequential. You could also memoize the results and return the shortest result, which might be a better idea.
```
def merge(a, b):
    results = []
    while True:
        try:
            idx = b.index(a[-1]) + 1  # leftmost occurrence of a[-1] in b
        except ValueError:  # a[-1] not in b
            results.append(a + b)
            break
        if a[-idx:] == b[:idx]:
            results.append(a + b[:idx])
    return min(results, key=len)
```
Which should work since merging the longest overlap should produce the shortest result in all cases.
0 讨论(0)

查看其它8个回答
发布评论:

提交评论
- 加载中...