Pythonic way to merge two overlapping lists, preserving order

后端 未结 8 1622
鱼传尺愫
鱼传尺愫 2021-02-03 22:40

Alright, so I have two lists, as such:

  • They can and will have overlapping items, for example, [1, 2, 3, 4, 5], [4, 5, 6, 7].
  • The
8条回答
  •  爱一瞬间的悲伤
    2021-02-03 23:06

    One trivial optimization is not iterating over the whole master list. I.e., replace while n < len(master) with for n in range(min(len(addition), len(master))) (and don't increment n in the loop). If there is no match, your current code will iterate over the entire master list, even if the slices being compared aren't even of the same length.

    Another concern is that you're taking slices of master and addition in order to compare them, which creates two new lists every time, and isn't really necessary. This solution (inspired by Boyer-Moore) doesn't use slicing:

    def merge(master, addition):
        overlap_lens = (i + 1 for i, e in enumerate(addition) if e == master[-1])
        for overlap_len in overlap_lens:
            for i in range(overlap_len):
                if master[-overlap_len + i] != addition[i]:
                    break
            else:
                return master + addition[overlap_len:]
        return master + addition
    

    The idea here is to generate all the indices of the last element of master in addition, and add 1 to each. Since a valid overlap must end with the last element of master, only those values are lengths of possible overlaps. Then we can check for each of them if the elements before it also line up.

    The function currently assumes that master is longer than addition (you'll probably get an IndexError at master[-overlap_len + i] if it isn't). Add a condition to the overlap_lens generator if you can't guarantee it.

    It's also non-greedy, i.e. it looks for the smallest non-empty overlap (merge([1, 2, 2], [2, 2, 3]) will return [1, 2, 2, 2, 3]). I think that's what you meant by "to merge at the last possible valid position". If you want a greedy version, reverse the overlap_lens generator.

提交回复
热议问题