Alright, so I have two lists, as such: `[1, 2, 3, 4, 5]`, `[4, 5, 6, 7]`.
One trivial optimization is not iterating over the whole `master` list. I.e., replace `while n < len(master)` with `for n in range(min(len(addition), len(master)))` (and don't increment `n` in the loop). If there is no match, your current code will iterate over the entire `master` list, even once the slices being compared can no longer be the same length.
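A rough sketch of that change, for illustration only. The question's loop isn't reproduced here, so the function name `merge_sliced`, the slice comparison, and the choice to let `n` run from 1 (treating it as the overlap length) are all assumptions; adjust the bounds to however `n` is used in your loop.

```python
def merge_sliced(master, addition):
    # Hypothetical reconstruction of a slice-based merge.
    # An overlap can't be longer than the shorter list, so cap n there.
    for n in range(1, min(len(addition), len(master)) + 1):
        # Compare the last n items of master with the first n of addition.
        if master[-n:] == addition[:n]:
            return master + addition[n:]
    # No overlap found: plain concatenation.
    return master + addition


print(merge_sliced([1, 2, 3, 4, 5], [4, 5, 6, 7]))  # [1, 2, 3, 4, 5, 6, 7]
```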
Another concern is that you're taking slices of `master` and `addition` in order to compare them, which creates two new lists every time and isn't really necessary. This solution (inspired by Boyer-Moore) doesn't use slicing:
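    def merge(master, addition):
        # Candidate overlap lengths: one past each index in addition
        # that holds master's last element.
        overlap_lens = (i + 1 for i, e in enumerate(addition) if e == master[-1])
        for overlap_len in overlap_lens:
            # Compare the tail of master with the head of addition.
            for i in range(overlap_len):
                if master[-overlap_len + i] != addition[i]:
                    break
            else:
                # No mismatch: append only the non-overlapping part.
                return master + addition[overlap_len:]
        # No overlap found: plain concatenation.
        return master + addition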
The idea here is to find every index in `addition` that holds the last element of `master`, and add `1` to each. Since a valid overlap must end with the last element of `master`, only those values are possible overlap lengths. Then, for each of them, we can check whether the elements before it also line up.
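To make that concrete, here is the candidate step on the two lists from the question. The variable name `candidates` is just for illustration, and the final check uses slices only to keep the example short.

```python
master = [1, 2, 3, 4, 5]
addition = [4, 5, 6, 7]

# Indices in addition that hold master's last element (5), shifted by one
# so that each value is a candidate overlap length.
candidates = [i + 1 for i, e in enumerate(addition) if e == master[-1]]
print(candidates)  # [2]

# Only that length needs checking: do the last 2 items of master
# match the first 2 items of addition?
n = candidates[0]
print(master[-n:] == addition[:n])  # True
```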
The function currently assumes that `master` is non-empty and at least as long as `addition`. If it isn't, you can get an `IndexError` at `master[-1]` (empty `master`) or at `master[-overlap_len + i]` (a candidate overlap longer than `master`). Add a condition to the `overlap_lens` generator if you can't guarantee it.
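One possible way to add that condition, as a sketch rather than the definitive fix. The name `merge_guarded` and the early return for an empty `master` are my own additions.

```python
def merge_guarded(master, addition):
    # Assumption: an empty master simply yields a copy of addition.
    if not master:
        return list(addition)
    # Skip candidates that would be longer than master itself.
    overlap_lens = (
        i + 1
        for i, e in enumerate(addition)
        if i < len(master) and e == master[-1]
    )
    for overlap_len in overlap_lens:
        for i in range(overlap_len):
            if master[-overlap_len + i] != addition[i]:
                break
        else:
            return master + addition[overlap_len:]
    return master + addition


# The unguarded version would raise IndexError on this input.
print(merge_guarded([5], [4, 5, 6, 5]))  # [5, 4, 5, 6, 5]
```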
It's also non-greedy, i.e. it looks for the smallest non-empty overlap (`merge([1, 2, 2], [2, 2, 3])` will return `[1, 2, 2, 2, 3]`). I think that's what you meant by "to merge at the last possible valid position". If you want a greedy version, try the candidates in reverse order (a generator can't be passed to `reversed()` directly, so collect `overlap_lens` into a list first).
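A minimal sketch of the greedy variant, under the same assumption that `master` is non-empty and at least as long as `addition`. The name `merge_greedy` is hypothetical.

```python
def merge_greedy(master, addition):
    # Collect the candidates in a list so they can be walked backwards.
    overlap_lens = [i + 1 for i, e in enumerate(addition) if e == master[-1]]
    # Try the longest candidate overlap first.
    for overlap_len in reversed(overlap_lens):
        for i in range(overlap_len):
            if master[-overlap_len + i] != addition[i]:
                break
        else:
            return master + addition[overlap_len:]
    return master + addition


print(merge_greedy([1, 2, 2], [2, 2, 3]))  # [1, 2, 2, 3]
```

The trade-off is that the candidates are materialized up front instead of being produced lazily.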