Alright, so I have two lists, as such:
[1, 2, 3, 4, 5]
, [4, 5, 6, 7]
.
This actually isn't too terribly difficult. After all, essentially all you're doing is checking what substring at the end of A lines up with what substring of B.
def merge(a, b):
max_offset = len(b) # can't overlap with greater size than len(b)
for i in reversed(range(max_offset+1)):
# checks for equivalence of decreasing sized slices
if a[-i:] == b[:i]:
break
return a + b[i:]
We can test with your test data by doing:
test_data = [{'a': [1,3,9,8,3,4,5], 'b': [3,4,5,7,8], 'result': [1,3,9,8,3,4,5,7,8]},
{'a': [9, 1, 1, 8, 7], 'b': [8, 6, 7], 'result': [9, 1, 1, 8, 7, 8, 6, 7]}]
all(merge(test['a'], test['b']) == test['result'] for test in test_data)
This runs through every possible combination of slices that could result in an overlap and remembers the result of the overlap if one is found. If nothing is found, it uses the last result of i
which will always be 0
. Either way, it returns all of a
plus everything past b[i]
(in the overlap case, that's the non overlapping portion. In the non-overlap case, it's everything)
Note that we can make a couple optimizations in corner cases. For instance, the worst case here is that it runs through the whole list without finding any solution. You could add a quick check at the beginning that might short circuit that worst case
def merge(a, b):
if a[-1] not in b:
return a + b
...
In fact you could take that solution one step further and probably make your algorithm much faster
def merge(a, b):
while True:
try:
idx = b.index(a[-1]) + 1 # leftmost occurrence of a[-1] in b
except ValueError: # a[-1] not in b
return a + b
if a[-idx:] == b[:idx]:
return a + b[:idx]
However this might not find the longest overlap in cases like:
a = [1,2,3,4,1,2,3,4]
b = [3,4,1,2,3,4,5,6]
# result should be [1,2,3,4,1,2,3,4,5,6], but
# this algo produces [1,2,3,4,1,2,3,4,1,2,3,4,5,6]
You could fix that be using rindex
instead of index
to match the longest slice instead of the shortest, but I'm not sure what that does to your speed. It's certainly slower, but it might be inconsequential. You could also memoize the results and return the shortest result, which might be a better idea.
def merge(a, b):
results = []
while True:
try:
idx = b.index(a[-1]) + 1 # leftmost occurrence of a[-1] in b
except ValueError: # a[-1] not in b
results.append(a + b)
break
if a[-idx:] == b[:idx]:
results.append(a + b[:idx])
return min(results, key=len)
Which should work since merging the longest overlap should produce the shortest result in all cases.