I have two lists, let\'s say:
keys1 = [\'A\', \'B\', \'C\', \'D\', \'E\', \'H\', \'I\']
keys2 = [\'A\', \'B\', \'E\', \'F\', \'G\', \'H\',
I suspect that you may be asking for a solution to the shortest common supersequence problem, which I believe is NP-hard in the general case of an arbitrary number of input sequences. I'm not aware of any libraries for solving this problem, so you might have to implement one by hand. Probably the quickest way to get to working code would be to take interjay's answer using difflib and then use reduce
to run it on an arbitrary number of lists (make sure to specify the empty list as the 3rd argument to reduce
).
What you need is basically what any merge utility does: It tries to merge two sequences, while keeping the relative order of each sequence. You can use Python's difflib module to diff the two sequences, and merge them:
from difflib import SequenceMatcher
def merge_sequences(seq1,seq2):
sm=SequenceMatcher(a=seq1,b=seq2)
res = []
for (op, start1, end1, start2, end2) in sm.get_opcodes():
if op == 'equal' or op=='delete':
#This range appears in both sequences, or only in the first one.
res += seq1[start1:end1]
elif op == 'insert':
#This range appears in only the second sequence.
res += seq2[start2:end2]
elif op == 'replace':
#There are different ranges in each sequence - add both.
res += seq1[start1:end1]
res += seq2[start2:end2]
return res
Example:
>>> keys1 = ['A', 'B', 'C', 'D', 'E', 'H', 'I']
>>> keys2 = ['A', 'B', 'E', 'F', 'G', 'H', 'J', 'K']
>>> merge_sequences(keys1, keys2)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']
Note that the answer you expect is not necessarily the only possible one. For example, if we change the order of sequences here, we get another answer which is just as valid:
>>> merge_sequences(keys2, keys1)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'I']
Here's a C# solution I came up with -- using an extension method -- for the case where the two lists might not contain the same type of elements, so it takes a compare method and a selector method (that returns an object of the target type given the source object). In this case, the first list ("me") is modified to contain the final result, but it could be modified to create a separate list.
public static class ListExtensions
{
/// <summary>
/// Merges two sorted lists containing potentially different types of objects, resulting in a single
/// sorted list of objects of type T with no duplicates.
/// </summary>
public static void MergeOrderedList<TMe, TOther>(this List<TMe> me, IReadOnlyList<TOther> other, Func<TMe, TOther, int> compare = null, Func<TOther, TMe> selectT = null)
{
if (other == null)
throw new ArgumentNullException(nameof(other));
if (compare == null)
{
if (typeof(TMe).GetInterfaces().Any(i => i == typeof(IComparable<TOther>)))
{
compare = (a, b) => ((IComparable<TOther>)a).CompareTo(b);
}
else
{
throw new ArgumentNullException(nameof(compare),
"A comparison method must be supplied if no default comparison exists.");
}
}
if (selectT == null)
if (typeof(TMe).IsAssignableFrom(typeof(TOther)))
{
selectT = o => (TMe)(o as object);
}
else
{
throw new ArgumentNullException(nameof(selectT),
$"A selection method must be supplied if the items in the other list cannot be assigned to the type of the items in \"{nameof(me)}\"");
}
if (me.Count == 0)
{
me.AddRange(other.Select(selectT));
return;
}
for (int o = 0, m = 0; o < other.Count; o++)
{
var currentOther = other[o];
while (compare(me[m], currentOther) < 0 && ++m < me.Count) {}
if (m == me.Count)
{
me.AddRange(other.Skip(o).Select(selectT));
break;
}
if (compare(me[m], currentOther) != 0)
me.Insert(m, selectT(currentOther));
}
}
}
Note: I did write unit tests for this, so it's solid.
I would use a Set (cf. python doc), that I'd fill with the elements of the two lists, one aafter the other.
And make a list from the Set when it's done.
Note that there is a contradiction/paradox in your question: you want to preserve order for elements that cannot be compared (only equality because "they are complex strings" as you said).
EDIT: the OP is right noticing that sets don't preserve order of insertion.
I recently had stumbled upon a similar issue while implementing a feature. I tried to clearly define the problem statement first. If I understand right, here is the problem statement
Write a function merge_lists which will merge a list of lists with overlapping items, while preserving the order of items.
If item A comes before item B in all the lists where they occur together, then item A must precede item B in the final list also
If item A and item B interchange order in different lists, ie in some lists A precedes B and in some others B precedes A, then the order of A and B in the final list should be the same as their order in the first list where they occur together. That is, if A precedes B in l1 and B precedes A in l2, then A should precede B in final list
If Item A and Item B do not occur together in any list, then their order must be decided by the position of the list in which each one occurs first. That is, if item A is in l1 and l3, item B is in l2 and l6, then the order in the final list must be A then B
l1 = ["Type and Size", "Orientation", "Material", "Locations", "Front Print Type", "Back Print Type"]
l2 = ["Type and Size", "Material", "Locations", "Front Print Type", "Front Print Size", "Back Print Type", "Back Print Size"]
l3 = ["Orientation", "Material", "Locations", "Color", "Front Print Type"]
merge_lists([l1,l2,l3])
['Type and Size', 'Orientation', 'Material', 'Locations', 'Color', 'Front Print Type', 'Front Print Size', 'Back Print Type', 'Back Print Size']
l1 = ["T", "V", "U", "B", "C", "I", "N"]
l2 = ["Y", "V", "U", "G", "B", "I"]
l3 = ["X", "T", "V", "M", "B", "C", "I"]
l4 = ["U", "P", "G"]
merge_lists([l1,l2,l3, l4])
['Y', 'X', 'T', 'V', 'U', 'M', 'P', 'G', 'B', 'C', 'I', 'N']
l1 = ["T", "V", "U", "B", "C", "I", "N"]
l2 = ["Y", "U", "V", "G", "B", "I"]
l3 = ["X", "T", "V", "M", "I", "C", "B"]
l4 = ["U", "P", "G"]
merge_lists([l1,l2,l3, l4])
['Y', 'X', 'T', 'V', 'U', 'M', 'P', 'G', 'B', 'C', 'I', 'N']
I arrived at a reasonable solution which solved it correctly for all the data I had. (It might be wrong for some other data set. Will leave it for others to comment that). Here is the solution
def remove_duplicates(l):
return list(set(l))
def flatten(list_of_lists):
return [item for sublist in list_of_lists for item in sublist]
def difference(list1, list2):
result = []
for item in list1:
if item not in list2:
result.append(item)
return result
def preceding_items_list(l, item):
if item not in l:
return []
return l[:l.index(item)]
def merge_lists(list_of_lists):
final_list = []
item_predecessors = {}
unique_items = remove_duplicates(flatten(list_of_lists))
item_priorities = {}
for item in unique_items:
preceding_items = remove_duplicates(flatten([preceding_items_list(l, item) for l in list_of_lists]))
for p_item in preceding_items:
if p_item in item_predecessors and item in item_predecessors[p_item]:
preceding_items.remove(p_item)
item_predecessors[item] = preceding_items
print "Item predecessors ", item_predecessors
items_to_be_checked = difference(unique_items, item_priorities.keys())
loop_ctr = -1
while len(items_to_be_checked) > 0:
loop_ctr += 1
print "Starting loop {0}".format(loop_ctr)
print "items to be checked ", items_to_be_checked
for item in items_to_be_checked:
predecessors = item_predecessors[item]
if len(predecessors) == 0:
item_priorities[item] = 0
else:
if all(pred in item_priorities for pred in predecessors):
item_priorities[item] = max([item_priorities[p] for p in predecessors]) + 1
print "item_priorities at end of loop ", item_priorities
items_to_be_checked = difference(unique_items, item_priorities.keys())
print "items to be checked at end of loop ", items_to_be_checked
print
final_list = sorted(unique_items, key=lambda item: item_priorities[item])
return final_list
I've also open sourced the code as a part of the library named toolspy. So you can just do this
pip install toolspy
from toolspy import merge_lists
lls=[['a', 'x', 'g'], ['x', 'v', 'g'], ['b', 'a', 'c', 'x']]
merge_lists(lls)
By using only lists, you can achieve this with few simple for
loops and .copy()
:
def mergeLists(list1, list2):
# Exit if list2 is empty
if not len(list2):
return list1
# Copy the content of list2 into merged list
merged = list2.copy()
# Create a list for storing temporary elements
elements = []
# Create a variable for storing previous element found in both lists
previous = None
# Loop through the elements of list1
for e in list1:
# Append the element to "elements" list if it's not in list2
if e not in merged:
elements.append(e)
# If it is in list2 (is a common element)
else:
# Loop through the stored elements
for x in elements:
# Insert all the stored elements after the previous common element
merged.insert(previous and merged.index(previous) + 1 or 0, x)
# Save new common element to previous
previous = e
# Empty temporary elements
del elements[:]
# If no more common elements were found but there are elements still stored
if len(elements)
# Insert them after the previous match
for e in elements:
merged.insert(previous and merged.index(previous) + 1 or 0, e)
# Return the merged list
return merged
In [1]: keys1 = ["A", "B", "D", "F", "G", "H"]
In [2]: keys2 = ["A", "C", "D", "E", "F", "H"]
In [3]: mergeLists(keys1, keys2)
Out[3]: ["A", "B", "C", "D", "E", "F", "G", "H"]
English is not my first language, and this one is pretty hard to explain, but if you care about the explanation, here's what it does:
elements
which can store temporary elements.previous
which stores the previous element that was in both lists.list2
but is in list1
, it will append that element to elements
list and continue the loop.elements
list, appending all elements after previous
element to list2
.previous
and elements
is reset to []
and the loop continues.This way it will always follow this format:
So for example:
l1 = ["A", "B", "C", "E"]
l2 = ["A", "D", "E"]
A
will be first in the merged list.l1
between the previous common element A
and the new common element E
will be inserted right after A
.l2
between the previous common elmeent A
and the new common elmeent E
will be inserted right after the elements from l1
.E
will be last element.Back to step 1 if more common elements found.
["A", "B", "C", "D", "E"]