Interleave different-length lists, eliminating duplicates and preserving order

一整个雨季 2021-02-11 22:12

I have two lists, let's say:

keys1 = ['A', 'B', 'C', 'D', 'E',           'H', 'I']
keys2 = ['A', 'B',           'E', 'F', 'G', 'H',


        
6 Answers
  •  悲&欢浪女
    2021-02-11 23:02

    I recently stumbled upon a similar issue while implementing a feature, and I tried to define the problem clearly first. If I understand correctly, this is the problem statement:

    Problem Statement

    Write a function merge_lists which will merge a list of lists with overlapping items, while preserving the order of items.

    Constraints

    1. If item A comes before item B in every list where they occur together, then A must precede B in the final list as well.

    2. If items A and B appear in different orders in different lists, i.e. A precedes B in some lists and B precedes A in others, then their order in the final list should match their order in the first list where they occur together. That is, if A precedes B in l1 and B precedes A in l2, then A should precede B in the final list.

    3. If items A and B never occur together in any list, then their order is decided by the position of the list in which each one first occurs. That is, if A is in l1 and l3 and B is in l2 and l6, then the order in the final list must be A, then B.
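
    To make constraints 2 and 3 concrete, here is a minimal sketch of how they would settle the order of a single pair. The name pair_order is made up for illustration, and the sketch assumes both items appear in at least one list. It resolves one pair in isolation; it says nothing about how the pairwise decisions combine into one final list.

```python
def pair_order(list_of_lists, a, b):
    """Return (first, second) for items a and b, judged as an isolated pair."""
    # Constraints 1 and 2: the first list containing both items decides.
    for l in list_of_lists:
        if a in l and b in l:
            return (a, b) if l.index(a) < l.index(b) else (b, a)

    # Constraint 3: otherwise the item whose first list comes earlier wins.
    def first_list(x):
        return next(i for i, l in enumerate(list_of_lists) if x in l)

    return (a, b) if first_list(a) <= first_list(b) else (b, a)


lists = [["A", "B"], ["B", "A", "C"], ["D"]]
print(pair_order(lists, "B", "A"))  # ('A', 'B'): lists[0] holds both and decides
print(pair_order(lists, "D", "C"))  # ('C', 'D'): C first appears in an earlier list
```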

    Test case 1:

    Input:

    l1 = ["Type and Size", "Orientation", "Material", "Locations", "Front Print Type", "Back Print Type"]

    l2 = ["Type and Size", "Material", "Locations", "Front Print Type", "Front Print Size", "Back Print Type", "Back Print Size"]

    l3 = ["Orientation", "Material", "Locations", "Color", "Front Print Type"]

    merge_lists([l1,l2,l3])

    Output:

    ['Type and Size', 'Orientation', 'Material', 'Locations', 'Color', 'Front Print Type', 'Front Print Size', 'Back Print Type', 'Back Print Size']

    Test case 2:

    Input:

    l1 = ["T", "V", "U", "B", "C", "I", "N"]

    l2 = ["Y", "V", "U", "G", "B", "I"]

    l3 = ["X", "T", "V", "M", "B", "C", "I"]

    l4 = ["U", "P", "G"]

    merge_lists([l1,l2,l3, l4])

    Output:

    ['Y', 'X', 'T', 'V', 'U', 'M', 'P', 'G', 'B', 'C', 'I', 'N']

    Test case 3:

    Input:

    l1 = ["T", "V", "U", "B", "C", "I", "N"]

    l2 = ["Y", "U", "V", "G", "B", "I"]

    l3 = ["X", "T", "V", "M", "I", "C", "B"]

    l4 = ["U", "P", "G"]

    merge_lists([l1,l2,l3, l4])

    Output:

    ['Y', 'X', 'T', 'V', 'U', 'M', 'P', 'G', 'B', 'C', 'I', 'N']
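
    The expected outputs above can be checked mechanically against constraints 1 and 2. This is a rough sketch rather than part of the solution; order_violations is a hypothetical helper name. For every pair of items it finds the first list in which both occur and reports the pair if the merged result orders them the other way round. It is shown here on test case 3.

```python
from itertools import combinations


def order_violations(list_of_lists, merged):
    """Return pairs (a, b) whose order in merged contradicts the first list holding both."""
    position = {item: i for i, item in enumerate(merged)}
    decided = set()      # pairs already settled by an earlier list
    violations = []
    for l in list_of_lists:
        for a, b in combinations(l, 2):   # a occurs before b in l
            pair = frozenset((a, b))
            if a == b or pair in decided:
                continue
            decided.add(pair)
            if position[a] > position[b]:
                violations.append((a, b))
    return violations


l1 = ["T", "V", "U", "B", "C", "I", "N"]
l2 = ["Y", "U", "V", "G", "B", "I"]
l3 = ["X", "T", "V", "M", "I", "C", "B"]
l4 = ["U", "P", "G"]
merged = ['Y', 'X', 'T', 'V', 'U', 'M', 'P', 'G', 'B', 'C', 'I', 'N']
print(order_violations([l1, l2, l3, l4], merged))  # → []
```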

    Solution

    I arrived at a reasonable solution that gave correct results for all the data I had. (It might be wrong for some other data set; I'll leave that for others to comment on.) Here is the solution:

    def remove_duplicates(items):
        # dict.fromkeys keeps first-seen order; set() would scramble it.
        return list(dict.fromkeys(items))

    def flatten(list_of_lists):
        return [item for sublist in list_of_lists for item in sublist]

    def difference(list1, list2):
        # Items of list1 that are not in list2, in list1's order.
        return [item for item in list1 if item not in list2]

    def preceding_items_list(l, item):
        # Everything that comes before the first occurrence of item in l.
        if item not in l:
            return []
        return l[:l.index(item)]

    def precedes(list_of_lists, a, b):
        # Constraint 2: the first list containing both a and b decides their order.
        for l in list_of_lists:
            if a in l and b in l:
                return l.index(a) < l.index(b)
        return False

    def merge_lists(list_of_lists):
        unique_items = remove_duplicates(flatten(list_of_lists))
        item_predecessors = {}
        item_priorities = {}

        for item in unique_items:
            preceding_items = remove_duplicates(flatten(
                [preceding_items_list(l, item) for l in list_of_lists]))
            # Keep only the predecessors confirmed by the first shared list.
            item_predecessors[item] = [
                p for p in preceding_items if precedes(list_of_lists, p, item)]

        # An item's priority is one more than the highest priority among its predecessors.
        items_to_be_checked = difference(unique_items, item_priorities)
        while items_to_be_checked:
            for item in items_to_be_checked:
                predecessors = item_predecessors[item]
                if not predecessors:
                    item_priorities[item] = 0
                elif all(pred in item_priorities for pred in predecessors):
                    item_priorities[item] = max(item_priorities[p] for p in predecessors) + 1
            remaining = difference(unique_items, item_priorities)
            if len(remaining) == len(items_to_be_checked):
                # No progress in this pass: the remaining items depend on each other.
                raise ValueError("cyclic ordering among: {0}".format(remaining))
            items_to_be_checked = remaining

        # sorted() is stable, so items with equal priority keep first-appearance order.
        return sorted(unique_items, key=lambda item: item_priorities[item])
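
    The priority loop above is essentially a topological sort over the "must come before" relation. For comparison, here is a sketch that hands the same kind of predecessor map to the standard library's graphlib.TopologicalSorter (Python 3.9+). This is a swapped-in technique, not the code above: merge_with_graphlib is a name made up for illustration, the order of items that are not constrained against each other may differ from the outputs listed earlier, and contradictory input raises graphlib.CycleError.

```python
from graphlib import TopologicalSorter


def merge_with_graphlib(list_of_lists):
    """Merge lists by topologically sorting the pairwise 'comes before' relation."""
    predecessors = {}   # item -> set of items that must come before it
    decided = set()     # pairs already settled by an earlier list
    for l in list_of_lists:
        for item in l:
            predecessors.setdefault(item, set())
        for i, a in enumerate(l):
            for b in l[i + 1:]:
                pair = frozenset((a, b))
                if a != b and pair not in decided:
                    decided.add(pair)
                    predecessors[b].add(a)   # the first list holding both decides
    # static_order() yields every item after all of its predecessors.
    return list(TopologicalSorter(predecessors).static_order())


lls = [['a', 'x', 'g'], ['x', 'v', 'g'], ['b', 'a', 'c', 'x']]
print(merge_with_graphlib(lls))
```

    The result is one valid ordering, not necessarily the only one, so I would compare it by checking the pairwise constraints rather than expecting an exact list.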
    

    I've also open-sourced the code as part of a library named toolspy, so you can just do this:

    pip install toolspy
    
    from toolspy import merge_lists
    lls=[['a', 'x', 'g'], ['x', 'v', 'g'], ['b', 'a', 'c', 'x']]
    merge_lists(lls)
    
