Identifying lists that have 3 elements in common in a list of lists

有刺的猬 2021-01-16 09:00

I have a list of lists. If there are sublists that have the first three elements in common, merge them into one list and add up all the fourth elements.

The problem i

3 Answers
  • 2021-01-16 09:28

    I'd do something like this:

    >>> a_list = [['apple', 50, 60, 7],
    ...           ['orange', 70, 50, 8],
    ...           ['apple', 50, 60, 12]]
    >>> 
    >>> from collections import defaultdict
    >>> d = defaultdict(list)
    >>> from operator import itemgetter
    >>> getter = itemgetter(0,1,2)
    >>> for lst in a_list:
    ...     d[getter(lst)].extend(lst[3:])
    ... 
    >>> d
    defaultdict(<type 'list'>, {('apple', 50, 60): [7, 12], ('orange', 70, 50): [8]})
    >>> print([list(k)+v for k,v in d.items()])
    [['apple', 50, 60, 7, 12], ['orange', 70, 50, 8]]
    

    This doesn't give the sum, however. That can easily be fixed by doing:

    print([list(k)+[sum(v)] for k,v in d.items()])
    

    There isn't much reason to prefer this over the slightly more elegant solution by Martijn, other than that it allows input lists with more than 4 items (with all the elements after the third being summed, as expected). In other words, this would handle the list:

    a_list = [['apple', 50, 60, 7, 12],
              ['orange', 70, 50, 8]]
    

    as well.
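
    As a rough Python 3 sketch of the approach above, wrapped in a function: the name merge_and_sum and the key_len parameter are made up for illustration and are not part of the original answer. It assumes everything after the first three items is numeric.

```python
from collections import defaultdict


def merge_and_sum(rows, key_len=3):
    """Group rows by their first `key_len` items and sum everything after them."""
    groups = defaultdict(list)
    for row in rows:
        # A tuple is hashable, so it can serve as the dictionary key.
        groups[tuple(row[:key_len])].extend(row[key_len:])
    # Rebuild each row as the key fields followed by the total of the collected values.
    return [list(key) + [sum(values)] for key, values in groups.items()]


rows = [['apple', 50, 60, 7, 12],
        ['orange', 70, 50, 8],
        ['apple', 50, 60, 1]]
print(merge_and_sum(rows))
# [['apple', 50, 60, 20], ['orange', 70, 50, 8]]
```

    The output order follows first appearance of each key, which relies on dicts preserving insertion order (Python 3.7+).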

  • 2021-01-16 09:31

    Form the key from [:3] so that you get the first 3 elements.
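
    A small sketch of what that might look like, assuming the key is meant to be used in a dict: the slice has to be converted to a tuple first, because a list slice is itself a list and cannot be hashed. The variable names here are illustrative only.

```python
row = ['apple', 50, 60, 7]

key = tuple(row[:3])    # first three elements, as a hashable tuple
totals = {key: row[3]}  # fine: tuples can be dict keys

try:
    {row[:3]: row[3]}   # a list slice is still a list, which is unhashable
    error = None
except TypeError as exc:
    error = str(exc)

print(key)              # ('apple', 50, 60)
print(error)
```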

  • 2021-01-16 09:35

    You can use the same principle, by using the first three elements as a key, and using int as the default value factory for the defaultdict (so you get 0 as the initial value):

    from collections import defaultdict
    
    a_list = [['apple', 50, 60, 7],
              ['orange', 70, 50, 8],
              ['apple', 50, 60, 12]]
    
    d = defaultdict(int)
    for sub_list in a_list:
        key = tuple(sub_list[:3])
        d[key] += sub_list[-1]
    
    new_data = [list(k) + [v] for k, v in d.iteritems()]
    

    If you are using Python 3, you can simplify this to:

    d = defaultdict(int)
    for *key, v in a_list:
        d[tuple(key)] += v
    
    new_data = [list(k) + [v] for k, v in d.items()]
    

    because you can use a starred target to take all 'remaining' values from a list, so each sublist is assigned mostly to key and the last value is assigned to v, making the loop just that little bit simpler (and there is no .iteritems() method on a dict in Python 3, because .items() already returns a lightweight view instead of building a list).
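
    To see the two Python 3 behaviours in isolation, here is a minimal sketch (the totals and view names are just for illustration). Note that the starred name always receives a list, which is why the loop converts it with tuple() before using it as a key.

```python
*key, v = ['apple', 50, 60, 7]
print(key, v)       # ['apple', 50, 60] 7

# The starred target is a list, so convert it before using it as a dict key.
totals = {tuple(key): v}

# In Python 3, items() returns a view that reflects later changes to the dict.
view = totals.items()
totals[('orange', 70, 50)] = 8
print(list(view))   # [(('apple', 50, 60), 7), (('orange', 70, 50), 8)]
```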

    So, we use a defaultdict that uses 0 as the default value, then for each key generated from the first 3 values (as a tuple so you can use it as a dictionary key) sum the last value.

    • So for the first item ['apple', 50, 60, 7] we create a key ('apple', 50, 60), look that up in d (where it doesn't exist, but defaultdict will then use int() to create a new value of 0), and add the 7 from that first item.

    • Do the same for the ('orange', 70, 50) key and value 8.

    • For the 3rd item we get the ('apple', 50, 60) key again and add 12 to the pre-existing 7 in d[('apple', 50, 60)], for a total of 19.

    Then we turn the (key, value) pairs back into lists and you are done. This results in:

    >>> new_data
    [['apple', 50, 60, 19], ['orange', 70, 50, 8]]
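
    The default-value step in the walkthrough can be checked on its own. This is only a sketch of the mechanics for a single key, not a replacement for the loop above:

```python
from collections import defaultdict

d = defaultdict(int)
key = ('apple', 50, 60)

print(int())      # 0, the value the factory produces for a missing key
print(key in d)   # False, nothing stored yet
d[key] += 7       # missing key: starts from int() == 0, then adds 7
d[key] += 12      # existing key: 7 + 12
print(d[key])     # 19
```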
    

    An alternative implementation that requires sorting the data uses itertools.groupby:

    from itertools import groupby
    from operator import itemgetter
    
    a_list = [['apple', 50, 60, 7],
              ['orange', 70, 50, 8],
              ['apple', 50, 60, 12]]
    
    newlist = [list(key) + [sum(i[-1] for i in sublists)] 
        for key, sublists in groupby(sorted(a_list), key=itemgetter(0, 1, 2))]
    

    for the same output. This is going to be slower if your data isn't already sorted, because sorting takes O(n log n) time while the defaultdict approach makes a single pass, but it's good to know of different approaches.
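
    The reason sorting is required is that groupby only merges adjacent items. The sketch below shows that, and also sorts by the same three-element key used for grouping, which is a small variation on the code above rather than what the answer wrote; the first_three name is illustrative.

```python
from itertools import groupby
from operator import itemgetter

a_list = [['apple', 50, 60, 7],
          ['orange', 70, 50, 8],
          ['apple', 50, 60, 12]]

first_three = itemgetter(0, 1, 2)

# groupby only merges *adjacent* rows, so without sorting 'apple' shows up twice.
unsorted_keys = [key for key, _ in groupby(a_list, key=first_three)]
print(unsorted_keys)
# [('apple', 50, 60), ('orange', 70, 50), ('apple', 50, 60)]

# Sorting by the same key makes rows with equal keys adjacent.
merged = [list(key) + [sum(row[-1] for row in rows)]
          for key, rows in groupby(sorted(a_list, key=first_three), key=first_three)]
print(merged)
# [['apple', 50, 60, 19], ['orange', 70, 50, 8]]
```

    Sorting on the key alone also avoids comparing the fourth elements, which only matters if those values are not mutually comparable.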
