I have a list of lists. If any sublists have the same first three elements, merge them into one list and add up their fourth elements.
The problem i
I'd do something like this:
>>> a_list = [['apple', 50, 60, 7],
... ['orange', 70, 50, 8],
... ['apple', 50, 60, 12]]
>>>
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> from operator import itemgetter
>>> getter = itemgetter(0,1,2)
>>> for lst in a_list:
...     d[getter(lst)].extend(lst[3:])
...
>>> d
defaultdict(<type 'list'>, {('apple', 50, 60): [7, 12], ('orange', 70, 50): [8]})
>>> print [list(k)+v for k,v in d.items()]
[['apple', 50, 60, 7, 12], ['orange', 70, 50, 8]]
This doesn't give the sum, however. It could easily be fixed by doing:
print [list(k)+[sum(v)] for k,v in d.items()]
There isn't much of a reason to prefer this over the slightly more elegant solution by Martijn, other than that it allows the input list to have sublists with more than 4 items (with all elements after the third being summed as expected). In other words, this would handle the list:
a_list = [['apple', 50, 60, 7, 12],
['orange', 70, 50, 8]]
as well.
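For anyone on Python 3, here is a minimal sketch of the same extend-then-sum idea wrapped in a function. The name merge_and_sum and the key_len parameter are my own inventions for illustration, not something from the answer above.

```python
from collections import defaultdict

def merge_and_sum(rows, key_len=3):
    """Group rows by their first key_len items and sum every item after them."""
    groups = defaultdict(list)
    for row in rows:
        # Collect all trailing values, however many there are, under the key.
        groups[tuple(row[:key_len])].extend(row[key_len:])
    return [list(key) + [sum(values)] for key, values in groups.items()]

# Sublists of different lengths are fine: 7 and 12 are both summed for 'apple'.
a_list = [['apple', 50, 60, 7, 12],
          ['orange', 70, 50, 8]]
result = merge_and_sum(a_list)
print(result)
```

Since the trailing values are collected before summing, a row with several extra values and several rows with one extra value each end up with the same total.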
Form the key from [:3] so that you get the first 3 elements.
You can use the same principle, by using the first three elements as a key, and using int as the default value factory for the defaultdict (so you get 0 as the initial value):
from collections import defaultdict
a_list = [['apple', 50, 60, 7],
['orange', 70, 50, 8],
['apple', 50, 60, 12]]
d = defaultdict(int)
for sub_list in a_list:
    key = tuple(sub_list[:3])
    d[key] += sub_list[-1]
new_data = [list(k) + [v] for k, v in d.iteritems()]
If you are using Python 3, you can simplify this to:
d = defaultdict(int)
for *key, v in a_list:
    d[tuple(key)] += v
new_data = [list(k) + [v] for k, v in d.items()]
This works because you can use a starred target to take all the 'remaining' values from a list, so everything except the last value of each sublist is assigned to key (as a list) and the last value is assigned to v, making the loop just that little bit simpler. There is also no .iteritems() method on a dict in Python 3; .items() returns a dictionary view that you can iterate over directly without building a list first.
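To see what the starred target does on its own, here is a small Python 3 sketch using one row from the question. The names row and totals are just placeholders for this illustration.

```python
row = ['apple', 50, 60, 7]

# The starred name collects everything except the final item, as a list.
*key, v = row
print(key)  # ['apple', 50, 60]
print(v)    # 7

# A list is not hashable, so it has to be converted to a tuple
# before it can serve as a dictionary key.
totals = {}
totals[tuple(key)] = v
print(totals)  # {('apple', 50, 60): 7}
```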
So, we use a defaultdict that uses 0 as the default value, then for each key generated from the first 3 values (as a tuple, so you can use it as a dictionary key) we add the last value to the running total.
So for the first item ['apple', 50, 60, 7] we create a key ('apple', 50, 60), look that up in d (where it doesn't exist, but defaultdict will then use int() to create a new value of 0), and add the 7 from that first item.
Do the same for the ('orange', 70, 50) key and value 8.
For the 3rd item we get the ('apple', 50, 60) key again and add 12 to the pre-existing 7 in d[('apple', 50, 60)], for a total of 19.
Then we turn the (key, value) pairs back into lists and you are done. This results in:
>>> new_data
[['apple', 50, 60, 19], ['orange', 70, 50, 8]]
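If you want to watch the totals build up item by item, a rough Python 3 sketch like this prints the dictionary after every step. The trace list is something I added purely for illustration.

```python
from collections import defaultdict

a_list = [['apple', 50, 60, 7],
          ['orange', 70, 50, 8],
          ['apple', 50, 60, 12]]

d = defaultdict(int)
trace = []  # snapshot of the totals after each sublist is processed
for *key, v in a_list:
    d[tuple(key)] += v      # a missing key starts at int() == 0
    trace.append(dict(d))   # copy, so later updates don't change the snapshot
    print(dict(d))
# Prints:
# {('apple', 50, 60): 7}
# {('apple', 50, 60): 7, ('orange', 70, 50): 8}
# {('apple', 50, 60): 19, ('orange', 70, 50): 8}
```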
An alternative implementation that requires sorting the data uses itertools.groupby:
from itertools import groupby
from operator import itemgetter
a_list = [['apple', 50, 60, 7],
['orange', 70, 50, 8],
['apple', 50, 60, 12]]
newlist = [list(key) + [sum(i[-1] for i in sublists)]
           for key, sublists in groupby(sorted(a_list), key=itemgetter(0, 1, 2))]
This produces the same output. It is going to be slower if your data isn't already sorted, but it's good to know of different approaches.
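The reason sorting matters is that groupby only merges rows that sit next to each other. Here is a quick sketch of the difference; merge_adjacent and first_three are names I made up for this example. Sorting on just the first three elements is my own variation and is enough to bring matching rows together.

```python
from itertools import groupby
from operator import itemgetter

a_list = [['apple', 50, 60, 7],
          ['orange', 70, 50, 8],
          ['apple', 50, 60, 12]]

first_three = itemgetter(0, 1, 2)

def merge_adjacent(rows):
    """Sum the last item of each run of consecutive rows sharing a key."""
    return [list(key) + [sum(row[-1] for row in group)]
            for key, group in groupby(rows, key=first_three)]

# Unsorted: the two 'apple' rows are not adjacent, so they stay separate.
unsorted_result = merge_adjacent(a_list)
print(unsorted_result)
# [['apple', 50, 60, 7], ['orange', 70, 50, 8], ['apple', 50, 60, 12]]

# Sorted by the grouping key: matching rows become adjacent and are merged.
sorted_result = merge_adjacent(sorted(a_list, key=first_three))
print(sorted_result)
# [['apple', 50, 60, 19], ['orange', 70, 50, 8]]
```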