I have this data:
self.data = [(1, 1, 5.0),
             (1, 2, 3.0),
             (1, 3, 4.0),
             (2, 1, 4.0),
             (2, 2, 2.0)]
A variant without sorting (via a dictionary). It should be better performance-wise.
from collections import defaultdict

def full_group_by(l, key=lambda x: x):
    """Like itertools.groupby, but groups all items with the same key, without needing sorted input."""
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()
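For example, grouping the rows above by their second element (a quick sketch; data here just mirrors self.data):

from operator import itemgetter

data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]
for mid, rows in full_group_by(data, key=itemgetter(1)):
    print(mid, rows)
# 1 [(1, 1, 5.0), (2, 1, 4.0)]
# 2 [(1, 2, 3.0), (2, 2, 2.0)]
# 3 [(1, 3, 4.0)]

No pre-sorting is needed, and the keys come out in first-seen order (Python 3.7+ dict ordering).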
itertools.groupby only collects contiguous items with the same key into a group. If you want all items with the same key grouped together, you have to sort self.data first.
for mid, group in itertools.groupby(
        sorted(self.data, key=operator.itemgetter(1)),
        key=operator.itemgetter(1)):
    print(mid, list(group))
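To see why the sort matters, here is a quick sketch with the same sample rows (the data variable simply copies self.data):

import itertools
from operator import itemgetter

data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]

# Unsorted: key 1 shows up twice because its rows are not contiguous.
print([(k, list(g)) for k, g in itertools.groupby(data, key=itemgetter(1))])
# [(1, [(1, 1, 5.0)]), (2, [(1, 2, 3.0)]), (3, [(1, 3, 4.0)]), (1, [(2, 1, 4.0)]), (2, [(2, 2, 2.0)])]

# Sorted by the same key: each key yields exactly one group.
print([(k, list(g)) for k, g in itertools.groupby(sorted(data, key=itemgetter(1)), key=itemgetter(1))])
# [(1, [(1, 1, 5.0), (2, 1, 4.0)]), (2, [(1, 2, 3.0), (2, 2, 2.0)]), (3, [(1, 3, 4.0)])]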
Below "fixes" several annoyances with Python's itertools.groupby
.
import itertools

def groupby2(l, key=lambda x: x, val=lambda x: x, agg=lambda x: x, sort=True):
    if sort:
        l = sorted(l, key=key)
    return ((k, agg(val(x) for x in v))
            for k, v in itertools.groupby(l, key=key))
Specifically,

- key as named parameter only.
- Output is tuple(key, grouped_values), where the values are specified by the 3rd parameter.

Example Usage:
import itertools
from operator import itemgetter
from statistics import *   # makes e.g. mean available as an aggregate

t = [('a', 1), ('b', 2), ('a', 3)]

for k, v in groupby2(t, itemgetter(0), itemgetter(1), sum):
    print(k, v)
This prints:
a 4
b 2
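Because val and agg are just callables, other aggregates drop in directly; for instance statistics.mean (a small sketch reusing groupby2 and t from above):

from statistics import mean

for k, v in groupby2(t, itemgetter(0), itemgetter(1), mean):
    print(k, v)
# a 2   (mean of 1 and 3)
# b 2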