itertools.groupby() not grouping correctly

北恋 2020-11-28 13:53

I have this data:

self.data = [(1, 1, 5.0),
             (1, 2, 3.0),
             (1, 3, 4.0),
             (2, 1, 4.0),
             (2, 2, 2.0)]
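The rest of the question's code is cut off here. A minimal sketch of the likely symptom follows. It assumes the grouping key is the second field of each tuple, since the answers below use `operator.itemgetter(1)`, and runs `groupby` directly on the unsorted data:

```python
import itertools
from operator import itemgetter

data = [(1, 1, 5.0),
        (1, 2, 3.0),
        (1, 3, 4.0),
        (2, 1, 4.0),
        (2, 2, 2.0)]

# groupby starts a new group every time the key differs from the previous item's key,
# so in unsorted data keys 1 and 2 each produce two separate groups.
keys = [k for k, _ in itertools.groupby(data, key=itemgetter(1))]
print(keys)  # [1, 2, 3, 1, 2]
```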


3 Answers
  • 2020-11-28 14:16

    A variant that groups through a dictionary instead of sorting. It makes a single pass over the data (O(n) rather than the O(n log n) of a sort), so it should perform better.

    from collections import defaultdict

    def full_group_by(l, key=lambda x: x):
        # Bucket every item under its key, wherever it appears in l
        d = defaultdict(list)
        for item in l:
            d[key(item)].append(item)
        return d.items()
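A usage sketch on the question's data follows. Grouping by the second field is an assumption about the intended key. Unlike `groupby`, no sort is needed, and keys come back in the order they are first seen:

```python
from collections import defaultdict
from operator import itemgetter

def full_group_by(l, key=lambda x: x):
    # Bucket every item under its key in a single pass
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()

data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]

groups = dict(full_group_by(data, key=itemgetter(1)))
for mid, items in groups.items():
    print(mid, items)
# 1 [(1, 1, 5.0), (2, 1, 4.0)]
# 2 [(1, 2, 3.0), (2, 2, 2.0)]
# 3 [(1, 3, 4.0)]
```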
    
  • 2020-11-28 14:20

    itertools.groupby only collects together contiguous items with the same key; it starts a new group whenever the key changes. If you want all items with the same key in one group, sort self.data by that same key first.

    import itertools, operator
    by_mid = operator.itemgetter(1)  # sort and group on the same key (second field)
    for mid, group in itertools.groupby(sorted(self.data, key=by_mid), key=by_mid):
        print(mid, list(group))
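Here is a self-contained version of the same fix. It uses a plain `data` list in place of `self.data` and collects the groups so the result is visible. Each key now appears exactly once, and because `sorted` is stable, items keep their original relative order within each group:

```python
import itertools
from operator import itemgetter

data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]
by_mid = itemgetter(1)

# Sorting with the same key function makes equal keys adjacent,
# so groupby yields exactly one group per key.
result = [(mid, list(group))
          for mid, group in itertools.groupby(sorted(data, key=by_mid), key=by_mid)]
print(result)
```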
    
  • 2020-11-28 14:21

    The function below "fixes" several annoyances with Python's itertools.groupby.

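    import itertools

    def groupby2(l, key=lambda x: x, val=lambda x: x, agg=list, sort=True):
        # groupby only merges adjacent items, so sort by the same key first
        if sort:
            l = sorted(l, key=key)
        # agg=list materializes each group before groupby moves on to the next one
        return ((k, agg(val(x) for x in v))
                for k, v in itertools.groupby(l, key=key))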
    

    Specifically,

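    1. You don't have to sort your data first; it is sorted by key for you (pass sort=False if it already is).
    2. The key, val and agg functions can all be passed positionally (unlike sorted(), which accepts key only by name).
    3. The output is a clean generator of (key, grouped_values) tuples, where the values are extracted by val, the third parameter.
    4. You can easily apply an aggregation function such as sum or statistics.mean to each group through agg.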

    Example Usage

    import itertools
    from operator import itemgetter
    from statistics import mean  # optional: pass mean as agg to average each group
    
    t = [('a',1), ('b',2), ('a',3)]
    for k, v in groupby2(t, itemgetter(0), itemgetter(1), sum):
        print(k, v)
    

    This prints:

    a 4
    b 2
    

