How to group tuples by common items and find the average of each group

我在风中等你 2021-01-15 04:44

I have a list of tuples named data:

data = [('A', 2),
        ('B', 2), ('B', 4), ('B', 6), ('B', 8), ('B', 6), ('B', 4), ('B',


        
3 Answers
  • 2021-01-15 05:25

    Here is an option with a defaultdict:

    from collections import defaultdict
    avg = defaultdict(lambda: {'count': 0, 'sum': 0})

    # calculate the sum and count for each key
    for k, v in data:
        avg[k]['count'] += 1
        avg[k]['sum'] += v
    
    # calculate the average
    [(k, v['sum'] / v['count']) for k, v in avg.items()]
    
    # Output (on Python 3.7+, where dicts keep insertion order):
    # [('A', 2.0),
    #  ('B', 4.714285714285714),
    #  ('C', 10.0),
    #  ('D', 12.0),
    #  ('E', 12.0),
    #  ('F', 8.0)]
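
The same sum-and-count idea can be wrapped in a small reusable function. This is only a sketch: the name `group_averages` and the short `sample` list are made up for illustration and do not come from the question. It keeps a running `[sum, count]` pair per key, so the input does not need to be sorted.

```python
from collections import defaultdict


def group_averages(pairs):
    """Return {key: average of values} for an iterable of (key, value) pairs."""
    totals = defaultdict(lambda: [0, 0])  # key -> [running sum, count]
    for key, value in pairs:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical sample with interleaved keys (not the asker's data).
sample = [('B', 2), ('A', 2), ('B', 4), ('C', 10), ('B', 6)]
averages = group_averages(sample)
print(averages)  # {'B': 4.0, 'A': 2.0, 'C': 10.0}
```

Returning a dict makes lookups by key easy; `list(averages.items())` turns it back into a list of tuples if that shape is needed.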
    
  • 2021-01-15 05:30

    An alternative solution which you might consider, especially when dealing with large data sets, is to use pandas. Here, groupby and mean will do the job:

    import pandas as pd
    
    data = [('A', 2), 
            ('B', 2), ('B', 4), ('B', 6), ('B', 8), ('B', 6), ('B', 4), ('B', 3),
            ('C', 10), ('C', 10), ('C', 10),
            ('D', 12),
            ('E', 12),
            ('F', 10), ('F', 8), ('F', 6)]
    
    df = pd.DataFrame(data, columns=['letter', 'number'])
    print(df)
    #    letter  number
    # 0       A       2
    # 1       B       2
    # 2       B       4
    # 3       B       6
    # 4       B       8
    # 5       B       6
    # 6       B       4
    # 7       B       3
    # 8       C      10
    # 9       C      10
    # 10      C      10
    # 11      D      12
    # 12      E      12
    # 13      F      10
    # 14      F       8
    # 15      F       6
    
    print(df.groupby('letter').mean())
    #            number
    # letter           
    # A        2.000000
    # B        4.714286
    # C       10.000000
    # D       12.000000
    # E       12.000000
    # F        8.000000
    
    print(df.groupby('letter').mean().round().astype(int))
    #         number
    # letter        
    # A            2
    # B            5
    # C           10
    # D           12
    # E           12
    # F            8
    

    You can get back your list of tuples as follows:

    averages = df.groupby('letter').mean().round().astype(int)
    # to_records() keeps the 'letter' index as the first field; tolist() yields plain tuples
    result = averages.to_records().tolist()
    print(result)
    # [('A', 2), ('B', 5), ('C', 10), ('D', 12), ('E', 12), ('F', 8)]
    
  • 2021-01-15 05:41

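    Try itertools.groupby. It only groups consecutive items, so data must already be ordered by key; otherwise sort it first:

    from itertools import groupby
    # collect the values of each run of equal keys
    data_ = [(n, [i[1] for i in g]) for n, g in groupby(data, key=lambda x: x[0])]
    # average the values of each group
    result = [(i, sum(j) / len(j)) for i, j in data_]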
    

    Result

    [('A', 2.0),
     ('B', 4.714285714285714),
     ('C', 10.0),
     ('D', 12.0),
     ('E', 12.0),
     ('F', 8.0)]
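
Because `itertools.groupby` starts a new group every time the key changes, unsorted input produces several groups for the same key. The sketch below shows the effect and the usual fix of sorting by key first. The `sample` list is a made-up example, not the asker's data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted sample; keys are deliberately interleaved.
sample = [('B', 2), ('A', 2), ('B', 4), ('C', 10), ('B', 6)]
first = itemgetter(0)

# Without sorting, every run of equal keys becomes its own group.
unsorted_keys = [key for key, _ in groupby(sample, key=first)]
print(unsorted_keys)  # ['B', 'A', 'B', 'C', 'B']

# Sorting by key first makes equal keys adjacent, so each key is grouped once.
result = []
for key, group in groupby(sorted(sample, key=first), key=first):
    values = [value for _, value in group]
    result.append((key, sum(values) / len(values)))
print(result)  # [('A', 2.0), ('B', 4.0), ('C', 10.0)]
```

Sorting costs O(n log n), so for large unsorted inputs a dictionary-based pass may be the cheaper option.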
    