Conditional sums for pandas aggregate

谎友^ 2020-12-24 13:28

I just recently made the switch from R to Python and have been having some trouble getting used to data frames again as opposed to using R's data.table. The problem I've b

2 answers
  • 2020-12-24 14:00

    There might be a better way; I'm pretty new to pandas, but this works:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'A_id':'a1 a2 a3 a3 a4 a5'.split(),
                       'B': 'up down up up left right'.split(),
                       'C': [100, 102, 100, 250, 100, 102]})
    
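    # Helper column: True where the row is 'up' and C exceeds 200.
    df['D'] = (df['B']=='up') & (df['C'] > 200)
    grouped = df.groupby(['A_id'])
    
    # Each function receives one group's column as a Series;
    # summing a boolean mask counts the True values.
    def sum_up(grp):
        return np.sum(grp=='up')
    def sum_down(grp):
        return np.sum(grp=='down')
    def over_200_up(grp):
        return np.sum(grp)
    
    result = grouped.agg({'B': [sum_up, sum_down],
                          'D': [over_200_up]})
    # Flatten the (column, function) MultiIndex to just the function names.
    result.columns = [col[1] for col in result.columns]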
    print(result)
    

    yields

          sum_up  sum_down  over_200_up
    A_id                               
    a1         1         0            0
    a2         0         1            0
    a3         2         0            1
    a4         0         0            0
    a5         0         0            0
    
  • 2020-12-24 14:19

    To complement unutbu's answer, here's an approach using apply on the groupby object.

    >>> df.groupby('A_id').apply(lambda x: pd.Series(dict(
        sum_up=(x.B == 'up').sum(),
        sum_down=(x.B == 'down').sum(),
        over_200_up=((x.B == 'up') & (x.C > 200)).sum()
    )))
          over_200_up  sum_down  sum_up
    A_id                               
    a1              0         0       1
    a2              0         1       0
    a3              1         0       2
    a4              0         0       0
    a5              0         0       0
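
Both answers count rows per group by summing boolean masks. As a rough sketch of the same idea without custom functions or apply, one could build one boolean column per condition and sum those per group. The intermediate frame name `flags` is my own choice for illustration, not something from the question or the answers, and the column names are just the ones used in the answers above.

```python
import pandas as pd

df = pd.DataFrame({'A_id': 'a1 a2 a3 a3 a4 a5'.split(),
                   'B': 'up down up up left right'.split(),
                   'C': [100, 102, 100, 250, 100, 102]})

# One boolean column per condition (`flags` is a hypothetical helper frame).
flags = pd.DataFrame({
    'sum_up': df['B'].eq('up'),
    'sum_down': df['B'].eq('down'),
    'over_200_up': df['B'].eq('up') & df['C'].gt(200),
})

# Group the flags by the A_id values of the original rows;
# summing booleans counts the True values in each group.
result = flags.groupby(df['A_id']).sum()
print(result)
```

This keeps the per-row conditions in vectorized column operations and leaves only a plain sum for the groupby step, which may matter on larger frames; I have not benchmarked it against the approaches above.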
    