Apply multiple functions to multiple groupby columns

春和景丽 2020-11-22 03:16

The docs show how to apply multiple functions on a groupby object at a time using a dict with the output column names as the keys:

In [563]: grouped['D'].a

7 answers
  •  后悔当初
    2020-11-22 04:18

    As an alternative (mostly for aesthetic reasons) to Ted Petrou's answer, I found I preferred a slightly more compact listing. Please don't consider accepting this; it's just a more detailed comment on Ted's answer, plus code and data. Python/pandas is not my first or best language, but I found this to read well:

    # One pd.Series per group; its keys become the output column names
    df.groupby('group') \
      .apply(lambda x: pd.Series({
          'a_sum'       : x['a'].sum(),
          'a_max'       : x['a'].max(),
          'b_mean'      : x['b'].mean(),
          'c_d_prodsum' : (x['c'] * x['d']).sum()
      }))
    
              a_sum     a_max    b_mean  c_d_prodsum
    group                                           
    0      0.530559  0.374540  0.553354     0.488525
    1      1.433558  0.832443  0.460206     0.053313
    

    I find it more reminiscent of dplyr pipes and data.table chained commands. That's not to say they're better, just more familiar to me. (I certainly recognize the power of, and many people's preference for, more formal def functions for these types of operations. This is just an alternative, not necessarily a better one.)
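
    For comparison, here is a minimal sketch of the "more formal def function" style mentioned above, computing the same four summaries. The helper name `summarize` is hypothetical (it does not come from the other answers), and the explicit column selection before `.apply` is my own choice, made so that the grouping column is not passed into the function.

```python
import numpy as np
import pandas as pd

# Same toy data as in this answer, seeded for reproducibility.
np.random.seed(42)
df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))
df['group'] = [0, 0, 1, 1]


def summarize(x):
    """Hypothetical helper: return one row of summaries for a single group."""
    return pd.Series({
        'a_sum': x['a'].sum(),
        'a_max': x['a'].max(),
        'b_mean': x['b'].mean(),
        'c_d_prodsum': (x['c'] * x['d']).sum(),
    })


# Select the value columns explicitly so `x` holds only a, b, c and d.
result = df.groupby('group')[['a', 'b', 'c', 'd']].apply(summarize)
print(result)
```

    The lambda version and this one should produce the same table; the named function is just easier to reuse and test on its own.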


    I generated the data in the same manner as Ted, adding a seed for reproducibility.

    import numpy as np
    import pandas as pd
    np.random.seed(42)
    df = pd.DataFrame(np.random.rand(4,4), columns=list('abcd'))
    df['group'] = [0, 0, 1, 1]
    df
    
              a         b         c         d  group
    0  0.374540  0.950714  0.731994  0.598658      0
    1  0.156019  0.155995  0.058084  0.866176      0
    2  0.601115  0.708073  0.020584  0.969910      1
    3  0.832443  0.212339  0.181825  0.183405      1
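
    If you would rather avoid `apply` altogether, one possible variation (a sketch, assuming pandas 0.25 or later, where named aggregation is available) is to precompute the product column and then aggregate by name. The intermediate column name `c_d` is my own choice and is not part of the original data.

```python
import numpy as np
import pandas as pd

# Rebuild the same seeded data so the snippet runs on its own.
np.random.seed(42)
df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))
df['group'] = [0, 0, 1, 1]

out = (
    df.assign(c_d=df['c'] * df['d'])      # hypothetical helper column: c * d per row
      .groupby('group')
      .agg(
          a_sum=('a', 'sum'),             # output_name=(input_column, function)
          a_max=('a', 'max'),
          b_mean=('b', 'mean'),
          c_d_prodsum=('c_d', 'sum'),
      )
)
print(out)
```

    This only covers summaries that can be written as one function of one column, which is why the two-column product has to be computed before grouping.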
    
