pandas, apply multiple functions of multiple columns to groupby object

迷失自我 2021-02-13 12:55

I want to apply multiple functions, each using multiple columns, to a groupby object so that the result is a new pandas.DataFrame.

I know how to do it in separate steps:

6 Answers
  •  情深已故
    2021-02-13 13:23

    Here is a solution that closely resembles the original idea expressed under "I suspect there is a better way".

    I'll use the same testing data as the other answers:

    import pandas as pd

    lasts = pd.DataFrame({'user':['a','s','d','d'],
                          'elapsed_time':[40000,50000,60000,90000],
                          'running_time':[30000,20000,30000,15000],
                          'num_cores':[7,8,9,4]})
    

    groupby.apply can accept a function that returns a DataFrame; it then automatically stitches the returned DataFrames together. There are two small catches in the code below. The first is that the values passed to DataFrame are single-element lists rather than plain numbers: pd.DataFrame raises a ValueError when every value is a scalar and no index is given, whereas one-element lists produce a one-row frame.

    def aggfunc(group):
        """ This function mirrors the OP's idea. Note the values below are lists """
        return pd.DataFrame({'elapsed_days': [(group.elapsed_time * group.num_cores).sum() / 86400], 
                             'running_days': [(group.running_time * group.num_cores).sum() / 86400]})
    
    user_df = lasts.groupby('user').apply(aggfunc)
    

    Result:

            elapsed_days  running_days
    user                              
    a    0      3.240741      2.430556
    d    0     10.416667      3.819444
    s    0      4.629630      1.851852
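The first catch can be checked in isolation. A minimal sketch (the numbers are placeholders taken from the output above): with only scalar values pd.DataFrame raises a ValueError, while single-element lists give a one-row frame whose default index is 0, which is where the column of zeros in the result comes from.

```python
import pandas as pd

# Scalars only: pandas cannot infer how many rows to build, so it raises ValueError
# (its message says an index must be passed when all values are scalars).
raised = False
try:
    pd.DataFrame({'elapsed_days': 3.24, 'running_days': 2.43})
except ValueError as err:
    raised = True
    print(err)

# Single-element lists: a one-row frame with a default RangeIndex, i.e. index 0.
one_row = pd.DataFrame({'elapsed_days': [3.24], 'running_days': [2.43]})
print(one_row)

# Passing an explicit index is another way to make scalars acceptable.
with_index = pd.DataFrame({'elapsed_days': 3.24, 'running_days': 2.43}, index=[0])
print(with_index)
```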
    

    The second is that the combined DataFrame has a hierarchical index: the inner level (that column of zeros) is the default index of each one-row frame. It can be flattened as shown below:

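    # keep only the 'user' level; this lines up because each group yields exactly one row
    # and groupby sorts the keys, matching the sorted values in index.levels[0]
    user_df.index = user_df.index.levels[0]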
    

    Result:

          elapsed_days  running_days
    user                            
    a         3.240741      2.430556
    d        10.416667      3.819444
    s         4.629630      1.851852
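Assigning `user_df.index.levels[0]` works here only because every group produces exactly one row and the sorted level values line up with the sorted groups. A sketch of two options that do not depend on that alignment, under the same test data: dropping the inner level by position with `Index.droplevel`, or returning a `pd.Series` instead of a one-row DataFrame (`aggfunc_series` is a hypothetical name) so that apply never creates the extra level. Both should give the same table as above.

```python
import pandas as pd

lasts = pd.DataFrame({'user': ['a', 's', 'd', 'd'],
                      'elapsed_time': [40000, 50000, 60000, 90000],
                      'running_time': [30000, 20000, 30000, 15000],
                      'num_cores': [7, 8, 9, 4]})

def aggfunc(group):
    # Same computation as in the answer, returned as a one-row DataFrame.
    return pd.DataFrame({'elapsed_days': [(group.elapsed_time * group.num_cores).sum() / 86400],
                         'running_days': [(group.running_time * group.num_cores).sum() / 86400]})

# Option 1: drop the inner level (the zeros) by position.
user_df = lasts.groupby('user').apply(aggfunc)
user_df.index = user_df.index.droplevel(1)

def aggfunc_series(group):
    # Option 2 (hypothetical helper): a Series becomes one row per group,
    # indexed by 'user' only, so there is no extra level to remove.
    return pd.Series({'elapsed_days': (group.elapsed_time * group.num_cores).sum() / 86400,
                      'running_days': (group.running_time * group.num_cores).sum() / 86400})

user_df2 = lasts.groupby('user').apply(aggfunc_series)
print(user_df)
print(user_df2)
```

Newer pandas versions may emit a deprecation warning about grouping columns being passed to apply; neither function reads the 'user' column, so the result is unaffected.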
    
