pandas, apply multiple functions of multiple columns to groupby object

迷失自我 2021-02-13 12:55

I want to apply multiple functions of multiple columns to a groupby object which results in a new pandas.DataFrame.

I know how to do it in separate steps:

6 answers

情深已故 (original poster)

2021-02-13 13:23

Here is a solution which closely resembles the original idea expressed under "I suspect there is a better way".

I'll use the same testing data as the other answers:

import pandas as pd

lasts = pd.DataFrame({'user':['a','s','d','d'],
                      'elapsed_time':[40000,50000,60000,90000],
                      'running_time':[30000,20000,30000,15000],
                      'num_cores':[7,8,9,4]})

groupby.apply can accept a function that returns a DataFrame and will then automatically stitch the returned DataFrames together. There are two small catches. The first is that the values passed to DataFrame must be single-element lists rather than bare numbers.

def aggfunc(group):
    """ This function mirrors the OP's idea. Note the values below are lists """
    return pd.DataFrame({'elapsed_days': [(group.elapsed_time * group.num_cores).sum() / 86400], 
                         'running_days': [(group.running_time * group.num_cores).sum() / 86400]})

user_df = lasts.groupby('user').apply(aggfunc)

Result:

        elapsed_days  running_days
user                              
a    0      3.240741      2.430556
d    0     10.416667      3.819444
s    0      4.629630      1.851852
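
To illustrate the first catch, here is a minimal sketch with made-up numbers (the names scalar_ok and one_row are only for illustration). A dict of bare scalars gives pandas no row index to build from, while one-element lists produce a one-row frame:

```python
import pandas as pd

# A dict of bare scalars raises ValueError because no index can be inferred.
try:
    pd.DataFrame({'elapsed_days': 3.24, 'running_days': 2.43})
    scalar_ok = True
except ValueError:
    scalar_ok = False

# Wrapping each value in a one-element list yields a frame with exactly one row.
one_row = pd.DataFrame({'elapsed_days': [3.24], 'running_days': [2.43]})
```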

The second is that the returned dataframe has a hierarchical index (that column of zeros), which can be flattened as shown below:

user_df.index = user_df.index.levels[0]

Result:

      elapsed_days  running_days
user                            
a         3.240741      2.430556
d        10.416667      3.819444
s         4.629630      1.851852
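
As a possible alternative (a sketch, not part of the approach above), the function can return a Series instead of a one-row DataFrame. In that case apply should produce one row per group directly, so no inner index level needs flattening. The name aggfunc_series and the explicit column selection are my own choices here:

```python
import pandas as pd

lasts = pd.DataFrame({'user': ['a', 's', 'd', 'd'],
                      'elapsed_time': [40000, 50000, 60000, 90000],
                      'running_time': [30000, 20000, 30000, 15000],
                      'num_cores': [7, 8, 9, 4]})

def aggfunc_series(group):
    # Returning a Series makes each group one row; the keys become column names.
    return pd.Series({
        'elapsed_days': (group['elapsed_time'] * group['num_cores']).sum() / 86400,
        'running_days': (group['running_time'] * group['num_cores']).sum() / 86400,
    })

# Select the needed columns explicitly so the function never sees the grouping key.
cols = ['elapsed_time', 'running_time', 'num_cores']
user_df = lasts.groupby('user')[cols].apply(aggfunc_series)
```

If the DataFrame-returning version is kept, user_df.droplevel(1) is another way to drop the inner level; it does not rely on the rows lining up with levels[0].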
