I want to apply multiple functions of multiple columns to a groupby object, producing a new pandas.DataFrame.
I know how to do it in separate steps:
Here is a solution which closely resembles the original idea expressed under "I suspect there is a better way".
I'll use the same testing data as the other answers:
import pandas as pd

lasts = pd.DataFrame({'user': ['a', 's', 'd', 'd'],
                      'elapsed_time': [40000, 50000, 60000, 90000],
                      'running_time': [30000, 20000, 30000, 15000],
                      'num_cores': [7, 8, 9, 4]})
groupby.apply can accept a function that returns a DataFrame and will then automatically stitch the returned DataFrames together. There are two small catches in the code below. The first is that the values passed to DataFrame are single-element lists rather than plain numbers: when every value in the dict is a scalar, the DataFrame constructor cannot infer how many rows to build and raises a ValueError unless an index is passed.
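To see the first catch in isolation, here is a minimal sketch using a hypothetical dict of per-group results (the numbers are illustrative only, not taken from the test data). It shows that an all-scalar dict is rejected while the list-wrapped version builds a one-row frame:

```python
import pandas as pd

# Hypothetical per-group results, used only to show the constructor behaviour
scalars = {'elapsed_days': 3.24, 'running_days': 2.43}

# An all-scalar dict gives pandas no row count, so it raises ValueError
try:
    pd.DataFrame(scalars)
    raised = False
except ValueError:
    raised = True

# Wrapping each value in a one-element list yields a single-row frame
one_row = pd.DataFrame({key: [value] for key, value in scalars.items()})

print(raised)         # True
print(one_row.shape)  # (1, 2)
```

Passing an explicit index, e.g. pd.DataFrame(scalars, index=[0]), would also avoid the error; the list-wrapping used in aggfunc achieves the same thing.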
def aggfunc(group):
    """This function mirrors the OP's idea. Note that the values below are lists."""
    return pd.DataFrame({'elapsed_days': [(group.elapsed_time * group.num_cores).sum() / 86400],
                         'running_days': [(group.running_time * group.num_cores).sum() / 86400]})
user_df = lasts.groupby('user').apply(aggfunc)
Result:
        elapsed_days  running_days
user
a    0      3.240741      2.430556
d    0     10.416667      3.819444
s    0      4.629630      1.851852
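As a sanity check on this table, user d has two rows, so each of its values is the core-weighted sum over both rows divided by the 86400 seconds in a day:

```latex
\begin{aligned}
\text{elapsed\_days}_d &= \frac{60000 \cdot 9 + 90000 \cdot 4}{86400} = \frac{900000}{86400} \approx 10.416667 \\
\text{running\_days}_d &= \frac{30000 \cdot 9 + 15000 \cdot 4}{86400} = \frac{330000}{86400} \approx 3.819444
\end{aligned}
```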
The second catch is that the returned DataFrame has a hierarchical index (the extra level of zeros), which can be flattened as shown below:
user_df = user_df.droplevel(1)  # drop the inner index level of zeros, keeping 'user'
Result:
      elapsed_days  running_days
user
a         3.240741      2.430556
d        10.416667      3.819444
s         4.629630      1.851852
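For comparison, here is a sketch of two other ways to reach the same flat table directly. Neither is part of the answer above; both are swapped-in techniques, and the names aggfunc_series, via_series, via_sum and SECONDS_PER_DAY are hypothetical. The first returns a pd.Series per group, which apply turns into one row each, so no level of zeros appears. The second precomputes the per-row core-days and lets groupby.sum aggregate them without calling a Python function per group:

```python
import numpy as np
import pandas as pd

lasts = pd.DataFrame({'user': ['a', 's', 'd', 'd'],
                      'elapsed_time': [40000, 50000, 60000, 90000],
                      'running_time': [30000, 20000, 30000, 15000],
                      'num_cores': [7, 8, 9, 4]})

SECONDS_PER_DAY = 86400
cols = ['elapsed_time', 'running_time', 'num_cores']

# Variant 1 (hypothetical helper): return a Series instead of a one-row DataFrame.
# apply turns each returned Series into one row, so the index is just 'user'.
def aggfunc_series(group):
    return pd.Series({
        'elapsed_days': (group.elapsed_time * group.num_cores).sum() / SECONDS_PER_DAY,
        'running_days': (group.running_time * group.num_cores).sum() / SECONDS_PER_DAY,
    })

# Selecting only the value columns keeps the 'user' grouping column out of apply
via_series = lasts.groupby('user')[cols].apply(aggfunc_series)

# Variant 2: compute per-row core-days first, then let groupby.sum aggregate them
via_sum = (
    lasts.assign(
        elapsed_days=lasts.elapsed_time * lasts.num_cores / SECONDS_PER_DAY,
        running_days=lasts.running_time * lasts.num_cores / SECONDS_PER_DAY,
    )
    .groupby('user')[['elapsed_days', 'running_days']]
    .sum()
)

# Both have a flat 'user' index; values agree up to floating-point rounding
print(np.allclose(via_series, via_sum))
```

The second variant divides before summing rather than after, so its values may differ from the first in the last few floating-point digits, which is why the comparison uses np.allclose rather than exact equality.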