I want to apply multiple functions of multiple columns to a groupby object, which results in a new pandas.DataFrame.
I know how to do it in separate steps:
This agg function might be what you're looking for.
I added an example dataset and applied the operation to a copy of lasts, which I named lasts_.
import pandas as pd
lasts = pd.DataFrame({'user':         ['james', 'james', 'james', 'john', 'john'],
                      'elapsed_time': [200000, 400000, 300000, 800000, 900000],
                      'running_time': [100000, 100000, 200000, 600000, 700000],
                      'num_cores':    [4, 4, 4, 8, 8]})
# create a temporary df to add columns to, without modifying the original dataframe
# (double brackets select a one-column DataFrame rather than a Series, so more columns can be added below)
lasts_ = lasts[['user']].copy()
lasts_['elapsed_days'] = lasts['elapsed_time'] * lasts['num_cores'] / 86400
lasts_['running_days'] = lasts['running_time'] * lasts['num_cores'] / 86400
# aggregate
by_user = lasts_.groupby('user').agg({'elapsed_days': 'sum',
                                      'running_days': 'sum'})
# by_user:
# user   elapsed_days        running_days
# james  41.66666666666667   18.51851851851852
# john   157.4074074074074   120.37037037037037
If you want to keep 'user' as a normal column instead of the index, use:
by_user = lasts_.groupby('user', as_index=False).agg({'elapsed_days': 'sum',
                                                      'running_days': 'sum'})
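
If you would rather skip the temporary dataframe, the same result can probably be had in one chain with assign and named aggregation. This is only a sketch: it assumes pandas 0.25 or newer (where the keyword form of agg is available), and the names SECONDS_PER_DAY, summary and max_elapsed_days are mine, not from the question. The extra 'max' column is there just to show a second function applied to the same derived column.

```python
import pandas as pd

lasts = pd.DataFrame({
    'user': ['james', 'james', 'james', 'john', 'john'],
    'elapsed_time': [200000, 400000, 300000, 800000, 900000],
    'running_time': [100000, 100000, 200000, 600000, 700000],
    'num_cores': [4, 4, 4, 8, 8],
})

SECONDS_PER_DAY = 86400  # illustrative constant, not from the original post

summary = (
    lasts
    # assign returns a new dataframe, so `lasts` itself is left unmodified
    .assign(
        elapsed_days=lambda df: df['elapsed_time'] * df['num_cores'] / SECONDS_PER_DAY,
        running_days=lambda df: df['running_time'] * df['num_cores'] / SECONDS_PER_DAY,
    )
    .groupby('user')
    # named aggregation: output_column=(input_column, function)
    .agg(
        elapsed_days=('elapsed_days', 'sum'),
        running_days=('running_days', 'sum'),
        max_elapsed_days=('elapsed_days', 'max'),  # second function on the same column
    )
)

print(summary)
```

Calling summary.reset_index() afterwards turns 'user' back into a normal column, similar to passing as_index=False to groupby.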