pandas, apply multiple functions of multiple columns to groupby object

迷失自我 2021-02-13 12:55

I want to apply multiple functions of multiple columns to a groupby object which results in a new pandas.DataFrame.

I know how to do it in separate steps:

6 answers

一生所求 (OP)

2021-02-13 13:10

The agg method of a groupby object might be what you're looking for.

I added an example dataset and applied the operation to a copy of lasts which I named lasts_.

import pandas as pd

lasts = pd.DataFrame({'user'        :['james','james','james','john','john'],
                      'elapsed_time':[ 200000, 400000, 300000,800000,900000],
                      'running_time':[ 100000, 100000, 200000,600000,700000],
                      'num_cores'   :[      4,      4,      4,     8,     8] })

# build a separate dataframe for the derived columns, without modifying the original
lasts_ = lasts[['user']].copy()  # double brackets return a one-column DataFrame (not a Series), so more columns can be added below
lasts_['elapsed_days'] = lasts['elapsed_time'] * lasts['num_cores'] / 86400  # 86400 = seconds per day
lasts_['running_days'] = lasts['running_time'] * lasts['num_cores'] / 86400

# aggregate
by_user = lasts_.groupby('user').agg({'elapsed_days': 'sum', 
                                      'running_days': 'sum' })

# by_user:
# user  elapsed_days        running_days
# james 41.66666666666667   18.51851851851852
# john  157.4074074074074   120.37037037037037

If you want to keep 'user' as a regular column instead of the index, pass as_index=False:

by_user = lasts_.groupby('user', as_index=False).agg({'elapsed_days': 'sum', 
                                                      'running_days': 'sum'})
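The same result can probably be reached in one chained expression. The following is only a sketch, assuming pandas 0.25 or later (for named aggregation); the name by_user_alt and the extra mean_cores column are made up for illustration and are not part of the answer above. assign computes the derived columns on a copy, so lasts itself is left untouched, and each keyword passed to agg names an output column as an (input column, function) pair.

```python
import pandas as pd

lasts = pd.DataFrame({'user'        : ['james', 'james', 'james', 'john', 'john'],
                      'elapsed_time': [200000, 400000, 300000, 800000, 900000],
                      'running_time': [100000, 100000, 200000, 600000, 700000],
                      'num_cores'   : [4, 4, 4, 8, 8]})

by_user_alt = (
    lasts
    # assign returns a new DataFrame; the original 'lasts' is not modified
    .assign(
        elapsed_days=lambda d: d['elapsed_time'] * d['num_cores'] / 86400,
        running_days=lambda d: d['running_time'] * d['num_cores'] / 86400,
    )
    .groupby('user', as_index=False)
    # named aggregation: output_column=(input_column, function)
    .agg(
        elapsed_days=('elapsed_days', 'sum'),
        running_days=('running_days', 'sum'),
        mean_cores=('num_cores', 'mean'),  # hypothetical extra column, to show a different function on a different column
    )
)

print(by_user_alt)
```

The keyword form also lets the same input column appear more than once under different output names, which a plain dict passed to agg cannot express without producing multi-level column labels.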
