pandas, apply multiple functions of multiple columns to groupby object

迷失自我 2021-02-13 12:55

I want to apply multiple functions of multiple columns to a groupby object, producing a new pandas.DataFrame.

I know how to do it in separate steps:

6 answers
  •  一生所求
    2021-02-13 13:10

    The agg method of the groupby object might be what you're looking for.

    I added an example dataset and computed the derived columns in a separate DataFrame named lasts_, so the original lasts DataFrame is not modified.

    import pandas as pd
    
    lasts = pd.DataFrame({'user'        :['james','james','james','john','john'],
                          'elapsed_time':[ 200000, 400000, 300000,800000,900000],
                          'running_time':[ 100000, 100000, 200000,600000,700000],
                          'num_cores'   :[      4,      4,      4,     8,     8] })
    
    # build a separate DataFrame for the derived columns so `lasts` is not modified;
    # double brackets select 'user' as a one-column DataFrame (not a Series)
    lasts_ = lasts[['user']].copy()
    # 86400 seconds per day: converts core-seconds to core-days
    lasts_['elapsed_days'] = lasts['elapsed_time'] * lasts['num_cores'] / 86400
    lasts_['running_days'] = lasts['running_time'] * lasts['num_cores'] / 86400
    
    # aggregate
    by_user = lasts_.groupby('user').agg({'elapsed_days': 'sum', 
                                          'running_days': 'sum' })
    
    # by_user:
    # user  elapsed_days        running_days
    # james 41.66666666666667   18.51851851851852
    # john  157.4074074074074   120.37037037037037
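For comparison, here is a sketch of the same computation as a single chain: DataFrame.assign adds the derived columns and named aggregation (output_name=(column, function), available since pandas 0.25) sums them per user. It assumes the *_time columns are in seconds, as the division by 86400 suggests; SECONDS_PER_DAY is a name introduced here only for readability.

```python
import pandas as pd

# Same example data as in the answer
lasts = pd.DataFrame({'user':         ['james', 'james', 'james', 'john', 'john'],
                      'elapsed_time': [200000, 400000, 300000, 800000, 900000],
                      'running_time': [100000, 100000, 200000, 600000, 700000],
                      'num_cores':    [4, 4, 4, 8, 8]})

SECONDS_PER_DAY = 86400  # assumes elapsed_time/running_time are in seconds

by_user = (
    lasts
    # assign() returns a new DataFrame, so `lasts` itself is left unchanged
    .assign(elapsed_days=lasts['elapsed_time'] * lasts['num_cores'] / SECONDS_PER_DAY,
            running_days=lasts['running_time'] * lasts['num_cores'] / SECONDS_PER_DAY)
    .groupby('user')
    # named aggregation: output_column=(input_column, aggregation)
    .agg(elapsed_days=('elapsed_days', 'sum'),
         running_days=('running_days', 'sum'))
)
print(by_user)
```

Because assign returns a new DataFrame, the temporary lasts_ frame is not needed.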
    

    If you want to keep 'user' as a regular column instead of making it the index, use:

    by_user = lasts_.groupby('user', as_index=False).agg({'elapsed_days': 'sum', 
                                                          'running_days': 'sum'})
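When an output needs several columns at once (something a per-column agg cannot express directly), groupby(...).apply with a function that returns a pd.Series is one option: each call receives the sub-DataFrame for one group, and the Series index becomes the output columns. A minimal sketch follows; the summarize function and the utilization metric (running core-time divided by elapsed core-time) are hypothetical, added only to illustrate combining columns.

```python
import pandas as pd

lasts = pd.DataFrame({'user':         ['james', 'james', 'james', 'john', 'john'],
                      'elapsed_time': [200000, 400000, 300000, 800000, 900000],
                      'running_time': [100000, 100000, 200000, 600000, 700000],
                      'num_cores':    [4, 4, 4, 8, 8]})

SECONDS_PER_DAY = 86400  # assumes elapsed_time/running_time are in seconds


def summarize(g: pd.DataFrame) -> pd.Series:
    """Hypothetical per-group summary; g holds all rows for one user."""
    core_elapsed = (g['elapsed_time'] * g['num_cores']).sum()
    core_running = (g['running_time'] * g['num_cores']).sum()
    return pd.Series({
        'elapsed_days': core_elapsed / SECONDS_PER_DAY,
        'running_days': core_running / SECONDS_PER_DAY,
        # hypothetical metric: share of elapsed core-time spent running
        'utilization': core_running / core_elapsed,
        'jobs': len(g),
    })


# Selecting the value columns keeps the 'user' key out of the frame passed to summarize
summary = (lasts.groupby('user')[['elapsed_time', 'running_time', 'num_cores']]
           .apply(summarize))
print(summary)
```

apply calls a Python function once per group, so on large data it is usually slower than built-in aggregations; the assign + agg route is preferable when each output depends on a single (possibly derived) column.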
    
