pandas, apply multiple functions of multiple columns to groupby object

前端 未结 6 1347
迷失自我
迷失自我 2021-02-13 12:55

I want to apply multiple functions of multiple columns to a groupby object which results in a new pandas.DataFrame.

I know how to do it in seperate steps:

6条回答
  •  既然无缘
    2021-02-13 13:15

    I think you can avoid agg or apply and rather first multiple by mul, then div and last use groupby by index with aggregating sum:

    lasts = pd.DataFrame({'user':['a','s','d','d'],
                       'elapsed_time':[40000,50000,60000,90000],
                       'running_time':[30000,20000,30000,15000],
                       'num_cores':[7,8,9,4]})
    
    print (lasts)
       elapsed_time  num_cores  running_time user
    0         40000          7         30000    a
    1         50000          8         20000    s
    2         60000          9         30000    d
    3         90000          4         15000    d
    
    by_user = lasts.groupby('user')
    elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400)
    print (elapsed_days)
    running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400)
    user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days'))
    print (user_df)
          elapsed_days  running_days
    user                            
    a         3.240741      2.430556
    d        10.416667      3.819444
    s         4.629630      1.851852
    
    lasts = lasts.set_index('user')
    print (lasts[['elapsed_time','running_time']].mul(lasts['num_cores'], axis=0)
                                                 .div(86400)
                                                 .groupby(level=0)
                                                 .sum())
          elapsed_time  running_time
    user                            
    a         3.240741      2.430556
    d        10.416667      3.819444
    s         4.629630      1.851852   
    

提交回复
热议问题