pandas, apply multiple functions of multiple columns to groupby object

迷失自我 2021-02-13 12:55

I want to apply multiple functions of multiple columns to a groupby object which results in a new pandas.DataFrame.

I know how to do it in separate steps:

6 Answers
  •  野性不改
    2021-02-13 13:13

    To call agg on a groupby object with functions that use data from other columns of the same DataFrame, you could do the following:

    1. Define functions (lambdas or regular functions) that take a Series as input and fetch the data from the other column(s) with the df.loc[series.index, col] syntax. For example:

      # Core-weighted total of a time column in seconds, converted to days (86400 s per day).
      ed = lambda x: (x * lasts.loc[x.index, "num_cores"]).sum() / 86400.
      rd = lambda x: (x * lasts.loc[x.index, "num_cores"]).sum() / 86400.
      

      Here lasts is the main DataFrame, and .loc[x.index, "num_cores"] picks out the num_cores values for exactly the rows that belong to the current group.

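    2. Create a dictionary with these functions and the names for the newly created columns. The outer keys are the names of the columns on which to apply each function, and each value is another dictionary whose key is the name of the new column and whose value is the function. Note that this nested-dictionary renaming was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises a SpecificationError.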

      my_func = {"elapsed_time" : {"elapsed_day" : ed},
                 "running_time" : {"running_days" : rd}}
      
    3. Groupby and aggregate:

      user_df = lasts.groupby("user").agg(my_func)
      user_df
           elapsed_time running_time
            elapsed_day running_days
      user                          
      a        3.240741     2.430556
      d       10.416667     3.819444
      s        4.629630     1.851852
      
    4. If you want to remove the old column names:

      user_df.columns = user_df.columns.droplevel(0)
      user_df
            elapsed_day  running_days
      user                           
      a        3.240741      2.430556
      d       10.416667      3.819444
      s        4.629630      1.851852
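
    For newer pandas versions, the same result can be sketched with named aggregation (available since pandas 0.25), which produces flat column names directly, so the droplevel step is not needed. The sample data below is hypothetical, since the contents of lasts are not shown in the post, and the names core_days, summarize and SECONDS_PER_DAY are illustrative only. A groupby.apply variant is included as an alternative that reads the extra column from the group itself rather than from the outer DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the real contents of `lasts` are not shown in the post.
lasts = pd.DataFrame({
    "user": ["a", "a", "d", "s"],
    "elapsed_time": [86400, 43200, 172800, 86400],   # seconds
    "running_time": [43200, 43200, 86400, 21600],    # seconds
    "num_cores": [2, 4, 1, 8],
})

SECONDS_PER_DAY = 86400.0


def core_days(seconds):
    """Core-weighted sum of a seconds column for one group, in days."""
    # seconds.index holds the original row labels of the current group,
    # so it can be used to look up the matching num_cores values.
    cores = lasts.loc[seconds.index, "num_cores"]
    return (seconds * cores).sum() / SECONDS_PER_DAY


# Named aggregation: new_column=(input_column, function).
user_df = lasts.groupby("user").agg(
    elapsed_day=("elapsed_time", core_days),
    running_days=("running_time", core_days),
)
print(user_df)


def summarize(group):
    """Alternative: compute every output column from the group's own columns."""
    cores = group["num_cores"]
    return pd.Series({
        "elapsed_day": (group["elapsed_time"] * cores).sum() / SECONDS_PER_DAY,
        "running_days": (group["running_time"] * cores).sum() / SECONDS_PER_DAY,
    })


# Selecting the value columns explicitly keeps the grouping column out of `group`.
alt = lasts.groupby("user")[["elapsed_time", "running_time", "num_cores"]].apply(summarize)
print(alt)
```

    With this sample data, user a ends up with elapsed_day 4.0 and running_days 3.0 in both user_df and alt. The apply variant avoids the dependence on the outer lasts variable, at the cost of being slower on large frames.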
      

    HTH
