I want to apply multiple functions of multiple columns to a groupby object, producing a new pandas.DataFrame.
I know how to do it in separate steps:
To use the agg method on a groupby object with data from other columns of the same DataFrame, you could do the following:
Define your functions (lambda functions or not) so that they take a Series as input and fetch the data from other column(s) with the df.loc[series.index, col] syntax. In this example:
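# Weight each value by its row's num_cores, sum, then divide by 86400 (seconds per day).
# ed and rd are identical; they are applied to different columns below.
ed = lambda x: (x * lasts.loc[x.index, "num_cores"]).sum() / 86400.
rd = lambda x: (x * lasts.loc[x.index, "num_cores"]).sum() / 86400.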
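where lasts is the main DataFrame, and we access the data in the column num_cores through the .loc indexer. This works because each group's Series keeps the original row labels of lasts, so lasts.loc[x.index, "num_cores"] returns the core counts of exactly the rows in that group.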
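A minimal, self-contained sketch of this lookup pattern follows. The lasts DataFrame here is hypothetical (made-up users and values, with times assumed to be in seconds) because the original data isn't shown; only the column names user, elapsed_time, running_time and num_cores come from the answer.

```python
import pandas as pd

# Hypothetical stand-in for `lasts`; times are assumed to be in seconds.
lasts = pd.DataFrame({
    "user": ["a", "a", "d", "s"],
    "elapsed_time": [86400.0, 43200.0, 172800.0, 86400.0],
    "running_time": [43200.0, 21600.0, 86400.0, 43200.0],
    "num_cores": [2, 4, 1, 3],
})

# x is one group's slice of a single column; it keeps lasts' row labels,
# so lasts.loc[x.index, "num_cores"] picks the core counts of the same rows.
ed = lambda x: (x * lasts.loc[x.index, "num_cores"]).sum() / 86400.

# Apply the function to one column per group: core-days per user.
elapsed_days = lasts.groupby("user")["elapsed_time"].agg(ed)
print(elapsed_days)
```

With these made-up numbers, user a gets (86400 * 2 + 43200 * 4) / 86400 = 4.0.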
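Create a dictionary with these functions and the names for the newly created columns. The keys are the names of the columns on which to apply each function, and each value is another dictionary whose key is the name of the new column and whose value is the function. Note that this nested-dictionary renaming syntax was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises SpecificationError ("nested renamer is not supported"); on current versions, use named aggregation instead.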
my_func = {"elapsed_time": {"elapsed_day": ed},
           "running_time": {"running_days": rd}}
Groupby and aggregate:
user_df = lasts.groupby("user").agg(my_func)
user_df
     elapsed_time running_time
      elapsed_day running_days
user
a        3.240741     2.430556
d       10.416667     3.819444
s        4.629630     1.851852
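On pandas 1.0 and later the nested dictionary above is rejected, so here is a sketch of the same aggregation using named aggregation (available since pandas 0.25), where each keyword reads new_column=(source_column, function). The lasts data is a hypothetical stand-in with made-up values, not the data behind the output above.

```python
import pandas as pd

# Hypothetical stand-in for `lasts` (made-up values; times assumed in seconds).
lasts = pd.DataFrame({
    "user": ["a", "a", "d", "s"],
    "elapsed_time": [86400.0, 43200.0, 172800.0, 86400.0],
    "running_time": [43200.0, 21600.0, 86400.0, 43200.0],
    "num_cores": [2, 4, 1, 3],
})

# Same lookup pattern: weight each value by its row's num_cores.
ed = lambda x: (x * lasts.loc[x.index, "num_cores"]).sum() / 86400.
rd = lambda x: (x * lasts.loc[x.index, "num_cores"]).sum() / 86400.

# Named aggregation: new_column=(source_column, function).
# The result has flat column names, so no droplevel step is needed.
user_df = lasts.groupby("user").agg(
    elapsed_day=("elapsed_time", ed),
    running_days=("running_time", rd),
)
print(user_df)
```

Because the output names are given directly as keywords, this form also replaces the column-renaming role of the inner dictionaries.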
If you want to remove the top level of the column index (the original column names):
user_df.columns = user_df.columns.droplevel(0)
user_df
     elapsed_day running_days
user
a       3.240741     2.430556
d      10.416667     3.819444
s       4.629630     1.851852
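A small sketch of what droplevel(0) does, on a hypothetical frame built by hand with the same two-level column layout as user_df (the values are made up):

```python
import pandas as pd

# Two-level columns: (source column, new name), as the nested-dict agg produced.
columns = pd.MultiIndex.from_tuples([
    ("elapsed_time", "elapsed_day"),
    ("running_time", "running_days"),
])
user_df = pd.DataFrame(
    [[4.0, 2.0], [2.0, 1.0]],
    index=pd.Index(["a", "d"], name="user"),
    columns=columns,
)

# droplevel(0) removes the outer level, leaving only the new column names.
user_df.columns = user_df.columns.droplevel(0)
print(user_df.columns.tolist())  # ['elapsed_day', 'running_days']
```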
HTH