Applying several functions in transform in pandas

执念已碎 2021-01-14 10:17

After a groupby, if a dict of column: function pairs is passed to agg, each function is applied to its corresponding column. Is there a way to do the same with transform?
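The df_test frame used in the answers is not shown on this page. The sketch below assumes a frame reconstructed so that it reproduces the outputs printed in the answers (column a is the group key), and shows the agg dict syntax the question refers to, with each function reducing its own column to one value per group:

```python
import pandas as pd

# Assumed sample data: reconstructed so that the grouped results match
# the outputs printed in the answers (not the asker's original code).
df_test = pd.DataFrame({
    "a": [1, 1, 2, 1, 2],
    "b": [2, 20, 30, 2, 4],
    "c": [3, 30, 50, 33, 50],
})

# agg accepts a dict of column -> function; each function is applied
# only to its own column and reduces it to one value per group.
summary = df_test.groupby("a").agg({"b": "sum", "c": "max"})
print(summary)
```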

3 answers
  • 2021-01-14 10:49

    I think that, as of pandas 0.20.2, transform does not accept a dict mapping column names to functions the way agg does.
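A minimal workaround sketch: call transform once per column and collect the results. transform_dict is a hypothetical helper (not a pandas function), and df_test is assumed, reconstructed from the outputs printed on this page:

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs printed on this page.
df_test = pd.DataFrame({
    "a": [1, 1, 2, 1, 2],
    "b": [2, 20, 30, 2, 4],
    "c": [3, 30, 50, 33, 50],
})


def transform_dict(grouped, funcs):
    """Hypothetical helper that emulates agg's dict syntax for transform.

    Each value in funcs can be anything SeriesGroupBy.transform accepts
    (a method name such as "cumsum", or a callable). The result keeps the
    original row index, as transform does.
    """
    return pd.DataFrame(
        {col: grouped[col].transform(func) for col, func in funcs.items()}
    )


out = transform_dict(df_test.groupby("a"), {"b": "cumsum", "c": "cumprod"})
print(out)
```

Selecting each column from the groupby object keeps the per-column function choice explicit, at the cost of one transform call per column.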

    If the functions return a Series of the same length as each group, agg can be used directly:

    import numpy as np

    df1 = df_test.set_index('a').groupby('a').agg({'b':np.cumsum,'c':np.cumprod}).reset_index()
    print (df1)
       a     c   b
    0  1     3   2
    1  1    90  22
    2  2    50  30
    3  1  2970  24
    4  2  2500  34
    

    But if the functions aggregate (return one value per group, so the length differs), you need a join:

    df2 = df_test[['a']].join(df_test.groupby('a').agg({'b':my_fct1,'c':my_fct2}), on='a')
    print (df2)
       a          c   b
    0  1  16.522712   8
    1  1  16.522712   8
    2  2   0.000000  17
    3  1  16.522712   8
    4  2   0.000000  17
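my_fct1 and my_fct2 are not defined on this page. Assuming they are the group mean and the sample standard deviation (hypothetical stand-ins that reproduce the numbers printed above), this sketch runs the join and also shows that transform with a reducing function broadcasts each group's value back to its rows, giving the same columns without a join:

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs printed on this page.
df_test = pd.DataFrame({
    "a": [1, 1, 2, 1, 2],
    "b": [2, 20, 30, 2, 4],
    "c": [3, 30, 50, 33, 50],
})


# Hypothetical stand-ins for the answer's my_fct1 / my_fct2.
def my_fct1(s):
    return s.mean()


def my_fct2(s):
    return s.std()  # sample standard deviation (ddof=1)


# Join approach: one aggregated row per group, joined back on key "a".
per_group = df_test.groupby("a").agg({"b": my_fct1, "c": my_fct2})
df2 = df_test[["a"]].join(per_group, on="a")

# transform approach: each scalar result is broadcast to every row of its group.
grouper = df_test.groupby("a")
df3 = df_test.assign(b=grouper["b"].transform(my_fct1),
                     c=grouper["c"].transform(my_fct2))
print(df2)
print(df3)
```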
    
  • 2021-01-14 11:13

    In more recent versions of pandas, you can use the assign method together with transform to either append new columns or replace existing columns with new values:

    grouper = df_test.groupby("a")
    
    df_test.assign(b=grouper["b"].transform("cumsum"), 
                   c=grouper["c"].transform("cumprod"))
    
        a   b   c
    0   1   2   3
    1   1   22  90
    2   2   30  50
    3   1   24  2970
    4   2   34  2500
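Because assign takes keyword arguments, a dict built with a comprehension can be unpacked into it with **, which brings back the column-to-function mapping from the question. A sketch, assuming df_test reconstructed from the outputs shown and a hypothetical "<column>_<function>" naming scheme for the appended columns:

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs printed on this page.
df_test = pd.DataFrame({
    "a": [1, 1, 2, 1, 2],
    "b": [2, 20, 30, 2, 4],
    "c": [3, 30, 50, 33, 50],
})

# Column -> transform function, in the same shape as an agg dict.
funcs = {"b": "cumsum", "c": "cumprod"}
grouper = df_test.groupby("a")

# Append columns named "<column>_<function>" (hypothetical naming scheme);
# using col itself as the key would overwrite the original columns instead.
result = df_test.assign(
    **{f"{col}_{func}": grouper[col].transform(func) for col, func in funcs.items()}
)
print(result)
```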
    
  • 2021-01-14 11:15

    You can still use a dict, but it takes a bit of a hack:

    df_test.groupby('a').transform(lambda x: {'b': x.cumsum(), 'c': x.cumprod()}[x.name])
    Out[427]: 
        b     c
    0   2     3
    1  22    90
    2  30    50
    3  24  2970
    4  34  2500
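As far as I can tell from the pandas source, the hack works because transform with a callable first applies it column by column (each column arrives as a Series whose name is the column label, so x.name picks the dict entry), then tries it on the whole group; when that raises (here a KeyError from the lookup), pandas keeps the column-by-column path. This is an implementation detail rather than documented API. Note also that the dict literal computes both cumsum and cumprod for every column before the lookup. A sketch of a variant that stores functions instead of results, so only the selected one runs (df_test assumed, reconstructed from the outputs shown):

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs printed on this page.
df_test = pd.DataFrame({
    "a": [1, 1, 2, 1, 2],
    "b": [2, 20, 30, 2, 4],
    "c": [3, 30, 50, 33, 50],
})

# Column name -> function; only the function for x.name is called.
funcs = {"b": lambda s: s.cumsum(), "c": lambda s: s.cumprod()}

# Relies on the same behaviour as the dict hack: per-column calls get a Series
# named after the column, and the whole-group call fails and is discarded.
out = df_test.groupby("a").transform(lambda x: funcs[x.name](x))
print(out)
```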
    

    If you need to keep column a, you can do:

    df_test.set_index('a')\
           .groupby('a')\
           .transform(lambda x: {'b': x.cumsum(), 'c': x.cumprod()}[x.name])\
           .reset_index()
    Out[429]: 
       a   b     c
    0  1   2     3
    1  1  22    90
    2  2  30    50
    3  1  24  2970
    4  2  34  2500
    

    Another way is to use an if/else expression that checks the column name:

    df_test.set_index('a')\
           .groupby('a')\
           .transform(lambda x: x.cumsum() if x.name=='b' else x.cumprod())\
           .reset_index()
    