Use Pandas groupby() + apply() with arguments

一向 2020-12-24 05:48

I would like to use df.groupby() in combination with apply() to apply a function to each row per group.

I normally use the following code,

3 Answers
  • 2020-12-24 06:18

    This worked for me:

    df2 = df.groupby('columnName').apply(lambda x: my_function(x, arg1, arg2))
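
A minimal sketch of the lambda approach above, using made-up data. The column names, the helper `scale_and_shift`, and the values of `factor` and `offset` are all assumptions for illustration. The value columns are selected explicitly after `groupby()` so that the result does not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [1, 2, 3, 4],
})


def scale_and_shift(group, factor, offset):
    """Illustrative helper: total score of the group, scaled and shifted."""
    return group["score"].sum() * factor + offset


factor, offset = 10, 1

# The lambda receives each group and supplies the extra arguments itself,
# so nothing besides the function has to be passed to apply().
result = df.groupby("team")[["score"]].apply(
    lambda g: scale_and_shift(g, factor, offset)
)
print(result)
```

Because the function returns a scalar per group, `result` here is a Series indexed by `team`.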

  • 2020-12-24 06:26

    pandas.core.groupby.GroupBy.apply does NOT have a named parameter called args, but pandas.DataFrame.apply does.

    So try this:

    df.groupby('columnName').apply(lambda x: myFunction(x, arg1))
    

    or as suggested by @Zero:

    df.groupby('columnName').apply(myFunction, arg1)
    

    Demo:

    In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))
    
    In [83]: df
    Out[83]:
       a  b  c
    0  0  3  1
    1  0  3  4
    2  3  0  4
    3  4  2  3
    4  3  4  1
    
    In [84]: def f(ser, n):
        ...:     return ser.max() * n
        ...:
    
    In [85]: df.apply(f, args=(10,))
    Out[85]:
    a    40
    b    40
    c    40
    dtype: int64
    

    When using GroupBy.apply, you can pass either named arguments:

    In [86]: df.groupby('a').apply(f, n=10)
    Out[86]:
        a   b   c
    a
    0   0  30  40
    3  30  40  40
    4  40  20  30
    

    or positional arguments (note that (10) is just the integer 10, not a tuple):

    In [87]: df.groupby('a').apply(f, (10))
    Out[87]:
        a   b   c
    a
    0   0  30  40
    3  30  40  40
    4  40  20  30
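
The two calling styles from the demo can be reproduced in a standalone script. This is a sketch under one assumption: the value columns `b` and `c` are selected explicitly, so the grouping column `a` is not part of what `f` receives, whatever the pandas version. The data is the frame printed in the demo.

```python
import pandas as pd

# Same values as the frame shown in the demo above.
df = pd.DataFrame({
    "a": [0, 0, 3, 4, 3],
    "b": [3, 3, 0, 2, 4],
    "c": [1, 4, 4, 3, 1],
})


def f(group, n):
    # Column-wise maximum of the group, multiplied by n.
    return group.max() * n


grouped = df.groupby("a")[["b", "c"]]

# Anything after the function is forwarded to f, by keyword or by position.
by_keyword = grouped.apply(f, n=10)
by_position = grouped.apply(f, 10)

# Parentheses alone do not make a tuple; the trailing comma does.
print(type((10)), type((10,)))
print(by_keyword)
```

Both calls give the same frame, since `f` receives `n=10` either way.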
    
  • 2020-12-24 06:36

    Some confusion here over why using an args parameter throws an error might stem from the fact that pandas.DataFrame.apply does have an args parameter (a tuple), while pandas.core.groupby.GroupBy.apply does not.

    So, when you call .apply on a DataFrame itself, you can use this argument; when you call .apply on a groupby object, you cannot.

    In @MaxU's answer, the expression lambda x: myFunction(x, arg1) is passed to func (the first parameter); there is no need to specify additional *args/**kwargs because arg1 is already bound inside the lambda.

    An example:

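    import numpy as np
    import pandas as pd
    
    # Hypothetical sample data so the snippet is self-contained
    df = pd.DataFrame({'col1': [1, 1, 2], 'col2': [3, 4, 5]})
    
    # Called on DataFrame - `axis` is a parameter of DataFrame.apply itself;
    # extra positional arguments for the function would go in the `args` tuple
    df.apply(np.sum, axis=0)  # equiv to df.sum(0)
    df.apply(np.sum, axis=1)  # equiv to df.sum(1)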
    
    
    # Called on groupby object of the DataFrame - will throw TypeError
    print(df.groupby('col1').apply(np.sum, args=(0,)))
    # TypeError: sum() got an unexpected keyword argument 'args'
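
The contrast described in this answer can be checked side by side. This is only a sketch: the frame and the helper `add_n` are hypothetical, and the value column is selected explicitly after `groupby()` to keep the grouping column out of the applied function.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col1": ["x", "x", "y"], "col2": [1, 2, 3]})


def add_n(values, n):
    # Sum the values, then add n (illustrative helper).
    return values.sum() + n


# DataFrame.apply: extra positional arguments go in the `args` tuple.
col_totals = df[["col2"]].apply(add_n, args=(100,))

# GroupBy.apply forwards unknown keywords to the function, so `args=(100,)`
# reaches add_n as an unexpected keyword argument named `args`.
try:
    df.groupby("col1")[["col2"]].apply(add_n, args=(100,))
    error = None
except TypeError as exc:
    error = exc

# Passing the value directly (or as n=100) works on the groupby object.
group_totals = df.groupby("col1")[["col2"]].apply(add_n, 100)

print(col_totals)
print(repr(error))
print(group_totals)
```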
    