Apply multiple functions to multiple groupby columns

春和景丽 2020-11-22 03:16

The docs show how to apply multiple functions on a groupby object at a time using a dict with the output column names as the keys:

In [563]: grouped['D'].a
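For reference, newer pandas releases express this idea with named aggregation, where the keyword names become the output column names; the dict-of-new-names form on a single column was removed in pandas 1.0. A minimal sketch, assuming a small made-up frame (the column `key`, the data, and the output names `total` and `average` are illustrative and not taken from the docs excerpt):

```python
import pandas as pd

# Hypothetical data: one grouping column and one value column named 'D'
df = pd.DataFrame({"key": ["x", "x", "y", "y"], "D": [1.0, 2.0, 3.0, 5.0]})
grouped = df.groupby("key")

# Named aggregation: each keyword is an output column, each value the function to apply
result = grouped["D"].agg(total="sum", average="mean")
print(result)
```

This covers several functions on one column; it does not by itself let one function read several columns.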


        
7 Answers
  •  感情败类
    2020-11-22 04:08

    Ted's answer is amazing. I ended up using a smaller version of it, in case anyone is interested. It is useful when you need a single aggregation that depends on values from multiple columns:

    create a dataframe

    import pandas as pd
    df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6], 'b': [1, 1, 0, 1, 1, 0], 'c': ['x', 'x', 'y', 'y', 'z', 'z']})
    
    
       a  b  c
    0  1  1  x
    1  2  1  x
    2  3  0  y
    3  4  1  y
    4  5  1  z
    5  6  0  z
    

    grouping and aggregating with apply (using multiple columns)

    df.groupby('c').apply(lambda x: x['a'][(x['a']>1) & (x['b']==1)].mean())
    
    c
    x    2.0
    y    4.0
    z    5.0
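The same computation can be spelled out with a named function, which may be easier to read and test than the lambda. This is only a sketch: the helper name `filtered_mean` is made up here, and selecting `[['a', 'b']]` before `apply` is my own choice so that the grouping column is not passed into each group.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 1, 0, 1, 1, 0],
    "c": ["x", "x", "y", "y", "z", "z"],
})

def filtered_mean(group):
    """Mean of 'a' over the rows of one group where a > 1 and b == 1."""
    mask = (group["a"] > 1) & (group["b"] == 1)
    return group.loc[mask, "a"].mean()

# Each group arrives as a DataFrame holding columns 'a' and 'b'
result = df.groupby("c")[["a", "b"]].apply(filtered_mean)
print(result)
```

The result is a Series indexed by `c`, with the same values as the output above.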
    

    grouping and aggregating with aggregate (using multiple columns)

    I like this approach since I can still use aggregate. Perhaps people will let me know why apply is needed for getting at multiple columns when doing aggregations on groups.

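    It seems obvious now, but as long as you don't select the column of interest directly after the groupby, you will have access to all the columns of the dataframe from within your aggregation function. Be aware that this depends on the pandas version: if aggregate calls the function once per column instead of once per group DataFrame, x['a'] raises a KeyError, and apply is the reliable choice.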

    only access to the selected column

    df.groupby('c')['a'].aggregate(lambda x: x[x>1].mean())
    

    access to all columns since selection is after all the magic

    df.groupby('c').aggregate(lambda x: x[(x['a']>1) & (x['b']==1)].mean())['a']
    

    or similarly

    df.groupby('c').aggregate(lambda x: x['a'][(x['a']>1) & (x['b']==1)].mean())
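If several such results are needed at once, one common pattern is to have the function passed to `apply` return a Series, so that each entry becomes an output column. A sketch under that assumption; the helper `summarize` and the output names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 1, 0, 1, 1, 0],
    "c": ["x", "x", "y", "y", "z", "z"],
})

def summarize(group):
    """Return several aggregations for one group, some using both columns."""
    mask = (group["a"] > 1) & (group["b"] == 1)
    return pd.Series({
        "a_filtered_mean": group.loc[mask, "a"].mean(),  # uses 'a' and 'b'
        "a_sum": group["a"].sum(),
        "b_mean": group["b"].mean(),
    })

# One row per group, one column per key of the returned Series
summary = df.groupby("c")[["a", "b"]].apply(summarize)
print(summary)
```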
    

    I hope this helps.
