Apply multiple functions to multiple groupby columns

前端 未结 7 2085
春和景丽
春和景丽 2020-11-22 03:16

The docs show how to apply multiple functions on a groupby object at a time using a dict with the output column names as the keys:

In [563]: grouped[\'D\'].a         


        
7条回答
  •  -上瘾入骨i
    2020-11-22 03:57

    The second half of the currently accepted answer is outdated and has two deprecations. First and most important, you can no longer pass a dictionary of dictionaries to the agg groupby method. Second, never use .ix.

    If you desire to work with two separate columns at the same time I would suggest using the apply method which implicitly passes a DataFrame to the applied function. Let's use a similar dataframe as the one from above

    df = pd.DataFrame(np.random.rand(4,4), columns=list('abcd'))
    df['group'] = [0, 0, 1, 1]
    df
    
              a         b         c         d  group
    0  0.418500  0.030955  0.874869  0.145641      0
    1  0.446069  0.901153  0.095052  0.487040      0
    2  0.843026  0.936169  0.926090  0.041722      1
    3  0.635846  0.439175  0.828787  0.714123      1
    

    A dictionary mapped from column names to aggregation functions is still a perfectly good way to perform an aggregation.

    df.groupby('group').agg({'a':['sum', 'max'], 
                             'b':'mean', 
                             'c':'sum', 
                             'd': lambda x: x.max() - x.min()})
    
                  a                   b         c         d
                sum       max      mean       sum  
    group                                                  
    0      0.864569  0.446069  0.466054  0.969921  0.341399
    1      1.478872  0.843026  0.687672  1.754877  0.672401
    

    If you don't like that ugly lambda column name, you can use a normal function and supply a custom name to the special __name__ attribute like this:

    def max_min(x):
        return x.max() - x.min()
    
    max_min.__name__ = 'Max minus Min'
    
    df.groupby('group').agg({'a':['sum', 'max'], 
                             'b':'mean', 
                             'c':'sum', 
                             'd': max_min})
    
                  a                   b         c             d
                sum       max      mean       sum Max minus Min
    group                                                      
    0      0.864569  0.446069  0.466054  0.969921      0.341399
    1      1.478872  0.843026  0.687672  1.754877      0.672401
    

    Using apply and returning a Series

    Now, if you had multiple columns that needed to interact together then you cannot use agg, which implicitly passes a Series to the aggregating function. When using apply the entire group as a DataFrame gets passed into the function.

    I recommend making a single custom function that returns a Series of all the aggregations. Use the Series index as labels for the new columns:

    def f(x):
        d = {}
        d['a_sum'] = x['a'].sum()
        d['a_max'] = x['a'].max()
        d['b_mean'] = x['b'].mean()
        d['c_d_prodsum'] = (x['c'] * x['d']).sum()
        return pd.Series(d, index=['a_sum', 'a_max', 'b_mean', 'c_d_prodsum'])
    
    df.groupby('group').apply(f)
    
             a_sum     a_max    b_mean  c_d_prodsum
    group                                           
    0      0.864569  0.446069  0.466054     0.173711
    1      1.478872  0.843026  0.687672     0.630494
    

    If you are in love with MultiIndexes, you can still return a Series with one like this:

        def f_mi(x):
            d = []
            d.append(x['a'].sum())
            d.append(x['a'].max())
            d.append(x['b'].mean())
            d.append((x['c'] * x['d']).sum())
            return pd.Series(d, index=[['a', 'a', 'b', 'c_d'], 
                                       ['sum', 'max', 'mean', 'prodsum']])
    
    df.groupby('group').apply(f_mi)
    
                  a                   b       c_d
                sum       max      mean   prodsum
    group                                        
    0      0.864569  0.446069  0.466054  0.173711
    1      1.478872  0.843026  0.687672  0.630494
    

提交回复
热议问题