Pandas: grouping and aggregation with multiple functions

南笙 2021-01-14 04:52

Situation

I have a pandas dataframe defined as follows:

import pandas as pd

headers = ['Group', 'Element', 'Case', 'Score', 'Evaluation'


        
4 answers
  •  抹茶落季
    2021-01-14 05:11

    Starting from the result data frame, you can transform it into the format you need in two steps:

    # collapse the multi-index columns into single-level columns
    result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]

    # split the idxmax column, which holds (Element, Case) tuples, into two columns
    result = result.assign(
        max_score_element = result.idxmax_Score.str[0],
        max_score_case = result.idxmax_Score.str[1]
    ).drop(columns='idxmax_Score')  # drop(label, 1) no longer works in pandas 2.x
    
    result
    
    #Group  max_Score   min_Evaluation  max_score_case  max_score_element
    #0   A       9.19             0.41               y                  1
    #1   B       9.12             0.10               x                  2
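
For readers without the question's data at hand, the two steps above can be tried end to end on a small made-up frame. This is only a sketch: the values in df are hypothetical (chosen so that the per-group results agree with the table above), and the grouping call mirrors the one used in the timing comparison in this answer.

```python
import pandas as pd

# Hypothetical sample data (the question's original rows are not shown here).
df = pd.DataFrame({
    'Group':      ['A', 'A', 'A', 'B', 'B', 'B'],
    'Element':    [1, 1, 1, 2, 2, 2],
    'Case':       ['x', 'y', 'z', 'x', 'y', 'z'],
    'Score':      [1.40, 9.19, 8.82, 9.12, 7.79, 2.50],
    'Evaluation': [0.59, 0.93, 0.41, 0.66, 0.10, 0.80],
})

# With (Element, Case) as the index, idxmax yields an (Element, Case) tuple.
result = (
    df.set_index(['Element', 'Case'])
      .groupby('Group')
      .agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
      .reset_index()
)

# Step 1: collapse the two-level columns, e.g. ('Score', 'max') -> 'max_Score'.
result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]

# Step 2: split the tuple column into two plain columns, then drop it.
result = result.assign(
    max_score_element=result['idxmax_Score'].str[0],
    max_score_case=result['idxmax_Score'].str[1],
).drop(columns='idxmax_Score')

print(result)
```

The .str accessor is used here only to index into each tuple; no string operation is involved.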
    

    An alternative, starting from the original df and using join, may not be as efficient but is less verbose, similar to @tarashypka's idea:

    (df.groupby('Group')
       .agg({'Score': 'idxmax', 'Evaluation': 'min'})
       .set_index('Score')
       .join(df.drop(columns='Evaluation'))
       .reset_index(drop=True))
    
    #Evaluation  Group  Element   Case  Score
    #0     0.41      A        1      y   9.19
    #1     0.10      B        2      x   9.12
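
To see why the join works, it can help to look at the intermediate frames. The sketch below spells the chain out one step at a time; the values in df are hypothetical (not the question's data), and the names per_group, keyed and joined are introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data with a default RangeIndex (row labels 0..5).
df = pd.DataFrame({
    'Group':      ['A', 'A', 'A', 'B', 'B', 'B'],
    'Element':    [1, 1, 1, 2, 2, 2],
    'Case':       ['x', 'y', 'z', 'x', 'y', 'z'],
    'Score':      [1.40, 9.19, 8.82, 9.12, 7.79, 2.50],
    'Evaluation': [0.59, 0.93, 0.41, 0.66, 0.10, 0.80],
})

# Per group: the row label of the highest Score and the smallest Evaluation.
# After this step the 'Score' column holds row labels of df, not scores.
per_group = df.groupby('Group').agg({'Score': 'idxmax', 'Evaluation': 'min'})

# Move those row labels into the index so that join can align on them.
keyed = per_group.set_index('Score')

# join matches keyed's index against df's index and pulls in the
# remaining columns of each max-Score row.
joined = keyed.join(df.drop(columns='Evaluation')).reset_index(drop=True)

print(joined)
```

Dropping Evaluation from df before the join avoids a column-name clash and keeps the group minimum rather than the Evaluation of the max-Score row (0.93 for group A in this made-up data).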
    

    Naive timing with the example data set:

    %%timeit 
    (df.groupby('Group')
     .agg({'Score': 'idxmax', 'Evaluation': 'min'})
     .set_index('Score')
     .join(df.drop(columns='Evaluation'))
     .reset_index(drop=True))
    # 100 loops, best of 3: 3.47 ms per loop
    
    %%timeit
    result = (
        df.set_index(['Element', 'Case'])
        .groupby('Group')
        .agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
        .reset_index()
    )

    result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]

    result = result.assign(
        max_score_element = result.idxmax_Score.str[0],
        max_score_case = result.idxmax_Score.str[1]
    ).drop(columns='idxmax_Score')
    # 100 loops, best of 3: 7.61 ms per loop
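
%%timeit is an IPython cell magic, so the cells above only run in IPython or Jupyter. In a plain Python script, a rough equivalent can be sketched with the standard-library timeit module. The data and the helper names via_join and via_multiindex are hypothetical, and absolute numbers will vary with machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical sample data (not the question's original rows).
df = pd.DataFrame({
    'Group':      ['A', 'A', 'A', 'B', 'B', 'B'],
    'Element':    [1, 1, 1, 2, 2, 2],
    'Case':       ['x', 'y', 'z', 'x', 'y', 'z'],
    'Score':      [1.40, 9.19, 8.82, 9.12, 7.79, 2.50],
    'Evaluation': [0.59, 0.93, 0.41, 0.66, 0.10, 0.80],
})


def via_join():
    """Join-based variant: look up the max-Score rows by row label."""
    return (df.groupby('Group')
              .agg({'Score': 'idxmax', 'Evaluation': 'min'})
              .set_index('Score')
              .join(df.drop(columns='Evaluation'))
              .reset_index(drop=True))


def via_multiindex():
    """Two-step variant: aggregate, flatten columns, split the tuple column."""
    result = (df.set_index(['Element', 'Case'])
                .groupby('Group')
                .agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
                .reset_index())
    result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]
    return result.assign(
        max_score_element=result['idxmax_Score'].str[0],
        max_score_case=result['idxmax_Score'].str[1],
    ).drop(columns='idxmax_Score')


n = 20  # calls per repeat; kept small so the script finishes quickly
for func in (via_join, via_multiindex):
    # Best of 3 repeats, reported as milliseconds per call.
    best = min(timeit.repeat(func, number=n, repeat=3)) / n * 1000
    print(f'{func.__name__}: {best:.2f} ms per loop')
```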
    
