Pandas: grouping and aggregation with multiple functions

南笙 2021-01-14 04:52

Situation

I have a pandas dataframe defined as follows:

import pandas as pd

headers = ['Group', 'Element', 'Case', 'Score', 'Evaluation']


      
      
        
4 Answers

抹茶落季 (OP) 2021-01-14 05:11

Starting from the result data frame, you can get the format you need in two steps:

# collapse multi index column to single level column
result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]

# split the idxmax column into two columns
result = result.assign(
    max_score_element = result.idxmax_Score.str[0],
    max_score_case = result.idxmax_Score.str[1]
).drop(columns='idxmax_Score')

result

#Group  max_Score   min_Evaluation  max_score_case  max_score_element
#0   A       9.19             0.41               y                  1
#1   B       9.12             0.10               x                  2
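
For readers who want to run the two steps end to end, here is a minimal sketch. The sample rows are hypothetical (the question's data is not shown above); they were chosen only so that the per-group results match the output listed above. The construction of result follows the pipeline used in the timing section of this answer.

```python
import pandas as pd

# Hypothetical sample data, not the data from the original question.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Element": [1, 1, 3, 2, 2, 3],
    "Case": ["x", "y", "y", "x", "y", "x"],
    "Score": [1.40, 9.19, 8.82, 9.12, 7.31, 2.50],
    "Evaluation": [0.59, 0.52, 0.41, 0.80, 0.10, 0.65],
})

# Aggregate per group; idxmax returns (Element, Case) tuples because
# those two columns were moved into the index first.
result = (
    df.set_index(["Element", "Case"])
    .groupby("Group")
    .agg({"Score": ["max", "idxmax"], "Evaluation": "min"})
    .reset_index()
)

# Step 1: collapse the two-level column index into single-level names.
result.columns = [y + "_" + x if y != "" else x for x, y in result.columns]

# Step 2: split the tuple column into two columns, then drop it.
result = result.assign(
    max_score_element=result["idxmax_Score"].str[0],
    max_score_case=result["idxmax_Score"].str[1],
).drop(columns="idxmax_Score")

print(result)
```

The .str accessor is used here only because it can index into the tuples stored in the object column; the column order of the printed frame may differ from the listing above depending on the Python and pandas versions.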




An alternative that starts from the original df and uses join. It may not be as efficient, but it is less verbose, and it is similar to @tarashypka's idea:

(df.groupby('Group')
   .agg({'Score': 'idxmax', 'Evaluation': 'min'})
   .set_index('Score')
   .join(df.drop(columns='Evaluation'))
   .reset_index(drop=True))

#Evaluation  Group  Element   Case  Score
#0     0.41      A        1      y   9.19
#1     0.10      B        2      x   9.12
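
A related variant, not part of the original answer, is to use named aggregation so the columns come out flat, and then look up Element and Case by the row labels that idxmax returns. This is only a sketch: it assumes the frame has a unique index (here the default RangeIndex), and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, not the data from the original question.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Element": [1, 1, 3, 2, 2, 3],
    "Case": ["x", "y", "y", "x", "y", "x"],
    "Score": [1.40, 9.19, 8.82, 9.12, 7.31, 2.50],
    "Evaluation": [0.59, 0.52, 0.41, 0.80, 0.10, 0.65],
})

# Named aggregation gives single-level column names directly.
# "idx" is a temporary helper column holding the row label of the max Score.
summary = df.groupby("Group").agg(
    max_Score=("Score", "max"),
    min_Evaluation=("Evaluation", "min"),
    idx=("Score", "idxmax"),
)

# Fetch Element and Case from the rows where Score peaks in each group.
lookup = df.loc[summary["idx"].to_numpy(), ["Element", "Case"]]

summary = (
    summary.assign(
        max_score_element=lookup["Element"].to_numpy(),
        max_score_case=lookup["Case"].to_numpy(),
    )
    .drop(columns="idx")
    .reset_index()
)

print(summary)
```

This avoids both the MultiIndex column flattening and the tuple splitting, at the cost of one extra lookup into df.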




Naive timing with the example data set:

%%timeit 
(df.groupby('Group')
 .agg({'Score': 'idxmax', 'Evaluation': 'min'})
 .set_index('Score')
 .join(df.drop(columns='Evaluation'))
 .reset_index(drop=True))
# 100 loops, best of 3: 3.47 ms per loop

%%timeit
result = (
    df.set_index(['Element', 'Case'])
    .groupby('Group')
    .agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
    .reset_index()
)

result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]

result = result.assign(
    max_score_element = result.idxmax_Score.str[0],
    max_score_case = result.idxmax_Score.str[1]
).drop(columns='idxmax_Score')
# 100 loops, best of 3: 7.61 ms per loop
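
The %%timeit magic only works inside IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The function names via_join and via_multiindex are made up for this illustration, the sample rows are hypothetical, and the numbers it prints will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Hypothetical sample data, not the data from the original question.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Element": [1, 1, 3, 2, 2, 3],
    "Case": ["x", "y", "y", "x", "y", "x"],
    "Score": [1.40, 9.19, 8.82, 9.12, 7.31, 2.50],
    "Evaluation": [0.59, 0.52, 0.41, 0.80, 0.10, 0.65],
})


def via_join(frame):
    # The join-based alternative from this answer.
    return (
        frame.groupby("Group")
        .agg({"Score": "idxmax", "Evaluation": "min"})
        .set_index("Score")
        .join(frame.drop(columns="Evaluation"))
        .reset_index(drop=True)
    )


def via_multiindex(frame):
    # The two-step transformation from this answer.
    result = (
        frame.set_index(["Element", "Case"])
        .groupby("Group")
        .agg({"Score": ["max", "idxmax"], "Evaluation": "min"})
        .reset_index()
    )
    result.columns = [y + "_" + x if y != "" else x for x, y in result.columns]
    return result.assign(
        max_score_element=result["idxmax_Score"].str[0],
        max_score_case=result["idxmax_Score"].str[1],
    ).drop(columns="idxmax_Score")


# Best of 3 repeats, 20 calls per repeat, reported per call.
timings = {}
for fn in (via_join, via_multiindex):
    runs = timeit.repeat(lambda: fn(df), number=20, repeat=3)
    timings[fn.__name__] = min(runs) / 20
    print(f"{fn.__name__}: {timings[fn.__name__] * 1e3:.2f} ms per loop")
```

On a six-row frame the timings mostly measure fixed pandas overhead, so they say little about how either approach scales to larger data.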

    
             
                                                        
            
            
              
                