My df:

{'city1': {0: 'Chicago',
  1: 'Chicago',
  2: 'Chicago',
  3: 'Chicago',
  4: 'Miami',
  5: 'Houston',
  6: 'Austin'},
 'city2': {0:
I added an issue here
Theory:

When the result of a groupby operation on a pd.Series is identical to the original pd.Series, the original index is returned; otherwise the result comes back with a MultiIndex of (group key, original index).
Boiled-down example

import pandas as pd

df = pd.DataFrame(dict(A=[0, 1, 2, 3]))

# returns results identical to df.A, so the original index is kept
print(df.groupby(df.A // 2).A.nsmallest(2))

# returns the same values out of order, so a MultiIndex comes back
print(df.groupby(df.A // 2).A.nlargest(2))
0    0
1    1
2    2
3    3
Name: A, dtype: int64

A
0  1    1
   0    0
1  3    3
   2    2
Name: A, dtype: int64
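A quick sanity check of the theory (a minimal sketch; the exact index you get back can depend on your pandas version):

import pandas as pd

df = pd.DataFrame(dict(A=[0, 1, 2, 3]))

small = df.groupby(df.A // 2).A.nsmallest(2)
large = df.groupby(df.A // 2).A.nlargest(2)

# On the version described above, nsmallest(2) hands back every row, so the flat
# original index survives, while nlargest(2) returns the same values out of order
# and comes back with a (group key, original index) MultiIndex.
print(small.index.equals(df.index))
print(isinstance(large.index, pd.MultiIndex))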
I'd argue these should return the same, consistent index.
The most egregious consequence of this:

# the index structure of this result will differ randomly from run to run
print(df.groupby(df.A // 2).A.apply(pd.Series.sample, n=2))
returns this on one execution
A
0  1    1
   0    0
1  2    2
   3    3
Name: A, dtype: int64
And this on another
0    0
1    1
2    2
3    3
Name: A, dtype: int64
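Because the sampled result can randomly match the original Series, the index flips between the two shapes above. A minimal sketch that just reports which shape came back on each run (behavior may differ on newer pandas versions):

import pandas as pd

df = pd.DataFrame(dict(A=[0, 1, 2, 3]))

# repeat the sample a few times and report which index structure came back
for _ in range(5):
    res = df.groupby(df.A // 2).A.apply(pd.Series.sample, n=2)
    if isinstance(res.index, pd.MultiIndex):
        print('MultiIndex (group key, original index)')
    else:
        print('original flat index')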
Of course, this version never has the issue, because sampling one row per group can never return the same values as the original Series:
print(df.groupby(df.A // 2).A.apply(pd.Series.sample, n=1))
A
0  0    0
1  2    2
Name: A, dtype: int64
Workaround

Use set_index to move the grouping columns into the index, group on those levels, then reset_index to get the keys back as ordinary columns:

cols = ['plant1_type', 'plant2_type', 'city2']
df.set_index(cols).groupby(level=cols)['p234_r_c'].nlargest(1).reset_index()
  plant1_type plant2_type     city2  p234_r_c
0    COMBCYCL        COAL   Toronto       5.0
1    COMBCYCL        COAL   Detroit       4.0
2        NUKE    COMBCYCL  St.Louis       2.0
3        COAL    COMBCYCL     Miami       0.5
4        NUKE        COAL    Dallas       1.0
5    COMBCYCL        NUKE    Dallas       4.0
6        COAL        NUKE    Dallas       3.0
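A related pattern that sidesteps the index question entirely (a sketch with made-up column names, not the set_index approach above) is to sort first and then take the head of each group; the result is always a plain slice of the original frame, so its index structure never depends on what the groups happen to contain:

import pandas as pd

# hypothetical data, column names invented for illustration
df = pd.DataFrame({
    'city': ['Chicago', 'Chicago', 'Miami', 'Miami', 'Houston'],
    'value': [3, 5, 2, 7, 1],
})

# sort by value, then keep the first (largest) row of each city group;
# groupby(...).head() returns rows of the original frame with their original labels
top1 = df.sort_values('value', ascending=False).groupby('city', sort=False).head(1)
print(top1)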