get first and last values in a groupby

前端未结

关注

 3  631

I have a dataframe df

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [[\'a\', \'a\', \'a\', \'a\', \'b\', \'b\', \'b\', \'c\


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  傲寒        
                
              
                            
                2020-12-02 23:54
              
            
            
                                                                       
This could be on of the easy solution.

df.groupby(level = 0, as_index= False).nth([0,-1])

      X   Y
a a   0   1
  d   6   7
b e   8   9
  g  12  13
c h  14  15
  i  16  17
d j  18  19


Hope this helps. (Y)
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  礼貌的吻别        
                
              
                            
                2020-12-02 23:58
              
            
            
                                                                       
Please try this:

For last value: df.groupby('Column_name').nth(-1),

For first value: df.groupby('Column_name').nth(0) 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2020-12-03 00:04
              
            
            
                                                                       
Option 1
def first_last(df):
    return df.ix[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)



Option 2 - only works if index is unique
idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
df.loc[idx]


Option 3 - per notes below, this only makes sense when there are no NAs
I also abused the agg function.  The code below works, but is far uglier.
df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    .set_index('level_1', append=True).reset_index(1, drop=True) \
    .rename_axis([None, None])


Note
per @unutbu: agg(['first', 'last']) take the firs non-na values.
I interpreted this as, it must then be necessary to run this column by column.  Further, forcing index level=1 to align may not even make sense.
Let's include another test
df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'),
                   list('abcdefghij')],
                  list('XY'))

df.loc[tuple('aa'), 'X'] = np.nan


def first_last(df):
    return df.ix[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)


df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    .set_index('level_1', append=True).reset_index(1, drop=True) \
    .rename_axis([None, None])


Sure enough!  This second solution is taking the first valid value in column X.  It is now nonsensical to have forced that value to align with the index a.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复