If I have the following dataframe, how can I keep only the first timestamp for each (id, id2) pair?
| id | timestamp           | code | id2 |
|----|---------------------|------|-----|
| 10 | 2017-07-12 13:37:00 | 206  | a1  |
| 10 | 2017-07-12 13:40:00 | 206  | a1  |
| 10 | 20
I think you need GroupBy.first:
df.groupby(["id", "id2"])["timestamp"].first()
Or drop_duplicates, which keeps the first row per key by default:
df.drop_duplicates(subset=['id','id2'])
For the same output:
df1 = df.groupby(["id", "id2"], as_index=False)["timestamp"].first()
print(df1)
id id2 timestamp
0 10 a1 2017-07-12 13:37:00
1 10 a2 2017-07-12 19:00:00
2 11 a1 2017-07-12 13:37:00
df1 = df.drop_duplicates(subset=['id','id2'])[['id','id2','timestamp']]
print(df1)
id id2 timestamp
0 10 a1 2017-07-12 13:37:00
1 10 a2 2017-07-12 19:00:00
2 11 a1 2017-07-12 13:37:00
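The two approaches can be checked against each other on a small sample frame. The data below is hypothetical, reconstructed from the rows visible in the (truncated) question and the outputs above:

```python
import pandas as pd

# Hypothetical sample data mirroring the truncated frame in the question
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 11],
    "timestamp": pd.to_datetime([
        "2017-07-12 13:37:00", "2017-07-12 13:40:00",
        "2017-07-12 14:00:00", "2017-07-12 19:00:00",
        "2017-07-12 13:37:00",
    ]),
    "code": [206] * 5,
    "id2": ["a1", "a1", "a1", "a2", "a1"],
})

# First timestamp per (id, id2) group
first = df.groupby(["id", "id2"], as_index=False)["timestamp"].first()

# First row per (id, id2) key; original row labels are kept
dedup = df.drop_duplicates(subset=["id", "id2"])[["id", "id2", "timestamp"]]

# Same rows either way once the surviving index labels are reset
print(first.equals(dedup.reset_index(drop=True)))  # True
```

Note the one visible difference: `drop_duplicates` preserves the original row labels (0, 3, 4 in the outputs above), while `groupby(..., as_index=False)` returns a fresh 0..n-1 index.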
One can also create a new column by merging the id and id2 strings, then drop the rows where that key is duplicated:
df['newcol'] = df.apply(lambda x: str(x.id) + '_' + str(x.id2), axis=1)  # separator avoids key collisions
df = df[~df.newcol.duplicated()].iloc[:, :4]  # iloc used to remove the helper column
print(df)
Output:
id timestamp code id2
0 10 2017-07-12 13:37:00 206 a1
3 10 2017-07-12 19:00:00 206 a2
4 11 2017-07-12 13:37:00 206 a1
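A caveat with the concatenated-key approach: without a separator, distinct key pairs can collide. The frame below is a hypothetical example of such a collision; `duplicated` accepts a column subset directly, which sidesteps the problem (and the helper column) entirely:

```python
import pandas as pd

# Hypothetical frame where naive key concatenation collides:
# "1" + "12a" == "11" + "2a" == "112a"
df = pd.DataFrame({"id": [1, 11], "id2": ["12a", "2a"], "val": [1, 2]})

naive_key = df["id"].astype(str) + df["id2"]
print(naive_key.duplicated().any())  # True: a false duplicate

# duplicated() with a subset compares the tuple of column values instead
print(df.duplicated(subset=["id", "id2"]).any())  # False: no real duplicate
```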