How do I drop duplicates and keep the first value in pandas?


野性不改 2021-01-27 11:24

I want to drop duplicates and keep the first value. The duplicates I want to drop are the rows where A = 'df'. Here's my data:

A   B   C   D   E
qw  1   3   1   1
er


      
      
        
3 answers

悲哀的现实 (original poster) 2021-01-27 11:42
              

            
            
                        
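Another idea, which I find more readable, is to take only the index labels where df.A == "df", shift them by one position, and keep the labels whose difference from the previous label equals 1. Those are the rows that sit directly below another "df" row, so we drop them with df.drop(). Note that this relies on the default integer index (0, 1, 2, ...).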

idx = df[df.A == "df"].index             # labels of the "df" rows: [3, 4, 5, 6, 9, 10]
m = idx - np.roll(idx, 1) == 1           # True if the previous "df" row is adjacent: [False, True, True, True, False, True]
df.drop(idx[m], inplace=True)            # drops rows [4, 5, 6, 10]
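The index-difference trick depends on the index labels being consecutive integers. Here is a minimal sketch of a positional variant, assuming the goal is the same (drop every "df" row that directly follows another "df" row). The helper name drop_repeated_value is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def drop_repeated_value(frame, column, value):
    """Return a copy of `frame` without rows where `value` repeats directly below itself."""
    # Row positions (not labels) where the column equals the target value.
    pos = np.flatnonzero(frame[column].to_numpy() == value)
    # A position is a repeat if the previous matching position is exactly one row above.
    repeated = pos[1:][np.diff(pos) == 1]
    keep = np.ones(len(frame), dtype=bool)
    keep[repeated] = False
    return frame[keep]


# Works with a non-integer index as well.
demo = pd.DataFrame({"A": ["df", "df", "x", "df", "df"]}, index=list("abcde"))
print(drop_repeated_value(demo, "A", "df").index.tolist())  # ['a', 'c', 'd']
```

Because it builds a boolean mask by position, this version returns a new DataFrame instead of modifying df in place.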




Time comparison

Runs as fast as jezrael's answer on the test sample below.


  1000 loops, best of 3: 1.38 ms per loop  
  
  1000 loops, best of 3: 1.38 ms per loop
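To reproduce a comparison like this, something along the following lines could be used. It is only a sketch: jezrael's code is not shown on this page, so a shift-based mask stands in as an assumed second method, and the function names by_index_diff and by_shift_mask are hypothetical. Timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

a = ["qw", "er", "ew", "df", "df", "df", "df", "we", "we", "df", "df", "we", "qw"]
df = pd.DataFrame({"A": a, "B": range(len(a))})


def by_index_diff(frame):
    # The approach from this answer, without inplace so it can be timed repeatedly.
    idx = frame[frame.A == "df"].index
    m = idx - np.roll(idx, 1) == 1
    return frame.drop(idx[m])


def by_shift_mask(frame):
    # Assumed stand-in for the other method: compare each row with the row above.
    is_df = frame.A.eq("df")
    return frame[~(is_df & is_df.shift(fill_value=False))]


# Both functions should keep the same rows before we compare their speed.
same_rows = by_index_diff(df).index.tolist() == by_shift_mask(df).index.tolist()
print("same result:", same_rows)

number = 200
for fn in (by_index_diff, by_shift_mask):
    best = min(timeit.repeat(lambda: fn(df), number=number, repeat=3))
    print(f"{fn.__name__}: {best / number * 1e3:.3f} ms per loop")
```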




Full example

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {'A': {0: 'qw', 1: 'er', 2: 'ew', 3: 'df', 4: 'df', 5: 'df', 6: 'df', 7: 'we', 
            8: 'we', 9: 'df', 10: 'df', 11: 'we', 12: 'qw'}, 
    'B': {0: 1, 1: 2, 2: 4, 3: 34, 4: 2, 5: 3, 6: 4, 7: 2, 8: 4, 9: 34, 10: 3, 
          11: 4, 12: 2}, 
    'C': {0: 3, 1: 4, 2: 8, 3: 34, 4: 5, 5: 3, 6: 4, 7: 5, 8: 4, 9: 9, 10: 3, 
          11: 7, 12: 2}, 
    'D': {0: 1, 1: 2, 2: 44, 3: 34, 4: 2, 5: 7, 6: 7, 7: 5, 8: 4, 9: 34, 10: 9, 
          11: 4, 12: 7}, 
    'E': {0: 1, 1: 6, 2: 4, 3: 34, 4: 2, 5: 3, 6: 4, 7: 2, 8: 4, 9: 34, 10: 3, 
          11: 4, 12: 2}}
)

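idx = df[df.A == "df"].index        # labels of all rows where A is "df"
m = idx - np.roll(idx, 1) == 1      # True where the previous "df" row is directly above
df.drop(idx[m], inplace=True)       # remove the repeats, keeping the first row of each run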
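As a quick check of which rows should survive, here is a small sketch that uses only column A from the sample and a shift-based mask instead of the index difference. This is an alternative formulation for comparison, not the method above, and it assumes the intent is to keep the first row of each consecutive run of "df".

```python
import pandas as pd

a = ["qw", "er", "ew", "df", "df", "df", "df", "we", "we", "df", "df", "we", "qw"]
check = pd.DataFrame({"A": a})

is_df = check["A"].eq("df")
# A row is a repeat if it is "df" and the row directly above it is also "df".
repeat = is_df & is_df.shift(fill_value=False)
result = check[~repeat]

print(result.index.tolist())  # [0, 1, 2, 3, 7, 8, 9, 11, 12]
```

Rows 7 and 8 both hold "we" and are kept, because only repeats of "df" are targeted.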

    
             
                                                        
            
            
              
                