I have a .csv file of contact information that I import as a pandas DataFrame.
>>> import pandas as pd
>>>
>>> df = pd.read_csv('contacts.csv')  # placeholder filename
Use isin and pass your list of terms to search for; you can then negate the resulting boolean mask using ~, and this will filter out those rows:
In [6]:
to_drop = ['Clerk', 'Bagger']
df[~df['title'].isin(to_drop)]
Out[6]:
fName lName email title
0 John Smith jsmith@gmail.com CEO
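A self-contained sketch of the isin approach above, using hypothetical rows for the dropped titles (the original frame isn't shown in full):

```python
import pandas as pd

# Hypothetical sample data mirroring the answer's example.
df = pd.DataFrame({
    'fName': ['John', 'Jane', 'Bob'],
    'lName': ['Smith', 'Doe', 'Jones'],
    'email': ['jsmith@gmail.com', 'jdoe@gmail.com', 'bjones@gmail.com'],
    'title': ['CEO', 'Clerk', 'Bagger'],
})

to_drop = ['Clerk', 'Bagger']

# isin builds a boolean mask (True where title is in to_drop);
# ~ negates it, so only rows whose title is NOT in the list survive.
filtered = df[~df['title'].isin(to_drop)]
print(filtered)
```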
Another method is to join the terms so they become a regex alternation, and use the vectorised str.contains:
In [8]:
df[~df['title'].str.contains('|'.join(to_drop))]
Out[8]:
fName lName email title
0 John Smith jsmith@gmail.com CEO
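Note that str.contains treats the joined string as a regex, so any metacharacters in your terms would be interpreted. A sketch of the same idea, with re.escape applied to each term first (the sample rows are hypothetical):

```python
import re
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'fName': ['John', 'Jane', 'Bob'],
    'lName': ['Smith', 'Doe', 'Jones'],
    'email': ['jsmith@gmail.com', 'jdoe@gmail.com', 'bjones@gmail.com'],
    'title': ['CEO', 'Clerk', 'Bagger'],
})

to_drop = ['Clerk', 'Bagger']

# Escape each term so regex metacharacters match literally, then
# join with | to form an alternation pattern like 'Clerk|Bagger'.
pattern = '|'.join(re.escape(term) for term in to_drop)
filtered = df[~df['title'].str.contains(pattern)]
print(filtered)
```

Also be aware that contains does substring matching, so a title like 'Head Clerk' would be dropped too, unlike with isin.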
IMO it will be easier, and probably faster, to perform the filtering as a post-processing step, because if you decide to filter whilst reading then you are iteratively growing the DataFrame, which is not efficient.
Alternatively, you can read the csv in chunks, filter out the rows you don't want, and append the surviving chunks to your output csv.
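A minimal sketch of the chunked approach, assuming an input file 'contacts.csv' and an output file 'contacts_filtered.csv' (both names are placeholders); a small sample file is written first so the example is self-contained:

```python
import pandas as pd

# Write a small hypothetical sample file so the sketch runs standalone;
# substitute your real contacts file.
sample = pd.DataFrame({
    'fName': ['John', 'Jane', 'Bob'],
    'lName': ['Smith', 'Doe', 'Jones'],
    'email': ['jsmith@gmail.com', 'jdoe@gmail.com', 'bjones@gmail.com'],
    'title': ['CEO', 'Clerk', 'Bagger'],
})
sample.to_csv('contacts.csv', index=False)

to_drop = ['Clerk', 'Bagger']
header_written = False

# chunksize makes read_csv yield DataFrames of at most that many rows,
# so the whole file is never in memory at once. A tiny chunksize is
# used here for demonstration; use something like 10000 in practice.
for chunk in pd.read_csv('contacts.csv', chunksize=2):
    kept = chunk[~chunk['title'].isin(to_drop)]
    kept.to_csv('contacts_filtered.csv',
                mode='w' if not header_written else 'a',
                header=not header_written,
                index=False)
    header_written = True

result = pd.read_csv('contacts_filtered.csv')
print(result)
```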
Another way, using query:
In [961]: to_drop = ['Clerk', 'Bagger']
In [962]: df.query('title not in @to_drop')
Out[962]:
fName lName email title
0 John Smith jsmith@gmail.com CEO
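A runnable sketch of the query approach (sample rows are hypothetical); inside the query string, the @ prefix references a local Python variable:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'fName': ['John', 'Jane', 'Bob'],
    'lName': ['Smith', 'Doe', 'Jones'],
    'email': ['jsmith@gmail.com', 'jdoe@gmail.com', 'bjones@gmail.com'],
    'title': ['CEO', 'Clerk', 'Bagger'],
})

to_drop = ['Clerk', 'Bagger']

# '@to_drop' tells query to look up the local variable to_drop,
# so the expression reads as: keep rows whose title is not in that list.
filtered = df.query('title not in @to_drop')
print(filtered)
```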