Python/Pandas: Drop rows from data frame on string match from list

野的像风 2020-12-14 10:02

I have a .csv file of contact information that I import as a pandas data frame.

>>> import pandas as pd
>>> 
>>> df = pd.read_csv         


        
2 Answers
  • 2020-12-14 10:43

    Use isin and pass it your list of terms to search for. You can then negate the boolean mask with ~, which filters out those rows:

    In [6]:
    
    to_drop = ['Clerk', 'Bagger']
    df[~df['title'].isin(to_drop)]
    Out[6]:
      fName  lName             email title
    0  John  Smith  jsmith@gmail.com   CEO
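
Since the question's read_csv call is cut off, here is a minimal self-contained sketch of the same isin filter. The second and third rows are made-up sample data (only the John Smith row appears in the output above), and the names mask and kept are just for illustration.

```python
import pandas as pd

# Hypothetical sample data: only the first row is taken from the thread,
# the other two rows are invented so there is something to drop.
df = pd.DataFrame({
    "fName": ["John", "Jane", "Bob"],
    "lName": ["Smith", "Doe", "Jones"],
    "email": ["jsmith@gmail.com", "jdoe@example.com", "bjones@example.com"],
    "title": ["CEO", "Clerk", "Bagger"],
})

to_drop = ["Clerk", "Bagger"]

# isin gives True where the title exactly equals one of the listed terms.
mask = df["title"].isin(to_drop)

# ~ inverts the mask, so only rows whose title is NOT in the list remain.
kept = df[~mask]
print(kept)
```

Note that isin compares whole values, so a title such as "Senior Clerk" would not be dropped by this filter.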
    

    Another method is to join the terms with | so they become a single regex, and use the vectorised str.contains:

    In [8]:
    
    df[~df['title'].str.contains('|'.join(to_drop))]
    Out[8]:
      fName  lName             email title
    0  John  Smith  jsmith@gmail.com   CEO
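
One caveat with the regex route: str.contains matches substrings and interprets the joined string as a pattern, so it can behave differently from isin. The sketch below illustrates this, assuming the terms may contain regex metacharacters and the column may have missing values. The titles and the term "Bagger (PT)" are invented for the example.

```python
import re
import pandas as pd

# Hypothetical titles, including a substring case and a missing value.
titles = pd.Series(
    ["CEO", "Clerk", "Senior Clerk", "Bagger (PT)", None], name="title"
)
to_drop = ["Clerk", "Bagger (PT)"]

# Escape each term so characters like ( and ) are matched literally.
pattern = "|".join(re.escape(term) for term in to_drop)

# na=False treats missing titles as "no match" instead of propagating NaN.
contains_mask = titles.str.contains(pattern, na=False)

# For comparison: isin only matches whole values.
exact_mask = titles.isin(to_drop)

kept = titles[~contains_mask]
```

Here contains_mask also flags "Senior Clerk", while exact_mask does not, so pick whichever matches the intent.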
    

    IMO it will be easier, and probably faster, to do the filtering as a post-processing step: if you filter whilst reading, you end up iteratively growing the dataframe, which is not efficient.

    Alternatively, you can read the csv in chunks, filter out the rows you don't want, and append each filtered chunk to your output csv.
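
A rough sketch of the chunked approach, under a few assumptions: the CSV text is made-up sample data, in-memory StringIO buffers stand in for the real input and output files, and chunksize=2 is only to force more than one chunk on such a small input.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the contacts file.
csv_text = (
    "fName,lName,email,title\n"
    "John,Smith,jsmith@gmail.com,CEO\n"
    "Jane,Doe,jdoe@example.com,Clerk\n"
    "Bob,Jones,bjones@example.com,Bagger\n"
)
to_drop = ["Clerk", "Bagger"]

source = io.StringIO(csv_text)  # in practice, the path to the input .csv
output = io.StringIO()          # in practice, an open output file

# With chunksize set, read_csv yields DataFrames of at most that many rows.
for i, chunk in enumerate(pd.read_csv(source, chunksize=2)):
    filtered = chunk[~chunk["title"].isin(to_drop)]
    # Write the header only once, with the first chunk.
    filtered.to_csv(output, index=False, header=(i == 0))

result = output.getvalue()
print(result)
```

With a real output path instead of a buffer, you would open the file once before the loop (or pass mode="a" to to_csv) so that each chunk is appended rather than overwriting the previous one.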

  • 2020-12-14 10:56

    Another way is to use query, referencing the list with @:

    In [961]: to_drop = ['Clerk', 'Bagger']
    
    In [962]: df.query('title not in @to_drop')
    Out[962]:
      fName  lName             email title
    0  John  Smith  jsmith@gmail.com   CEO
    