I have a .csv file of contact information that I import as a pandas DataFrame.
>>> import pandas as pd
>>>
>>> df = pd.read_csv('contacts.csv')  # placeholder filename
Use isin and pass your list of terms to search for; you can then negate the resulting boolean mask using ~, and this will filter out those rows:
In [6]:
to_drop = ['Clerk', 'Bagger']
df[~df['title'].isin(to_drop)]
Out[6]:
fName lName email title
0 John Smith jsmith@gmail.com CEO
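A self-contained sketch of the isin approach above, using hypothetical rows for the dropped titles (the original frame isn't shown in full):

```python
import pandas as pd

# Hypothetical sample data mirroring the answer's example.
df = pd.DataFrame({
    'fName': ['John', 'Jane', 'Bob'],
    'lName': ['Smith', 'Doe', 'Jones'],
    'email': ['jsmith@gmail.com', 'jdoe@gmail.com', 'bjones@gmail.com'],
    'title': ['CEO', 'Clerk', 'Bagger'],
})

to_drop = ['Clerk', 'Bagger']

# isin builds a boolean mask (True where title is in to_drop);
# ~ negates it, so only rows whose title is NOT in the list survive.
filtered = df[~df['title'].isin(to_drop)]
print(filtered)
```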
Another method is to join the terms so they become a regex alternation, and use the vectorised str.contains:
In [8]:
df[~df['title'].str.contains('|'.join(to_drop))]
Out[8]:
fName lName email title
0 John Smith jsmith@gmail.com CEO
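Note that str.contains treats the joined string as a regex, so any metacharacters in your terms would be interpreted. A sketch of the same idea, with re.escape applied to each term first (the sample rows are hypothetical):

```python
import re
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'fName': ['John', 'Jane', 'Bob'],
    'lName': ['Smith', 'Doe', 'Jones'],
    'email': ['jsmith@gmail.com', 'jdoe@gmail.com', 'bjones@gmail.com'],
    'title': ['CEO', 'Clerk', 'Bagger'],
})

to_drop = ['Clerk', 'Bagger']

# Escape each term so regex metacharacters match literally, then
# join with | to form an alternation pattern like 'Clerk|Bagger'.
pattern = '|'.join(re.escape(term) for term in to_drop)
filtered = df[~df['title'].str.contains(pattern)]
print(filtered)
```

Also be aware that contains does substring matching, so a title like 'Head Clerk' would be dropped too, unlike with isin.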
IMO it will be easier, and probably faster, to perform the filtering as a post-processing step, because if you decide to filter whilst reading then you are iteratively growing the DataFrame, which is not efficient.
Alternatively, you can read the csv in chunks, filter out the rows you don't want, and append the surviving chunks to your output csv.
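A minimal sketch of the chunked approach, assuming an input file 'contacts.csv' and an output file 'contacts_filtered.csv' (both names are placeholders); a small sample file is written first so the example is self-contained:

```python
import pandas as pd

# Write a small hypothetical sample file so the sketch runs standalone;
# substitute your real contacts file.
sample = pd.DataFrame({
    'fName': ['John', 'Jane', 'Bob'],
    'lName': ['Smith', 'Doe', 'Jones'],
    'email': ['jsmith@gmail.com', 'jdoe@gmail.com', 'bjones@gmail.com'],
    'title': ['CEO', 'Clerk', 'Bagger'],
})
sample.to_csv('contacts.csv', index=False)

to_drop = ['Clerk', 'Bagger']
header_written = False

# chunksize makes read_csv yield DataFrames of at most that many rows,
# so the whole file is never in memory at once. A tiny chunksize is
# used here for demonstration; use something like 10000 in practice.
for chunk in pd.read_csv('contacts.csv', chunksize=2):
    kept = chunk[~chunk['title'].isin(to_drop)]
    kept.to_csv('contacts_filtered.csv',
                mode='w' if not header_written else 'a',
                header=not header_written,
                index=False)
    header_written = True

result = pd.read_csv('contacts_filtered.csv')
print(result)
```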
Another way, using query:
In [961]: to_drop = ['Clerk', 'Bagger']
In [962]: df.query('title not in @to_drop')
Out[962]:
fName lName email title
0 John Smith jsmith@gmail.com CEO
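A runnable sketch of the query approach (sample rows are hypothetical); inside the query string, the @ prefix references a local Python variable:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'fName': ['John', 'Jane', 'Bob'],
    'lName': ['Smith', 'Doe', 'Jones'],
    'email': ['jsmith@gmail.com', 'jdoe@gmail.com', 'bjones@gmail.com'],
    'title': ['CEO', 'Clerk', 'Bagger'],
})

to_drop = ['Clerk', 'Bagger']

# '@to_drop' tells query to look up the local variable to_drop,
# so the expression reads as: keep rows whose title is not in that list.
filtered = df.query('title not in @to_drop')
print(filtered)
```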