I want to filter a pandas DataFrame to the rows whose name column entry is in a given list.
Here we have a DataFrame:
x = DataFrame([['sam', 328], [
Try using isin (thanks to DSM for suggesting loc over ix here; ix has since been removed from pandas):
In [78]: x = pd.DataFrame([['sam',328],['ruby',3213],['jon',121]], columns = ['name', 'score'])
In [79]: names = ['sam', 'ruby']
In [80]: x['name'].isin(names)
Out[80]:
0 True
1 True
2 False
Name: name, dtype: bool
In [81]: x.loc[x['name'].isin(names), 'score'].sum()
Out[81]: 3541
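For reference, here is a self-contained script version of the session above. It is a sketch that reuses the same example data and also shows how to get the filtered rows themselves rather than just the sum. The ~ inversion for "not in the list" and the DataFrame.query variant are additions for illustration, not part of the original answer; variable names such as mask, filtered and via_query are hypothetical.

```python
import pandas as pd

# Same example data as in the session above
x = pd.DataFrame([['sam', 328], ['ruby', 3213], ['jon', 121]],
                 columns=['name', 'score'])
names = ['sam', 'ruby']

# Boolean mask: True where the row's name appears in `names`
mask = x['name'].isin(names)

# Keep only the matching rows
filtered = x[mask]

# Invert the mask with ~ to keep the rows whose name is NOT in the list
excluded = x[~mask]

# Sum the score column over the matching rows only
total = x.loc[mask, 'score'].sum()

# Equivalent row filter written with DataFrame.query;
# the @ prefix refers to the local Python variable `names`
via_query = x.query('name in @names')

print(filtered)
print(total)
```

Boolean indexing with x[mask] returns whole rows, while x.loc[mask, 'score'] selects the rows and a single column in one step.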
CT Zhu suggests a slightly faster alternative using np.in1d:
In [105]: y = pd.concat([x]*1000)
In [109]: %timeit y.loc[y['name'].isin(names), 'score'].sum()
1000 loops, best of 3: 413 µs per loop
In [110]: %timeit y.loc[np.in1d(y['name'], names), 'score'].sum()
1000 loops, best of 3: 335 µs per loop
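np.in1d is deprecated as of NumPy 2.0; np.isin (available since NumPy 1.13) is the recommended replacement and accepts the same arguments here. The sketch below rebuilds the benchmark frame and checks that the NumPy mask selects the same rows as Series.isin. The timings above were measured on older library versions and are not reproduced here, so the relative speed on a current install is an assumption to verify with %timeit.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame([['sam', 328], ['ruby', 3213], ['jon', 121]],
                 columns=['name', 'score'])
names = ['sam', 'ruby']

# Mirror the benchmark: repeat the 3-row frame 1000 times (3000 rows)
y = pd.concat([x] * 1000)

# np.isin returns a plain NumPy boolean array, usable as a .loc row mask
numpy_mask = np.isin(y['name'], names)

via_numpy = y.loc[numpy_mask, 'score'].sum()
via_pandas = y.loc[y['name'].isin(names), 'score'].sum()

print(via_numpy, via_pandas)
```

Both approaches select the same rows; the NumPy version skips building an index-aligned boolean Series, which is where any speed difference would come from.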