I have a pandas frame similar to this one:
import pandas as pd
import numpy as np
data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,5
This should do it:
df.loc[df.Col4.isin(target_array)].index
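A minimal runnable sketch of the line above, using the sample frame and target_array that appear later in this thread. The name matching_labels is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame and target values as posted elsewhere in this thread.
data = {
    'Col1': [4, 5, 6, 7],
    'Col2': [10, 20, 30, 40],
    'Col3': [100, 50, -30, -50],
    'Col4': ['AAA', 'BBB', 'AAA', 'CCC'],
}
target_array = np.array(['AAA', 'CCC', 'EEE'])
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# isin() builds a boolean mask, .loc keeps the matching rows,
# and .index returns the labels of those rows.
matching_labels = df.loc[df['Col4'].isin(target_array)].index
print(matching_labels.tolist())  # ['R1', 'R3', 'R4']
```

Values in target_array that never occur in Col4 (here 'EEE') are simply ignored.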
EDIT:
I timed three of the posted answers: mine, Bruce Pucci's, and Divakar's.
Divakar's was faster by a large margin, so I'd pick his.
For the sake of completeness I've added two .query() variants. My timings against a 400K-row df:
In [63]: df.shape
Out[63]: (400000, 4)
In [64]: %timeit df.index[np.in1d(df['Col4'],target_array)]
10 loops, best of 3: 35.1 ms per loop
In [65]: %timeit df.index[df.Col4.isin(target_array)]
10 loops, best of 3: 36.7 ms per loop
In [66]: %timeit df.loc[df.Col4.isin(target_array)].index
10 loops, best of 3: 47.8 ms per loop
In [67]: %timeit df.query('@target_array.tolist() == Col4')
10 loops, best of 3: 45.7 ms per loop
In [68]: %timeit df.query('@target_array in Col4')
10 loops, best of 3: 51.9 ms per loop
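For anyone who wants to repeat the comparison outside IPython, here is a rough sketch with the standard timeit module. The random 400K-row frame, the label set and the candidates dictionary are my own assumptions, not the frame measured above, and np.isin stands in for np.in1d, so the absolute numbers will differ.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the 400K-row frame; the real data was not posted.
rng = np.random.default_rng(0)
n_rows = 400_000
labels = np.array(['AAA', 'BBB', 'CCC', 'DDD'])
df = pd.DataFrame({'Col4': rng.choice(labels, size=n_rows)})
target_array = np.array(['AAA', 'CCC', 'EEE'])

# Three of the variants timed above, each wrapped in a zero-argument callable.
candidates = {
    'np.isin': lambda: df.index[np.isin(df['Col4'].to_numpy(), target_array)],
    'index[isin]': lambda: df.index[df['Col4'].isin(target_array)],
    'loc[isin].index': lambda: df.loc[df['Col4'].isin(target_array)].index,
}

# Run each variant once so the results can be compared for equality.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    # Best of 3 repeats, 3 calls per repeat, reported per call.
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f'{name}: {best * 1e3:.1f} ms per call')
```

All three variants should return the same labels, so only the timing differs.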
Here is a similar comparison for (not in ...) and for different dtypes
df.index[df.Col4.isin(target_array)]
import pandas as pd
import numpy as np
data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}
target_array = np.array(['AAA', 'CCC', 'EEE'])
df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])
df['in_col'] = df['Col4'].apply(lambda x: x in target_array)
Is this what you were looking for? You can then group by the new column and select the True elements.
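As a possible follow-up, the new boolean column can also be used directly as a mask, which skips the groupby step. This is only a sketch; matched_rows and matched_labels are illustrative names.

```python
import numpy as np
import pandas as pd

data = {
    'Col1': [4, 5, 6, 7],
    'Col2': [10, 20, 30, 40],
    'Col3': [100, 50, -30, -50],
    'Col4': ['AAA', 'BBB', 'AAA', 'CCC'],
}
target_array = np.array(['AAA', 'CCC', 'EEE'])
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Flag each row whose Col4 value appears in target_array.
df['in_col'] = df['Col4'].apply(lambda x: x in target_array)

# Use the flag column as a boolean mask to keep only the matching rows.
matched_rows = df[df['in_col']]
matched_labels = matched_rows.index
print(matched_labels.tolist())  # ['R1', 'R3', 'R4']
```

Note that apply calls the lambda once per row, so on large frames the vectorised isin approaches are likely to be faster.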
You can use NumPy's in1d -
df.index[np.in1d(df['Col4'],target_array)]
Explanation
1) Create a 1D mask corresponding to each row, telling us whether there is a match between Col4's element and any element in target_array :
mask = np.in1d(df['Col4'],target_array)
2) Use the mask to select valid indices from the dataframe as final output :
out = df.index[np.in1d(df['Col4'],target_array)]
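The two steps can be put together in a self-contained sketch. It uses np.isin rather than np.in1d, because np.in1d is deprecated as of NumPy 2.0 in favour of np.isin; the swap is my substitution, not part of the answer above. The sample data is the frame posted earlier in the thread.

```python
import numpy as np
import pandas as pd

data = {
    'Col1': [4, 5, 6, 7],
    'Col2': [10, 20, 30, 40],
    'Col3': [100, 50, -30, -50],
    'Col4': ['AAA', 'BBB', 'AAA', 'CCC'],
}
target_array = np.array(['AAA', 'CCC', 'EEE'])
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Step 1: boolean mask, one entry per row, True where Col4 is in target_array.
mask = np.isin(df['Col4'].to_numpy(), target_array)

# Step 2: use the mask to pick the matching index labels.
out = df.index[mask]
print(mask.tolist())  # [True, False, True, True]
print(out.tolist())   # ['R1', 'R3', 'R4']
```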