Finding the locations of an array's elements in a pandas frame column (a.k.a. pd.Series)

离开以前 2021-01-12 02:27

I have a pandas frame similar to this one:

import pandas as pd
import numpy as np

data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,5


        
5 Answers
  • 2021-01-12 02:58

    This should do it:

    df.loc[df.Col4.isin(target_array)].index
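
    As a minimal sketch of what this one-liner returns: the question's sample frame is cut off on this page, so the data below is borrowed from another answer in this thread, and the names mask and matching_labels are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame; the values are taken from another answer in this thread.
data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data, index=['R1', 'R2', 'R3', 'R4'])
target_array = np.array(['AAA', 'CCC', 'EEE'])

# Boolean Series: True where the Col4 value occurs in target_array.
mask = df['Col4'].isin(target_array)

# Index labels of the matching rows.
matching_labels = df.loc[mask].index.tolist()
print(matching_labels)  # ['R1', 'R3', 'R4']
```

    'EEE' never occurs in Col4, so it simply contributes no rows.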
    

    EDIT:

    I timed three of the options from the answers here: mine, Bruce Pucci's, and Divakar's.

    Divakar's was faster by a large margin, so I'd pick his.
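
    A rough sketch of how such a comparison could be reproduced with the standard timeit module. The frame size, the candidates dict, and the labels in it are assumptions made for illustration, not the setup actually used for the claim above. np.isin stands in for np.in1d, which is deprecated in recent NumPy releases.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame: the four sample labels repeated 25,000 times.
n_repeats = 25_000
df = pd.DataFrame({'Col4': ['AAA', 'BBB', 'AAA', 'CCC'] * n_repeats})
target_array = np.array(['AAA', 'CCC', 'EEE'])

# Three ways of getting the index of the matching rows.
candidates = {
    'loc + isin': lambda: df.loc[df.Col4.isin(target_array)].index,
    'index + isin': lambda: df.index[df.Col4.isin(target_array)],
    'index + np.isin': lambda: df.index[np.isin(df['Col4'].to_numpy(), target_array)],
}

# Sanity check first: every candidate must select the same rows.
results = {name: fn() for name, fn in candidates.items()}
reference = results['loc + isin']
all_agree = all(reference.equals(other) for other in results.values())

# Best-of-3 timing, 5 calls per measurement.
for name, fn in candidates.items():
    seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f'{name}: {seconds * 1e3:.2f} ms per call')
```

    Absolute numbers depend on the machine, the library versions, and the column dtype, so only the relative ordering on your own data is meaningful.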

  • 2021-01-12 03:09

    For the sake of completeness, I've added two .query() variants. My timings against a 400K-row df:

    In [63]: df.shape
    Out[63]: (400000, 4)
    
    In [64]:  %timeit df.index[np.in1d(df['Col4'],target_array)]
    10 loops, best of 3: 35.1 ms per loop
    
    In [65]: %timeit df.index[df.Col4.isin(target_array)]
    10 loops, best of 3: 36.7 ms per loop
    
    In [66]: %timeit df.loc[df.Col4.isin(target_array)].index
    10 loops, best of 3: 47.8 ms per loop
    
    In [67]: %timeit df.query('@target_array.tolist() == Col4')
    10 loops, best of 3: 45.7 ms per loop
    
    In [68]: %timeit df.query('@target_array in Col4')
    10 loops, best of 3: 51.9 ms per loop
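
    To see what a .query() lookup returns on a small frame, here is a minimal sketch. It uses the 'column in @variable' spelling rather than the exact expressions timed above, the sample data comes from another answer in this thread, and the names targets, matched and unmatched are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame; the values are taken from another answer in this thread.
data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data, index=['R1', 'R2', 'R3', 'R4'])
target_array = np.array(['AAA', 'CCC', 'EEE'])

# '@name' refers to a local Python variable inside the query string.
targets = target_array.tolist()

# Rows whose Col4 value is one of the targets, and the complement.
matched = df.query('Col4 in @targets')
unmatched = df.query('Col4 not in @targets')

print(matched.index.tolist())    # ['R1', 'R3', 'R4']
print(unmatched.index.tolist())  # ['R2']
```

    Unlike the index-based one-liners, .query() returns the matching rows themselves, so take .index of the result if only the labels are needed.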
    

    Here is a similar comparison for (not in ...) and for different dtypes

  • 2021-01-12 03:09
    df.index[df.Col4.isin(target_array)]
    
  • 2021-01-12 03:12
    import pandas as pd
    import numpy as np
    
    data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}
    target_array = np.array(['AAA', 'CCC', 'EEE'])
    
    df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])
    
    df['in_col'] = df['Col4'].apply(lambda x: x in target_array)
    

    Is this what you were looking for? You can then group by the new column, or select the rows where it is True.
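
    A short sketch of that follow-up step, assuming the frame and target_array defined in the code above. The astype(bool) call and the names matching and matching_labels are additions for illustration.

```python
import numpy as np
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
target_array = np.array(['AAA', 'CCC', 'EEE'])
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Flag each row: True when its Col4 value occurs in target_array.
# astype(bool) guarantees a plain boolean column for masking.
df['in_col'] = df['Col4'].apply(lambda x: x in target_array).astype(bool)

# Keep only the flagged rows and read off their index labels.
matching = df[df['in_col']]
matching_labels = matching.index.tolist()
print(matching_labels)  # ['R1', 'R3', 'R4']
```

    Because apply calls the lambda once per row in Python, this is likely to be slower on large frames than the vectorized isin approaches in the other answers.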

  • 2021-01-12 03:15

    You can use NumPy's in1d:

    df.index[np.in1d(df['Col4'],target_array)]
    

    Explanation

    1) Create a 1D boolean mask with one entry per row, telling us whether Col4's element matches any element in target_array:

    mask = np.in1d(df['Col4'],target_array)
    

    2) Use the mask to select the matching index labels from the dataframe as the final output:

    out = df.index[mask]
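
    The two steps can be run end to end as the sketch below. It swaps in np.isin, since np.in1d is deprecated in recent NumPy releases; the sample data is borrowed from another answer in this thread, and positions is an extra illustrative variable for the integer row positions.

```python
import numpy as np
import pandas as pd

# Sample frame; the values are taken from another answer in this thread.
data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data, index=['R1', 'R2', 'R3', 'R4'])
target_array = np.array(['AAA', 'CCC', 'EEE'])

# Step 1: boolean mask, one entry per row of Col4.
mask = np.isin(df['Col4'].to_numpy(), target_array)

# Step 2: index labels where the mask is True.
out = df.index[mask]

# Integer row positions of the same matches, if those are wanted instead.
positions = np.flatnonzero(mask)

print(out.tolist())        # ['R1', 'R3', 'R4']
print(positions.tolist())  # [0, 2, 3]
```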
    