How to speed up pandas row filtering by string matching?

-上瘾入骨i 2021-01-31 22:47

I often need to filter a pandas DataFrame df with df[df['col_name'] == 'string_value'], and I want to speed up this row selection. Is there a quick way to do that?
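For reference, here is the filter from the question as a minimal self-contained sketch. The frame, the column name `STK_ID` and the key values are hypothetical stand-ins. The boolean mask compares every row against the string, so each lookup costs O(n) comparisons no matter how few rows match.

```python
import pandas as pd

# Hypothetical data: 10,000 rows cycling through 5 string keys A0000..A0004.
keys = [f"A{i:04d}" for i in range(5)]
df = pd.DataFrame({
    "STK_ID": [keys[i % 5] for i in range(10_000)],
    "value": range(10_000),
})

# The pattern from the question: build a boolean mask over every row, then select.
mask = df["STK_ID"] == "A0003"
subset = df[mask]
print(len(subset))
```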

3 Answers
  •  孤街浪徒
    2021-01-31 23:02

    I have long wanted to add binary search indexes to DataFrame objects. You can take the DIY approach of sorting by the column and doing this yourself:

    In [11]: df = df.sort_values('STK_ID')  # skip this if you're sure it's sorted
    
    In [12]: df['STK_ID'].searchsorted('A0003', side='left')
    Out[12]: 6000
    
    In [13]: df['STK_ID'].searchsorted('A0003', side='right')
    Out[13]: 8000
    
    In [14]: %timeit df.iloc[6000:8000]  # searchsorted returns positions, so slice by position
    10000 loops, best of 3: 134 µs per loop
    

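    This is fast because each searchsorted call is a binary search (O(log n)) instead of a comparison against every row, and slicing a contiguous block of rows by position can return a view rather than copying any data.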
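A self-contained sketch of the same sort-then-binary-search approach, under stated assumptions: the synthetic frame, the column name `STK_ID` and the helper `select_sorted` are hypothetical, chosen so that the key `A0003` occupies positions 6000 to 8000 as in the session above. The last few lines show a different, swapped-in technique that is not part of the answer: moving the key into a sorted index and selecting with `.loc`, which pandas can typically resolve with a binary search because the index is monotonic.

```python
import pandas as pd


def select_sorted(df: pd.DataFrame, col: str, value) -> pd.DataFrame:
    """Hypothetical helper: rows where df[col] == value, assuming df is sorted by col.

    Two binary searches find the first position and one-past-the-last position
    of the run of equal keys; a positional slice then returns that block.
    """
    left = df[col].searchsorted(value, side="left")
    right = df[col].searchsorted(value, side="right")
    return df.iloc[left:right]


# Hypothetical data: 10,000 rows cycling through 5 string keys A0000..A0004.
keys = [f"A{i:04d}" for i in range(5)]
df = pd.DataFrame({
    "STK_ID": [keys[i % 5] for i in range(10_000)],
    "value": range(10_000),
})

# Sort once up front; every later lookup reuses the sorted frame.
# kind="stable" keeps rows with equal keys in their original relative order.
df_sorted = df.sort_values("STK_ID", kind="stable")

fast = select_sorted(df_sorted, "STK_ID", "A0003")
slow = df_sorted[df_sorted["STK_ID"] == "A0003"]  # full-scan mask, for comparison
print(len(fast), fast.equals(slow))

# Swapped-in alternative (not from the answer): put the key in a sorted index
# and let .loc locate the label range on the monotonic index.
df_indexed = df.set_index("STK_ID").sort_index()
via_index = df_indexed.loc["A0003"]
print(len(via_index))
```

Sorting costs O(n log n) once; after that each lookup is two O(log n) searches plus a slice, so the approach pays off when the same frame is filtered many times.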
