I often need to filter a pandas DataFrame df by df[df['col_name']=='string_value'], and I want to speed up the row selection operation, is there a qui
I have long wanted to add binary search indexes to DataFrame objects. You can take the DIY approach of sorting by the column and doing this yourself:
In [11]: df = df.sort_values('STK_ID') # skip this if you're sure it's sorted (older pandas: df.sort)
In [12]: df['STK_ID'].searchsorted('A0003', 'left')
Out[12]: 6000
In [13]: df['STK_ID'].searchsorted('A0003', 'right')
Out[13]: 8000
In [14]: timeit df[6000:8000]
10000 loops, best of 3: 134 µs per loop
This is fast because slicing a contiguous range of rows returns a view and does not copy any data.
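
For reference, the same idea can be wrapped in a small helper. This is only a sketch: the function name select_sorted and the toy data are made up for illustration, and it assumes the frame is already sorted by the lookup column (otherwise searchsorted gives meaningless positions).

```python
import pandas as pd


def select_sorted(df, col, value):
    """Return the rows where df[col] == value.

    Hypothetical helper: assumes df is already sorted by `col`.
    """
    # Binary search for the first and one-past-last positions of `value`.
    lo = df[col].searchsorted(value, side="left")
    hi = df[col].searchsorted(value, side="right")
    # Positional slice of the contiguous block of matching rows.
    return df.iloc[lo:hi]


# Toy data standing in for the real frame (illustrative values only).
df = pd.DataFrame({
    "STK_ID": ["A0003", "A0001", "A0002", "A0003", "A0001"],
    "price": [3.0, 1.0, 2.0, 3.5, 1.5],
})

# Sort once up front; every later lookup reuses the sorted order.
df = df.sort_values("STK_ID", kind="stable").reset_index(drop=True)

subset = select_sorted(df, "STK_ID", "A0003")
print(subset)
```

The sort is a one-time cost, so this only pays off if you select by the same column many times. A value that is not present simply yields an empty slice, because both searches land on the same position.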