I have two Pandas dataframes, one quite large (30000+ rows) and one a lot smaller (100+ rows).
dfA looks something like this:
X Y ONSET_TIME
Use merge(); it works like a JOIN in SQL, and you have the first part done.
d1 = ''' X Y ONSET_TIME COLOUR
104 78 1083 6
172 78 1083 16
240 78 1083 15
308 78 1083 8
376 78 1083 8
444 78 1083 14
512 78 1083 14
308 78 3000 14
308 78 2000 14'''
d2 = ''' TIME X Y
7 512 350
1722 512 214
1906 376 214
2095 376 146
2234 308 78
2406 172 146'''
import pandas as pd
from io import StringIO

# sep=r'\s+' splits on any run of whitespace; the first line is the header
dfA = pd.read_csv(StringIO(d1), sep=r'\s+')
# print(dfA)
dfB = pd.read_csv(StringIO(d2), sep=r'\s+')
# print(dfB)
# inner join on the shared (X, Y) columns, like SQL JOIN ... USING (X, Y)
df1 = pd.merge(dfA, dfB, on=['X', 'Y'])
print(df1)
result:
X Y ONSET_TIME COLOUR TIME
0 308 78 1083 8 2234
1 308 78 3000 14 2234
2 308 78 2000 14 2234
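Only three rows survive because merge() defaults to an inner join (how='inner'): a dfA row is kept only if its (X, Y) pair also appears in dfB. A minimal sketch contrasting that with how='left', on small made-up frames (the values are illustrative, not taken from the question):

```python
import pandas as pd

# Hypothetical miniature versions of dfA and dfB, just to show join behaviour.
left = pd.DataFrame({'X': [308, 104], 'Y': [78, 78], 'ONSET_TIME': [1083, 1083]})
right = pd.DataFrame({'TIME': [2234], 'X': [308], 'Y': [78]})

# Default how='inner': rows without a matching (X, Y) in `right` are dropped.
inner = pd.merge(left, right, on=['X', 'Y'])
# how='left': every row of `left` is kept; unmatched rows get TIME = NaN.
outer_left = pd.merge(left, right, on=['X', 'Y'], how='left')

print(inner)
print(outer_left)
```

If you need to see which dfA rows had no match at all, the left join is the one to reach for; for the filtering asked about here, the inner join is what you want.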
You can then filter the merged frame to keep only the rows where ONSET_TIME is earlier than TIME.
df2 = df1[df1['ONSET_TIME'] < df1['TIME']]
print(df2)
result:
X Y ONSET_TIME COLOUR TIME
0 308 78 1083 8 2234
2 308 78 2000 14 2234
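Both steps can also be wrapped in one helper. This is a minimal sketch, assuming the column names used above; the function name onsets_before_time is hypothetical, and it uses DataFrame.query() in place of the boolean mask (same result, different spelling). The frames are built from the d1/d2 data with dict literals so the snippet runs on its own:

```python
import pandas as pd


def onsets_before_time(df_a, df_b):
    """Hypothetical helper: join on (X, Y), keep rows where ONSET_TIME < TIME."""
    merged = pd.merge(df_a, df_b, on=['X', 'Y'])  # inner join, like SQL JOIN
    return merged.query('ONSET_TIME < TIME')       # same filter as the boolean mask


# Same data as d1 and d2 above, written as dict literals.
dfA = pd.DataFrame({
    'X': [104, 172, 240, 308, 376, 444, 512, 308, 308],
    'Y': [78] * 9,
    'ONSET_TIME': [1083] * 7 + [3000, 2000],
    'COLOUR': [6, 16, 15, 8, 8, 14, 14, 14, 14],
})
dfB = pd.DataFrame({
    'TIME': [7, 1722, 1906, 2095, 2234, 2406],
    'X': [512, 512, 376, 376, 308, 172],
    'Y': [350, 214, 214, 146, 78, 146],
})

result = onsets_before_time(dfA, dfB)
print(result)
```

query() reads well once the condition grows longer; the boolean-mask form works just as well and avoids parsing a string expression.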