Given two dataframes df_1 and df_2, how can they be joined such that the datetime column of df_1 falls between start and end?
This method assumes that pandas Timestamp objects are used.
df2:

                     start                 end event
    0  2016-05-14 10:54:31 2016-05-14 10:54:33    E1
    1  2016-05-14 10:54:34 2016-05-14 10:54:37    E2
    2  2016-05-14 10:54:38 2016-05-14 10:54:42    E3
    import numpy as np

    event_num = len(df2.event)

    def get_event(t):
        # Mask of the intervals containing t, dotted with [0, 1, ..., event_num - 1]
        event_idx = ((t >= df2.start) & (t <= df2.end)).dot(np.arange(event_num))
        return df2.event[event_idx]

    df1["event"] = df1.timestamp.transform(get_event)
Explanation of get_event
For each timestamp in df1, say t0 = 2016-05-14 10:54:33, the mask (t0 >= df2.start) & (t0 <= df2.end) contains exactly one True, provided the intervals do not overlap and t0 lies inside one of them (see Example 1). Taking the dot product of this mask with np.arange(event_num) then gives the index of the event that t0 belongs to.
Examples:
Example 1
       t0 >= df2.start    t0 <= df2.end        After &    np.arange(3)        event_idx
    0  True               True             ->  T          0
    1  False              True             ->  F          1               ->  0
    2  False              True             ->  F          2
Take t2 = 2016-05-14 10:54:35 for another example:

       t2 >= df2.start    t2 <= df2.end        After &    np.arange(3)        event_idx
    0  True               False            ->  F          0
    1  True               True             ->  T          1               ->  1
    2  False              True             ->  F          2
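The two examples above can be reproduced with a short sketch. The helper name event_index and the out-of-range timestamp t_out are assumptions added for illustration; df2 is rebuilt from the table shown earlier.

```python
import numpy as np
import pandas as pd

# df2 as shown in the table above
df2 = pd.DataFrame({
    "start": pd.to_datetime(["2016-05-14 10:54:31",
                             "2016-05-14 10:54:34",
                             "2016-05-14 10:54:38"]),
    "end": pd.to_datetime(["2016-05-14 10:54:33",
                           "2016-05-14 10:54:37",
                           "2016-05-14 10:54:42"]),
    "event": ["E1", "E2", "E3"],
})

def event_index(t):
    # Hypothetical helper: boolean mask of intervals containing t,
    # dotted with [0, 1, 2] to pick out the position of the single True.
    mask = (t >= df2.start) & (t <= df2.end)
    return int(mask.dot(np.arange(len(df2))))

t0 = pd.Timestamp("2016-05-14 10:54:33")
t2 = pd.Timestamp("2016-05-14 10:54:35")
idx0 = event_index(t0)  # Example 1
idx2 = event_index(t2)  # Example 2

# Assumed extra case: a timestamp outside every interval gives an all-False mask
t_out = pd.Timestamp("2016-05-14 10:54:50")
idx_out = event_index(t_out)
```

Note that an all-False mask also produces a dot product of 0, so a timestamp that lies in no interval would be assigned to the first event. The approach therefore relies on every timestamp falling inside exactly one interval.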
Finally, transform is used to map each timestamp to its event.
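Putting the pieces together, a minimal end-to-end sketch might look like the following. The contents of df1 are assumed sample data (the original does not show df1); df2 and get_event follow the answer above.

```python
import numpy as np
import pandas as pd

# Assumed sample data for df1; only the column name "timestamp" comes from the answer
df1 = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-05-14 10:54:33",
                                 "2016-05-14 10:54:35",
                                 "2016-05-14 10:54:40"]),
})

# df2 as shown in the table above
df2 = pd.DataFrame({
    "start": pd.to_datetime(["2016-05-14 10:54:31",
                             "2016-05-14 10:54:34",
                             "2016-05-14 10:54:38"]),
    "end": pd.to_datetime(["2016-05-14 10:54:33",
                           "2016-05-14 10:54:37",
                           "2016-05-14 10:54:42"]),
    "event": ["E1", "E2", "E3"],
})

event_num = len(df2.event)

def get_event(t):
    # Index of the interval containing t, assuming exactly one interval matches
    event_idx = ((t >= df2.start) & (t <= df2.end)).dot(np.arange(event_num))
    return df2.event[event_idx]

# get_event is called once per timestamp
df1["event"] = df1.timestamp.transform(get_event)
events = df1["event"].tolist()
```

Because get_event is evaluated once per row of df1 and scans all of df2 each time, the cost grows with the product of the two lengths, which is fine for small event tables but may be slow for large ones.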