How to join two dataframes for which column values are within a certain range?

清酒与你 2020-11-22 07:53

Given two dataframes df_1 and df_2, how can I join them so that the datetime column timestamp in df_1 falls between the start and end columns of df_2?

5 Answers
  •  灰色年华
    2020-11-22 08:54

    First use IntervalIndex to build a reference index from the intervals of interest, then use get_indexer to find, for each timestamp, the position of the interval that contains it. Those positions select the matching rows from the dataframe that holds the discrete events.

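    import pandas as pd

    # One closed interval [start, end] per row of df_2
    idx = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='both')
    # get_indexer returns integer positions, so select from the column with .iloc
    event = df_2['event'].iloc[idx.get_indexer(df_1.timestamp)]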
    
    event
    0    E1
    1    E2
    1    E2
    1    E2
    2    E3
    Name: event, dtype: object
    
    df_1['event'] = event.to_numpy()
    df_1
                timestamp         A         B event
    0 2016-05-14 10:54:33  0.020228  0.026572    E1
    1 2016-05-14 10:54:34  0.057780  0.175499    E2
    2 2016-05-14 10:54:35  0.098808  0.620986    E2
    3 2016-05-14 10:54:36  0.158789  1.014819    E2
    4 2016-05-14 10:54:39  0.038129  2.384590    E3
    

    Reference: A question on IntervalIndex.get_indexer.
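
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The sample frames are hypothetical (the interval bounds and the extra 10:55:00 row are made up for illustration, they are not from the question). It also guards against one thing the snippet above does not: get_indexer returns -1 for a timestamp that falls in no interval, and a plain positional lookup would read -1 as "last row". The intervals are assumed not to overlap, which get_indexer requires.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the frames in the answer.
df_1 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-05-14 10:54:33",
        "2016-05-14 10:54:34",
        "2016-05-14 10:54:35",
        "2016-05-14 10:54:36",
        "2016-05-14 10:54:39",
        "2016-05-14 10:55:00",  # deliberately outside every interval
    ]),
})
df_2 = pd.DataFrame({
    "start": pd.to_datetime([
        "2016-05-14 10:54:31", "2016-05-14 10:54:34", "2016-05-14 10:54:38",
    ]),
    "end": pd.to_datetime([
        "2016-05-14 10:54:33", "2016-05-14 10:54:37", "2016-05-14 10:54:42",
    ]),
    "event": ["E1", "E2", "E3"],
})

# One closed interval [start, end] per row of df_2 (must not overlap).
idx = pd.IntervalIndex.from_arrays(df_2["start"], df_2["end"], closed="both")

# Position of the containing interval for each timestamp, -1 if none.
pos = idx.get_indexer(df_1["timestamp"])

# Look up events by position, leaving unmatched rows empty instead of
# letting -1 silently pick the last row of df_2.
events = df_2["event"].to_numpy(dtype=object)
df_1["event"] = np.where(pos >= 0, events[pos], None)

print(df_1)
```

If every timestamp is guaranteed to fall inside some interval, the np.where guard is unnecessary and the shorter form in the answer is enough.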
