merge dataframes with timestamps and intervals

没有蜡笔的小新 2021-01-07 08:44

I have two dataframes.

df1 contains numbers and timestamps. It is a very large dataset.

df1.head()
Out[292]: 
2016-08-31 08:09:00     1.0
2016-08-31 08:11:         


        
1 Answer
  •  小蘑菇 (original poster)
    2021-01-07 08:52

    First, build an IntervalIndex from the start and stop columns of df2 with IntervalIndex.from_arrays:

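    import pandas as pd

    # closed='both' makes each interval include its start and stop timestamps
    s = pd.IntervalIndex.from_arrays(df2['start'], df2['stop'], closed='both')
    print(s)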
    IntervalIndex([[2016-08-31 08:09:00, 2016-08-31 08:12:00], 
                   [2016-08-31 08:13:00, 2016-08-31 08:20:00],
                   [2016-08-31 08:20:00, 2016-08-31 08:45:00]],
                  closed='both',
                  dtype='interval[datetime64[ns]]')
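
    As a hedged illustration of what closed='both' means for these intervals, the sketch below rebuilds the index from the values printed above. The df2 here is a hypothetical reconstruction, not the asker's data.

```python
import pandas as pd

# Hypothetical df2, reconstructed from the intervals printed above.
df2 = pd.DataFrame({
    "start": pd.to_datetime(["2016-08-31 08:09:00",
                             "2016-08-31 08:13:00",
                             "2016-08-31 08:20:00"]),
    "stop": pd.to_datetime(["2016-08-31 08:12:00",
                            "2016-08-31 08:20:00",
                            "2016-08-31 08:45:00"]),
})

s = pd.IntervalIndex.from_arrays(df2["start"], df2["stop"], closed="both")

# With closed='both' the stop timestamp itself belongs to the interval.
edge_inside = pd.Timestamp("2016-08-31 08:12:00") in s[0]

# A timestamp between 08:12:00 and 08:13:00 falls in a gap and matches nothing.
gap_matches = s.contains(pd.Timestamp("2016-08-31 08:12:30"))

# 08:20:00 ends the second interval and starts the third, so it matches both.
shared = s.contains(pd.Timestamp("2016-08-31 08:20:00"))
overlapping = s.is_overlapping

print(edge_inside, gap_matches, shared, overlapping)
```

    Because 08:20:00 lies in two intervals, a lookup for exactly that timestamp may return more than one row. If that matters, closed='left' or closed='right' is worth considering instead.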
    

    Then set this IntervalIndex as the index of df2, look up the timestamps of df1 with loc, and assign the resulting values to a new column:

    # .values drops the interval index so the result aligns with df1 by position
    df1['C'] = df2.set_index(s).loc[df1.index, 'C'].values
    print(df1)
                            A  C
    2016-08-31 08:09:00   1.0  a
    2016-08-31 08:11:00   7.0  a
    2016-08-31 08:14:00  90.0  b
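
    Both steps can be put together in a self-contained sketch. The df1 and df2 below are hypothetical, reconstructed from the printed output above. The third label 'c' is a placeholder, because the original data is not shown in full.

```python
import pandas as pd

# Hypothetical df1: values indexed by timestamp.
df1 = pd.DataFrame(
    {"A": [1.0, 7.0, 90.0]},
    index=pd.to_datetime(["2016-08-31 08:09:00",
                          "2016-08-31 08:11:00",
                          "2016-08-31 08:14:00"]),
)

# Hypothetical df2: one label per [start, stop] interval.
df2 = pd.DataFrame({
    "start": pd.to_datetime(["2016-08-31 08:09:00",
                             "2016-08-31 08:13:00",
                             "2016-08-31 08:20:00"]),
    "stop": pd.to_datetime(["2016-08-31 08:12:00",
                            "2016-08-31 08:20:00",
                            "2016-08-31 08:45:00"]),
    "C": ["a", "b", "c"],  # 'c' is a placeholder label
})

# Step 1: interval index over df2's rows.
s = pd.IntervalIndex.from_arrays(df2["start"], df2["stop"], closed="both")

# Step 2: look up each df1 timestamp in the interval-indexed df2.
df1["C"] = df2.set_index(s).loc[df1.index, "C"].values

print(df1)
```

    This sketch assumes every timestamp in df1 falls inside exactly one interval. A timestamp outside all intervals makes the loc lookup raise a KeyError.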
    

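    EDIT: The same approach with the timestamps converted to int64 (nanoseconds since the epoch) on both sides of the lookup: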

    import numpy as np

    # Build the intervals from integer nanoseconds instead of datetimes
    s = pd.IntervalIndex.from_arrays(df2['start'].astype(np.int64),
                                     df2['stop'].astype(np.int64), closed='both')
    print(s)
    IntervalIndex([[1472630940000000000, 1472631120000000000], 
                   [1472631180000000000, 1472631600000000000], 
                   [1472631600000000000, 1472633100000000000]],
                  closed='both',
                  dtype='interval[int64]')
    
    # The lookup keys must be converted to int64 as well, to match the index
    df1['C'] = df2.set_index(s).loc[df1.index.astype(np.int64), 'C'].values
    print(df1)
                            A  C
    2016-08-31 08:09:00   1.0  a
    2016-08-31 08:11:00   7.0  a
    2016-08-31 08:14:00  90.0  b
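
    The integers in the int64 index are nanoseconds since 1970-01-01 00:00:00. A small sketch, using only the first interval's printed endpoints, checks that reading:

```python
import pandas as pd

ts = pd.Timestamp("2016-08-31 08:09:00")

# Timestamp.value is the number of nanoseconds since the epoch.
ns = ts.value

# Integers passed to pd.to_datetime are read as nanoseconds by default.
back = pd.to_datetime(ns)

# Width of the first interval [08:09:00, 08:12:00] in whole minutes.
minutes = (1472631120000000000 - ns) // (60 * 10**9)

print(ns, back, minutes)
```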
    
