Pandas merge on `datetime` or `datetime` in `datetimeIndex`

[愿得一人] 2021-01-18 00:56

Currently I have two data frames representing Excel spreadsheets. I wish to join the data where the dates are equal. This is a one-to-many join as one spreadsheet has a dat

2 Answers
  • 2021-01-18 01:17

    So here's the option with merging:

    Assume you have two DataFrames:

    import pandas as pd
    df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'], 
                        'data': ['A', 'B', 'C']})
    df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03'], 
                        'data': ['E', 'F', 'G']})
    

    Now do some cleaning: split each range in df2 into start and end columns, and make sure every date column is datetime.

    df1['date'] = pd.to_datetime(df1.date)
    
    df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
    df2['start'] = pd.to_datetime(df2.start)
    df2['end'] = pd.to_datetime(df2.end)
    # No need for this anymore
    df2 = df2.drop(columns='date')
    

    Now merge it all together with a cross join, which pairs every row of df1 with every row of df2. With your data you'll get 99 x 10K rows.

    df = df1.assign(dummy=1).merge(df2.assign(dummy=1), on='dummy').drop(columns='dummy')
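
    The dummy-key merge above is a cross join written by hand. As a minimal sketch, assuming pandas 1.2 or newer, the same result can be requested directly with how='cross'. The frames below are rebuilt only so the snippet runs on its own; the names via_dummy and via_cross are illustrative, not from the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'date': pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03']),
                    'data': ['A', 'B', 'C']})
# df2 as it looks after the cleaning step (ranges already split)
df2 = pd.DataFrame({'data': ['E', 'F', 'G'],
                    'start': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-02']),
                    'end': pd.to_datetime(['2015-01-02', '2015-01-02', '2015-01-03'])})

# Dummy-key cross join, as in the answer
via_dummy = (df1.assign(dummy=1)
                .merge(df2.assign(dummy=1), on='dummy')
                .drop(columns='dummy'))

# Built-in cross join (assumption: pandas >= 1.2)
via_cross = df1.merge(df2, how='cross')

print(via_cross.shape)
```

    Either way the intermediate frame has len(df1) * len(df2) rows, so this only suits inputs where that product fits in memory.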
    

    And subset to the dates that fall in between the ranges:

    df[(df.date >= df.start) & (df.date <= df.end)]
    #        date data_x data_y      start        end
    #0 2015-01-01      A      E 2015-01-01 2015-01-02
    #1 2015-01-01      A      F 2015-01-01 2015-01-02
    #3 2015-01-02      B      E 2015-01-01 2015-01-02
    #4 2015-01-02      B      F 2015-01-01 2015-01-02
    #5 2015-01-02      B      G 2015-01-02 2015-01-03
    #8 2015-01-03      C      G 2015-01-02 2015-01-03
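
    The two comparisons in that filter can also be written with Series.between, which is inclusive on both ends by default. This is only a sketch on a tiny hand-made frame that mimics the cross-joined columns; explicit and with_between are illustrative names.

```python
import pandas as pd

# Three hand-made rows shaped like the cross-joined frame
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-03']),
    'data_x': ['A', 'A', 'C'],
    'data_y': ['E', 'G', 'G'],
    'start': pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-02']),
    'end': pd.to_datetime(['2015-01-02', '2015-01-03', '2015-01-03']),
})

# Filter as written in the answer
explicit = df[(df.date >= df.start) & (df.date <= df.end)]

# Same filter, comparing row by row against the start and end columns
with_between = df[df['date'].between(df['start'], df['end'])]

print(with_between)
```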
    

    If some entries in df2 are a single date rather than a range, .str.split leaves None in the second column for those rows. Just use .loc to set end equal to start there.

    df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03',
                                 '2015-01-03'], 
                        'data': ['E', 'F', 'G', 'H']})
    
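    df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
    # A single date becomes a one-day range: copy start into the missing end
    df2.loc[df2.end.isnull(), 'end'] = df2.loc[df2.end.isnull(), 'start']
    # After converting start/end with pd.to_datetime and dropping 'date' as before:
    #  data      start        end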
    #0    E 2015-01-01 2015-01-02
    #1    F 2015-01-01 2015-01-02
    #2    G 2015-01-02 2015-01-03
    #3    H 2015-01-03 2015-01-03
    

    The rest (the datetime conversion, the cross merge and the date filter) follows unchanged.
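
    Putting the steps of this answer together, here is a rough end-to-end sketch. The function name match_dates_to_ranges is hypothetical, and it assumes the raw columns are called 'date' in both frames and that ranges are written as 'start to end', as in the example data. It uses fillna in place of the .loc assignment to handle single dates.

```python
import pandas as pd

def match_dates_to_ranges(df1, df2):
    """Hypothetical helper: pair each df1 row with every df2 row whose range contains its date."""
    left = df1.assign(date=pd.to_datetime(df1['date']))

    # Split 'A to B' into two columns; reindex guarantees column 1 exists
    # even when no row contains ' to '
    parts = df2['date'].str.split(' to ', expand=True).reindex(columns=[0, 1])
    start = pd.to_datetime(parts[0])
    # A single date is treated as a one-day range
    end = pd.to_datetime(parts[1]).fillna(start)
    right = df2.drop(columns='date').assign(start=start, end=end)

    # Cross join via a dummy key, then keep rows whose date is inside the range
    crossed = (left.assign(dummy=1)
                   .merge(right.assign(dummy=1), on='dummy')
                   .drop(columns='dummy'))
    inside = (crossed['date'] >= crossed['start']) & (crossed['date'] <= crossed['end'])
    return crossed[inside].reset_index(drop=True)

df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02',
                             '2015-01-02 to 2015-01-03', '2015-01-03'],
                    'data': ['E', 'F', 'G', 'H']})

result = match_dates_to_ranges(df1, df2)
print(result)
```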

  • 2021-01-18 01:26

    Let's use this numpy method by @piRSquared:

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'], 
                        'data': ['A', 'B', 'C']})
    df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03'], 
                        'data': ['E', 'F', 'G']})
    
    df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
    df2['start'] = pd.to_datetime(df2.start)
    df2['end'] = pd.to_datetime(df2.end)
    df1['date'] = pd.to_datetime(df1['date'])
    
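    # Plain NumPy arrays for broadcasting
    a = df1['date'].values
    bh = df2['end'].values
    bl = df2['start'].values
    
    # Compare every date (rows) with every range (columns);
    # i and j are the positions of the matching rows in df1 and df2
    i, j = np.where((a[:, None] >= bl) & (a[:, None] <= bh))
    
    # Stitch the matching rows together side by side
    pd.DataFrame(np.column_stack([df1.values[i], df2.values[j]]),
                 columns=df1.columns.append(df2.columns))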
    

    Output:

                      date data                      date data                start                  end
    0  2015-01-01 00:00:00    A  2015-01-01 to 2015-01-02    E  2015-01-01 00:00:00  2015-01-02 00:00:00
    1  2015-01-01 00:00:00    A  2015-01-01 to 2015-01-02    F  2015-01-01 00:00:00  2015-01-02 00:00:00
    2  2015-01-02 00:00:00    B  2015-01-01 to 2015-01-02    E  2015-01-01 00:00:00  2015-01-02 00:00:00
    3  2015-01-02 00:00:00    B  2015-01-01 to 2015-01-02    F  2015-01-01 00:00:00  2015-01-02 00:00:00
    4  2015-01-02 00:00:00    B  2015-01-02 to 2015-01-03    G  2015-01-02 00:00:00  2015-01-03 00:00:00
    5  2015-01-03 00:00:00    C  2015-01-02 to 2015-01-03    G  2015-01-02 00:00:00  2015-01-03 00:00:00
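
    Note that np.column_stack on .values yields an object array, so the result above loses its datetime dtypes and ends up with two columns named date and two named data. A possible variation, sketched here and not part of the original method, keeps the same broadcasting step but selects rows with .iloc and joins them with pd.concat. The 'data_range' column name is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'date': pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03']),
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'data': ['E', 'F', 'G'],
                    'start': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-02']),
                    'end': pd.to_datetime(['2015-01-02', '2015-01-02', '2015-01-03'])})

a = df1['date'].to_numpy()
lo = df2['start'].to_numpy()
hi = df2['end'].to_numpy()

# mask[r, c] is True when the date in df1 row r lies inside the range in df2 row c
mask = (a[:, None] >= lo) & (a[:, None] <= hi)
i, j = np.nonzero(mask)

# Select matching rows positionally so each column keeps its dtype
left = df1.iloc[i].reset_index(drop=True)
right = df2.iloc[j].reset_index(drop=True).rename(columns={'data': 'data_range'})
result = pd.concat([left, right], axis=1)

print(result)
```

    The boolean mask has len(df1) * len(df2) entries, so memory use grows with the product of the two frame sizes, much like the cross merge.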
    