Python Pandas join dataframes on index

余生分开走 2021-02-12 12:56

I am trying to join two dataframes on the same column "Date". The code is as follows:

import pandas as pd
from datetime import datetime
df_train_csv = pd.read_csv         


        
1 Answer
  •  感动是毒
    2021-02-12 13:29

    So let's dissect this:

    df_train_csv = pd.read_csv('./train.csv',parse_dates=['Date'],index_col='Date')
    

    OK, the first problem is that you have specified 'Date' as the index column. This means 'Date' becomes the index of df_train_csv and you no longer have a 'Date' column.
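
A minimal sketch of that effect, assuming a tiny in-memory CSV in place of train.csv (the `Sales` column and its values are made up for illustration, since the real file's columns are not shown):

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv; only the 'Date' column name comes from the question.
csv_text = "Date,Sales\n2010-02-05,100\n2010-02-12,110\n"

# With index_col='Date', the parsed dates become the index.
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Without index_col, 'Date' stays an ordinary column.
plain = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print("Date" in indexed.columns)  # False
print(indexed.index.name)         # Date
print("Date" in plain.columns)    # True
```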

    start = datetime(2010, 2, 5)
    end = datetime(2012, 10, 26)
    
    df_train_fly = pd.date_range(start, end, freq="W-FRI")
    df_train_fly = pd.DataFrame(pd.Series(df_train_fly), columns=['Date'])
    
    merged = df_train_csv.join(df_train_fly.set_index(['Date']), on = ['Date'], how = 'right', lsuffix='_x')
    

    So the join above will not work, as the error you reported shows. To fix this:

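    # remove the index_col param so 'Date' stays a regular column
    df_train_csv = pd.read_csv('./train.csv',parse_dates=['Date'])
    # keep 'Date' as the index of df_train_fly: 'on' matches a column of the
    # caller against the index of the other frame
    merged = df_train_csv.join(df_train_fly.set_index(['Date']), on = ['Date'], how = 'right', lsuffix='_x')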
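
As the pandas documentation describes it, `on` in `DataFrame.join` names a column of the calling frame, and that column is matched against the index of the other frame. Here is a small sketch of a join done that way. The CSV content and the `Sales` column are assumptions, with one week left out on purpose:

```python
import io
from datetime import datetime

import pandas as pd

# Hypothetical data: the week of 2010-02-19 is deliberately missing.
csv_text = "Date,Sales\n2010-02-05,100\n2010-02-12,110\n2010-02-26,130\n"
df_train_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Every Friday in the range, as in the question.
fridays = pd.date_range(datetime(2010, 2, 5), datetime(2010, 2, 26), freq="W-FRI")
df_train_fly = pd.DataFrame({"Date": fridays})

# 'Date' is a column on the left and the index on the right.
merged = df_train_csv.join(df_train_fly.set_index("Date"), on="Date", how="right")
print(merged)
```

With `how='right'`, every Friday from df_train_fly is kept, and the week with no matching row gets NaN in `Sales`.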
    

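    Or don't set the 'on' param, and keep 'Date' as the index of both frames (that is, keep index_col='Date' in read_csv):

    merged = df_train_csv.join(df_train_fly.set_index(['Date']), how = 'right', lsuffix='_x')
    

    With no 'on', join uses the index of both dfs to join on.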
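
An index-on-index join can be sketched as follows. The frames are built by hand and the `Sales` values are hypothetical. Both carry a DatetimeIndex named `Date`:

```python
import pandas as pd

# Hypothetical left frame, indexed by date, with one Friday missing.
left = pd.DataFrame(
    {"Sales": [100, 110, 130]},
    index=pd.DatetimeIndex(["2010-02-05", "2010-02-12", "2010-02-26"], name="Date"),
)

# Right frame: no columns, just the full weekly index.
fridays = pd.date_range("2010-02-05", "2010-02-26", freq="W-FRI", name="Date")
right = pd.DataFrame(index=fridays)

# No 'on': the two indexes are aligned against each other.
merged = left.join(right, how="right")
print(merged)
```

The result is indexed by all four Fridays, with NaN in `Sales` for 2010-02-19.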

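    You can also achieve the same result by performing a merge instead, again with 'Date' as the index of both frames. Note that merge takes suffixes=('_x', '_y') rather than lsuffix:

    merged = df_train_csv.merge(df_train_fly.set_index(['Date']), left_index=True, right_index=True, how = 'right', suffixes=('_x', '_y'))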
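
To illustrate how the two calls line up, here is a sketch in which both frames share a column name, so the suffixes actually come into play. The overlapping `Sales` column on the right-hand frame is purely an assumption for the demo:

```python
import pandas as pd

fridays = pd.date_range("2010-02-05", "2010-02-26", freq="W-FRI", name="Date")

# Hypothetical frames that both have a 'Sales' column.
left = pd.DataFrame(
    {"Sales": [100, 110, 130]},
    index=pd.DatetimeIndex(["2010-02-05", "2010-02-12", "2010-02-26"], name="Date"),
)
right = pd.DataFrame({"Sales": [1, 2, 3, 4]}, index=fridays)

# join spells the suffixes as lsuffix / rsuffix ...
via_join = left.join(right, how="right", lsuffix="_x", rsuffix="_y")

# ... while merge takes a single 'suffixes' tuple.
via_merge = left.merge(
    right, left_index=True, right_index=True, how="right", suffixes=("_x", "_y")
)

print(list(via_merge.columns))
print(via_join.equals(via_merge))
```

Passing `lsuffix` to merge raises a TypeError, because merge has no such keyword.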
    
