I am trying to join two DataFrames on the same column "Date". The code is as follows:
import pandas as pd
from datetime import datetime
df_train_csv = pd.read_csv
So let's dissect this:
df_train_csv = pd.read_csv('./train.csv',parse_dates=['Date'],index_col='Date')
OK, the first problem here is that you have specified 'Date' as the index column, which means you no longer have a 'Date' column.
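To see the effect of index_col in isolation, here is a minimal sketch. The in-memory CSV text and the 'Sales' column are made up for illustration; only the 'Date' column name comes from the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for train.csv; 'Sales' and its values are invented.
csv_text = "Date,Sales\n2010-02-05,10\n2010-02-12,20\n"

# With index_col='Date', Date becomes the index and leaves the columns.
as_index = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Without index_col, Date stays an ordinary column.
as_column = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(list(as_index.columns), as_index.index.name)
print(list(as_column.columns), as_column.index.name)
```

In the first frame 'Date' can only be reached through the index, so anything that looks it up as a column will not find it.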
start = datetime(2010, 2, 5)
end = datetime(2012, 10, 26)
df_train_fly = pd.date_range(start, end, freq="W-FRI")
df_train_fly = pd.DataFrame(pd.Series(df_train_fly), columns=['Date'])
merged = df_train_csv.join(df_train_fly.set_index(['Date']), on = ['Date'], how = 'right', lsuffix='_x')
So the above join will not work, as the reported error shows. To fix this:
# remove the index_col param
df_train_csv = pd.read_csv('./train.csv',parse_dates=['Date'])
# keep 'Date' as the index of df_train_fly: join matches the caller's 'on' column against the index of the other frame
merged = df_train_csv.join(df_train_fly.set_index('Date'), on='Date', how='right', lsuffix='_x')
Or make 'Date' the index of both frames and don't set the 'on' param:
merged = df_train_csv.set_index('Date').join(df_train_fly.set_index('Date'), how='right')
The above uses the index of both DataFrames to join on, so both must be indexed by 'Date'.
You can also achieve the same result by performing a merge on the 'Date' column instead (note that merge has no lsuffix param; it takes suffixes=('_x', '_y') if column names overlap):
merged = df_train_csv.merge(df_train_fly, on='Date', how='right')
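As a quick check that an index-based join and a column-based merge give the same rows, here is a small self-contained sketch. The CSV text, the 'Sales' column, and the shortened date range are assumptions for illustration and are not taken from the real train.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for train.csv: 'Sales' and the values are made up.
csv_text = "Date,Sales\n2010-02-05,10\n2010-02-19,30\n"
df_train_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Weekly Friday range, shortened from the question's 2010-2012 span.
df_train_fly = pd.DataFrame(
    {"Date": pd.date_range("2010-02-05", "2010-02-26", freq="W-FRI")}
)

# Index-on-index join: both frames are indexed by Date, so no 'on' is needed.
joined = df_train_csv.set_index("Date").join(
    df_train_fly.set_index("Date"), how="right"
)

# Column-on-column merge: both frames keep Date as a regular column.
merged = df_train_csv.merge(df_train_fly, on="Date", how="right")

# Every Friday in the range is kept; weeks missing from the CSV hold NaN.
print(joined)
print(merged)
```

The only visible difference is where 'Date' ends up: in the index for the join, and as a regular column for the merge.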