I have two dataframes, one with news and the other with stock prices. Both dataframes have a "Date" column. I want to merge them, allowing a gap of up to 5 days between the dates.
Let's say my new
Here is my solution using NumPy:
import numpy as np
import pandas as pd

df_n = pd.DataFrame(
    [
        ('2018-09-29', 'Huge blow to ABC Corp. as they lost the 2012 tax case'),
        ('2018-09-30', 'ABC Corp. suffers a loss'),
        ('2018-10-01', 'ABC Corp to Sell stakes'),
        ('2018-12-20', 'We are going to comeback strong said ABC CEO'),
        ('2018-12-22', 'Shares are down massively for ABC Corp.'),
    ],
    columns=('News_Dates', 'News'),
)
df1_zscore = pd.DataFrame([('2018-10-04', '120'), ('2018-12-24', '131')], columns=('Dates', 'Price'))
df_n["News_Dates"] = pd.to_datetime(df_n["News_Dates"])
df1_zscore["Dates"] = pd.to_datetime(df1_zscore["Dates"])
n_dates = df_n["News_Dates"].values
p_dates = df1_zscore[["Dates"]].values
## subtract every news date from every price date; broadcasting the (n_prices, 1)
## column against the (n_news,) array gives an (n_prices, n_news) matrix of day differences
mat_date_compare = (p_dates - n_dates).astype('timedelta64[D]')
## boolean matrix: True where the news is between 0 and 5 days older than the price date
comparison = (mat_date_compare <= np.timedelta64(5, 'D')) & (mat_date_compare >= np.timedelta64(0, 'D'))
## flat cell numbers (0 to matrix size - 1) of the cells that meet the condition
ind = np.arange(len(n_dates) * len(p_dates))[comparison.ravel()]
## recover the row (price) and column (news) index from each cell number and index both dataframes
pd.concat([df1_zscore.iloc[ind // len(n_dates)].reset_index(drop=True),
           df_n.iloc[ind % len(n_dates)].reset_index(drop=True)], sort=False, axis=1)
Result:
   Dates       Price  News_Dates  News
0  2018-10-04  120    2018-09-29  Huge blow to ABC Corp. as they lost the 2012 t...
1  2018-10-04  120    2018-09-30  ABC Corp. suffers a loss
2  2018-10-04  120    2018-10-01  ABC Corp to Sell stakes
3  2018-12-24  131    2018-12-20  We are going to comeback strong said ABC CEO
4  2018-12-24  131    2018-12-22  Shares are down massively for ABC Corp.
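For comparison, the same 0 to 5 day window can probably be expressed in pandas alone with a cross join followed by a filter. This is only a sketch of an alternative, not the method used above: it assumes pandas 1.2 or newer (for how="cross"), and the names news, prices, pairs, gap, within_window and merged are my own placeholders. The sample data is copied from the post.

```python
import pandas as pd

# Sample data copied from the post; the variable names are placeholders.
news = pd.DataFrame(
    [
        ("2018-09-29", "Huge blow to ABC Corp. as they lost the 2012 tax case"),
        ("2018-09-30", "ABC Corp. suffers a loss"),
        ("2018-10-01", "ABC Corp to Sell stakes"),
        ("2018-12-20", "We are going to comeback strong said ABC CEO"),
        ("2018-12-22", "Shares are down massively for ABC Corp."),
    ],
    columns=["News_Dates", "News"],
)
prices = pd.DataFrame(
    [("2018-10-04", "120"), ("2018-12-24", "131")],
    columns=["Dates", "Price"],
)
news["News_Dates"] = pd.to_datetime(news["News_Dates"])
prices["Dates"] = pd.to_datetime(prices["Dates"])

# Pair every price row with every news row (assumes pandas >= 1.2).
pairs = prices.merge(news, how="cross")

# Keep the pairs where the news is 0 to 5 days older than the price date.
# Series.between is inclusive on both ends by default.
gap = pairs["Dates"] - pairs["News_Dates"]
within_window = gap.between(pd.Timedelta(days=0), pd.Timedelta(days=5))
merged = pairs[within_window].reset_index(drop=True)

print(merged)
```

Like the NumPy matrix, the cross join builds len(prices) * len(news) candidate pairs before filtering, so it is only practical while that product fits in memory.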