问题
I have a dataframe that has dates, assets, and then price/volume data. I'm trying to pull in data from 7 days ago, but the issue is that I can't use shift() because my table has missing dates in it.
date cusip price price_7daysago
1/1/2017 a 1
1/1/2017 b 2
1/2/2017 a 1.2
1/2/2017 b 2.3
1/8/2017 a 1.1 1
1/8/2017 b 2.2 2
I've tried creating a lambda function to try to use loc and timedelta to create this shifting, but I was only able to output empty numpy arrays:
def row_delta(x, df, days, colname):
if datetime.strptime(x['recorddate'], '%Y%m%d') - timedelta(days) in [datetime.strptime(x,'%Y%m%d') for x in df['recorddate'].unique().tolist()]:
return df.loc[(df['recorddate_date'] == df['recorddate_date'] - timedelta(days)) & (df['cusip'] == x['cusip']) ,colname]
else:
return 'nothing'
I also thought of doing something similar to this in order to fill in missing dates, but my issue is that I have multiple indexes, the dates and the cusips so I can't just reindex on this.
I'm not really sure what else I can do, but any help would be greatly appreciated!
回答1:
merge
the DataFrame
with itself while adding 7 days to the date column for the right Frame. Use the suffixes
argument to name the columns appropriately.
import pandas as pd
df['date'] = pd.to_datetime(df.date)
df.merge(df.assign(date = df.date+pd.Timedelta(days=7)),
on=['date', 'cusip'],
how='left', suffixes=['', '_7daysago'])
Output: df
date cusip price price_7daysago
0 2017-01-01 a 1.0 NaN
1 2017-01-01 b 2.0 NaN
2 2017-01-02 a 1.2 NaN
3 2017-01-02 b 2.3 NaN
4 2017-01-08 a 1.1 1.0
5 2017-01-08 b 2.2 2.0
回答2:
you can set date
and cusip
as index and use unstack
and shift
together
shifted = df.set_index(["date", "cusip"]).unstack().shift(7).stack()
then simply merge shifted
with your original df
来源:https://stackoverflow.com/questions/52435070/how-to-lag-data-by-x-specific-days-on-a-multi-index-pandas-dataframe