问题
from dateutil.rrule import rrule, MONTHLY
def fread_year_month(strt_dt, end_dt):
dates = [dt for dt in rrule(MONTHLY, dtstart=strt_dt, until=end_dt)]
return dates
df = pd.DataFrame({
'value' : [4,2,5,6,7,8,6,5,4,1,2,4],
'date': fread_year_month(dt.datetime(2015, 1, 1),dt.datetime(2015, 12, 1)),
'stock': ['amzn']*12
},columns=[
'value', 'date', 'stock'] )
df2 = pd.DataFrame({
'value' : [1,1,1,1,1],
'date': fread_year_month(dt.datetime(2015, 1, 1),dt.datetime(2015, 5, 1)),
'stock': ['msft']*5
},columns=[
'value', 'date', 'stock'] )
df = df.append(df2)
df.set_index(['stock', 'date'], inplace=True)
I have the above pandas dataframe. As you can see the number of available data for amzn is not the same as msft. In this example the dates are sequential but it need not be the case (the dates can be any date).
If the universe of available dates are the dates for which I have data for AMZN then how can I add those exact dates for every other stock in my data frame with a NaN or NA.
In the example give, I want to insert the missing dates for msft in the index and insert NaN or NA for the value for those date indices.
回答1:
If you want to work with your tickers as columns, could do an unstack
, like this:
In [71]: df.unstack(level=0)
Out[71]:
value
stock amzn msft
date
2015-01-01 4.0 1.0
2015-02-01 2.0 1.0
2015-03-01 5.0 1.0
2015-04-01 6.0 1.0
2015-05-01 7.0 1.0
2015-06-01 8.0 NaN
2015-07-01 6.0 NaN
2015-08-01 5.0 NaN
2015-09-01 4.0 NaN
2015-10-01 1.0 NaN
2015-11-01 2.0 NaN
2015-12-01 4.0 NaN
To reindex into the same shape, the from_product
below creates a new MultiIndex
with all the combinations of dates / tickers.
In [75]: df.reindex(pd.MultiIndex.from_product(df.index.levels))
Out[75]:
value
amzn 2015-01-01 4.0
2015-02-01 2.0
2015-03-01 5.0
2015-04-01 6.0
2015-05-01 7.0
2015-06-01 8.0
2015-07-01 6.0
2015-08-01 5.0
2015-09-01 4.0
2015-10-01 1.0
2015-11-01 2.0
2015-12-01 4.0
msft 2015-01-01 1.0
2015-02-01 1.0
2015-03-01 1.0
2015-04-01 1.0
2015-05-01 1.0
2015-06-01 NaN
2015-07-01 NaN
2015-08-01 NaN
2015-09-01 NaN
2015-10-01 NaN
2015-11-01 NaN
2015-12-01 NaN
回答2:
Before your merge, you can do this (as of Pandas 0.17):
>>> df2.set_index('date').reindex(df.date)
value stock
date
2015-01-01 1 msft
2015-02-01 1 msft
2015-03-01 1 msft
2015-04-01 1 msft
2015-05-01 1 msft
2015-06-01 NaN NaN
2015-07-01 NaN NaN
2015-08-01 NaN NaN
2015-09-01 NaN NaN
2015-10-01 NaN NaN
2015-11-01 NaN NaN
2015-12-01 NaN NaN
来源:https://stackoverflow.com/questions/36204249/pandas-re-indexing-with-missing-dates