pandas re-indexing with missing dates

跟風遠走 submitted on 2019-12-12 03:37:18

Question


import datetime as dt

import pandas as pd
from dateutil.rrule import rrule, MONTHLY

def fread_year_month(strt_dt, end_dt):
    # One datetime per month from strt_dt through end_dt, inclusive.
    return list(rrule(MONTHLY, dtstart=strt_dt, until=end_dt))

df = pd.DataFrame({
    'value': [4, 2, 5, 6, 7, 8, 6, 5, 4, 1, 2, 4],
    'date': fread_year_month(dt.datetime(2015, 1, 1), dt.datetime(2015, 12, 1)),
    'stock': ['amzn'] * 12,
}, columns=['value', 'date', 'stock'])

df2 = pd.DataFrame({
    'value': [1, 1, 1, 1, 1],
    'date': fread_year_month(dt.datetime(2015, 1, 1), dt.datetime(2015, 5, 1)),
    'stock': ['msft'] * 5,
}, columns=['value', 'date', 'stock'])

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
df = pd.concat([df, df2])

df.set_index(['stock', 'date'], inplace=True)

I have the pandas DataFrame above. As you can see, amzn has more data points than msft. In this example the dates are sequential, but that need not be the case (the dates can be any dates).

If the universe of available dates is the set of dates for which I have AMZN data, how can I add exactly those dates for every other stock in my DataFrame, with NaN or NA as the value?

In the example given, I want to insert the missing dates for msft into the index and set the value to NaN or NA for those date indices.


Answer 1:


If you want to work with your tickers as columns, you could unstack the stock level, like this:

In [71]: df.unstack(level=0)
Out[71]: 
           value     
stock       amzn msft
date                 
2015-01-01   4.0  1.0
2015-02-01   2.0  1.0
2015-03-01   5.0  1.0
2015-04-01   6.0  1.0
2015-05-01   7.0  1.0
2015-06-01   8.0  NaN
2015-07-01   6.0  NaN
2015-08-01   5.0  NaN
2015-09-01   4.0  NaN
2015-10-01   1.0  NaN
2015-11-01   2.0  NaN
2015-12-01   4.0  NaN

To reindex while keeping the same long shape, the from_product call below builds a new MultiIndex containing every combination of ticker and date found in the index levels.

In [75]: df.reindex(pd.MultiIndex.from_product(df.index.levels))
Out[75]: 
                 value
amzn 2015-01-01    4.0
     2015-02-01    2.0
     2015-03-01    5.0
     2015-04-01    6.0
     2015-05-01    7.0
     2015-06-01    8.0
     2015-07-01    6.0
     2015-08-01    5.0
     2015-09-01    4.0
     2015-10-01    1.0
     2015-11-01    2.0
     2015-12-01    4.0
msft 2015-01-01    1.0
     2015-02-01    1.0
     2015-03-01    1.0
     2015-04-01    1.0
     2015-05-01    1.0
     2015-06-01    NaN
     2015-07-01    NaN
     2015-08-01    NaN
     2015-09-01    NaN
     2015-10-01    NaN
     2015-11-01    NaN
     2015-12-01    NaN
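
Note that from_product over df.index.levels uses every date that appears in the index for any stock. In the question's data that happens to coincide with AMZN's dates, but if another stock had a date AMZN lacks, it would be included too. The following is a minimal sketch, assuming the date universe should be exactly AMZN's dates; the data is a cut-down stand-in for the question's frame, and the names amzn_dates, stocks and full_index are illustrative only.

```python
import pandas as pd

# Cut-down stand-in for the question's data: msft has fewer dates than amzn.
dates = pd.date_range("2015-01-01", periods=4, freq="MS")
df = pd.concat([
    pd.DataFrame({"value": [4, 2, 5, 6], "date": dates, "stock": "amzn"}),
    pd.DataFrame({"value": [1, 1], "date": dates[:2], "stock": "msft"}),
]).set_index(["stock", "date"])

# Take the date universe from amzn only, not from the union over all stocks.
amzn_dates = df.xs("amzn", level="stock").index
stocks = df.index.get_level_values("stock").unique()

# Every (stock, amzn date) pair; pairs missing from df come back as NaN.
full_index = pd.MultiIndex.from_product(
    [stocks, amzn_dates], names=["stock", "date"]
)
result = df.reindex(full_index)
print(result)
```

Because the index is built from amzn_dates explicitly, any date that only another stock has would be dropped rather than added, which matches the requirement as stated in the question.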



Answer 2:


Before appending df2 to df (while date is still an ordinary column in both frames), you can do this (as of pandas 0.17):

>>> df2.set_index('date').reindex(df.date)
            value stock
date                   
2015-01-01      1  msft
2015-02-01      1  msft
2015-03-01      1  msft
2015-04-01      1  msft
2015-05-01      1  msft
2015-06-01    NaN   NaN
2015-07-01    NaN   NaN
2015-08-01    NaN   NaN
2015-09-01    NaN   NaN
2015-10-01    NaN   NaN
2015-11-01    NaN   NaN
2015-12-01    NaN   NaN
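
In the output above the stock column is also NaN on the newly added rows, so the ticker label would be lost once the frames are combined. A possible follow-up, sketched here under the assumption that each frame holds a single ticker, is to restore the label after reindexing and then concatenate. The frames amzn and msft are small stand-ins for df and df2, and msft_full and combined are illustrative names.

```python
import pandas as pd

# Small stand-ins for df (amzn) and df2 (msft) before they are combined.
dates = pd.date_range("2015-01-01", periods=4, freq="MS")
amzn = pd.DataFrame({"value": [4, 2, 5, 6], "date": dates, "stock": "amzn"})
msft = pd.DataFrame({"value": [1, 1], "date": dates[:2], "stock": "msft"})

# Align msft to amzn's dates; rows for dates msft lacks are all NaN.
msft_full = msft.set_index("date").reindex(amzn["date"])

# The reindex blanked the ticker on the new rows, so put it back.
msft_full["stock"] = "msft"

# Stack the two date-indexed frames into one long frame.
combined = pd.concat([amzn.set_index("date"), msft_full])
print(combined)
```

Only the value column stays NaN on the filled-in rows, which is what the question asks for.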


Source: https://stackoverflow.com/questions/36204249/pandas-re-indexing-with-missing-dates
