Conversion of Daily pandas dataframe to minute frequency does not work for 2 row dataframe

Posted by 余生长醉 on 2019-12-11 06:14:19

Question


I am trying to convert a daily-frequency dataframe to minute data. A previous post (Conversion of Daily pandas dataframe to minute frequency) suggested the ffill approach shown below, but it does not seem to work for dataframes that consist of only 2 rows.

This is the dataframe I want to convert:

import pandas as pd

# One record per (ticker, date) pair; named `records` to avoid shadowing the built-in `dict`
records = [
    {'ticker': 'jpm', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'ge', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'fb', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'aapl', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'msft', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'amzn', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'jpm', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'ge', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'fb', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'aapl', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'msft', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'amzn', 'date': '2016-11-29', 'returns': 0.2},
]
df = pd.DataFrame(records)
df['date'] = pd.to_datetime(df['date'])
# Index by (date, ticker) so that unstack() pivots tickers into columns
df = df.set_index(['date', 'ticker'], drop=True)

This works on the entire dataframe:

df_min = df.unstack().asfreq('Min', method='ffill').between_time('8:30','16:00').stack()

But when I work with a smaller dataframe it returns an empty dataframe for some reason:

df2=df.iloc[0:2,:]

df2_min = df2.unstack().asfreq('Min', method='ffill').between_time('8:30','16:00').stack()

Does anyone have an explanation for this odd behaviour?

Edit: I noticed the code only works if the dataframe has at least 7 rows.


Answer 1:


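If the input DataFrame has only 2 rows (both for the same date), unstack reshapes it into a one-row DataFrame. pandas cannot build a continuous minute range from a DatetimeIndex that holds a single value, so asfreq returns just that one midnight row, and between_time then filters it out.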
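The row counts at each step make this easy to check. The following is a minimal sketch, assuming a two-row frame built directly to mirror df2 from the question; the names wide, per_minute and in_hours are illustrative only, and the lowercase 'min' alias is used in place of 'Min'.

```python
import pandas as pd

# Two rows, same date, two tickers (mirrors df2 from the question).
df2 = pd.DataFrame(
    {"returns": [0.2, 0.2]},
    index=pd.MultiIndex.from_tuples(
        [(pd.Timestamp("2016-11-28"), "jpm"), (pd.Timestamp("2016-11-28"), "ge")],
        names=["date", "ticker"],
    ),
)

wide = df2.unstack()                              # one row, stamped 2016-11-28 00:00
per_minute = wide.asfreq("min", method="ffill")   # start == end, so still one row
in_hours = per_minute.between_time("8:30", "16:00")  # midnight is outside the window

print(len(wide), len(per_minute), len(in_hours))
```

The last count is zero, which is why the subsequent stack() call yields an empty result.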

A possible solution is to add a helper row for the next day after reshaping, fill it with the data from the last row, apply the original solution, and finally remove the helper row by position with iloc:

df2=df.iloc[0:2]
print (df2)
                   returns
date       ticker         
2016-11-28 jpm         0.2
           ge          0.2

df3 = df2.unstack()
print (df3)
ticker         jpm   ge
date                   
2016-11-28     0.2  0.2

df3.loc[df3.index.max() + pd.Timedelta(1, unit='d')] = df3.iloc[-1]
print (df3)
           returns     
ticker         jpm   ge
date                   
2016-11-28     0.2  0.2
2016-11-29     0.2  0.2 <- helper row

df_min = df3.asfreq('Min', method='ffill')
print (df_min.tail())
                    returns     
ticker                  jpm   ge
date                            
2016-11-28 23:56:00     0.2  0.2
2016-11-28 23:57:00     0.2  0.2
2016-11-28 23:58:00     0.2  0.2
2016-11-28 23:59:00     0.2  0.2
2016-11-29 00:00:00     0.2  0.2 <- helper row

df_min = df_min.iloc[:-1].between_time('8:30','16:00').stack()
#print (df_min)
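As an alternative to the helper row, the minute index can be built explicitly and the daily rows forward-filled onto it. This is only a sketch of a different technique, not the method from the answer above: daily_to_minute is a hypothetical helper name, and it assumes the reshaped frame has a sorted DatetimeIndex with one row per day at midnight.

```python
import pandas as pd

def daily_to_minute(wide, start="8:30", end="16:00"):
    """Hypothetical helper: forward-fill each daily row across its whole day."""
    first = wide.index.min().normalize()
    # Stop at 23:59 of the last day, so no helper row has to be added or removed.
    last = (wide.index.max().normalize()
            + pd.Timedelta(days=1) - pd.Timedelta(minutes=1))
    minutes = pd.date_range(first, last, freq="min", name=wide.index.name)
    # reindex with method="ffill" requires the original index to be sorted.
    return wide.reindex(minutes, method="ffill").between_time(start, end)

# Usage with a one-row frame, as produced by unstacking a single date.
wide = pd.DataFrame(
    {"jpm": [0.2], "ge": [0.2]},
    index=pd.DatetimeIndex(["2016-11-28"], name="date"),
)
out = daily_to_minute(wide)
print(out.shape)
```

Unlike asfreq, which stops at the last timestamp present in the index, this version also fills the final day, so on multi-day input it returns more rows than the original one-liner.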


Source: https://stackoverflow.com/questions/56895049/conversion-of-daily-pandas-dataframe-to-minute-frequency-does-not-work-for-2-row
