Question
I am trying to convert a daily-frequency dataframe to minute data. A previous post (Conversion of Daily pandas dataframe to minute frequency) suggested the ffill method below, but it does not seem to work with dataframes that consist of only 2 rows.
This is the dataframe I want to convert:
import pandas as pd
# Named `records` so the built-in `dict` is not shadowed.
records = [
    {'ticker': 'jpm', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'ge', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'fb', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'aapl', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'msft', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'amzn', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'jpm', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'ge', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'fb', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'aapl', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'msft', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'amzn', 'date': '2016-11-29', 'returns': 0.2},
]
df = pd.DataFrame(records)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['date', 'ticker'], drop=True)
This works on the entire dataframe:
df_min = df.unstack().asfreq('Min', method='ffill').between_time('8:30','16:00').stack()
But when I work with a smaller dataframe it returns an empty dataframe for some reason:
df2=df.iloc[0:2,:]
df2_min = df2.unstack().asfreq('Min', method='ffill').between_time('8:30','16:00').stack()
Does anyone have an explanation for this odd behaviour?
Edit: I noticed the code only works if the dataframe has at least 7 rows.
Answer 1:
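If the input DataFrame has only 2 rows, then after reshaping with unstack you get a one-row DataFrame, because both rows share the date 2016-11-28. pandas cannot create a continuous minute DataFrame from it, because the DatetimeIndex has only one value: asfreq fills only between the first and last index values, so the result is a single row at 00:00, which between_time then filters out. This is also why the code needs at least 7 rows: the first 6 rows all have the same date, and the 7th is the first row with a second date.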
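To see the effect step by step, here is a minimal sketch that rebuilds just the two rows from the question and counts the rows after each stage. The variable names (wide, per_minute, in_session) are only illustrative, and it uses the lowercase 'min' alias in place of 'Min', assuming both mean one-minute frequency in your pandas version.

```python
import pandas as pd

# Two rows sharing one date, like df.iloc[0:2] in the question.
df2 = pd.DataFrame(
    {
        "date": pd.to_datetime(["2016-11-28", "2016-11-28"]),
        "ticker": ["jpm", "ge"],
        "returns": [0.2, 0.2],
    }
).set_index(["date", "ticker"])

wide = df2.unstack()                             # one row per date
per_minute = wide.asfreq("min", method="ffill")  # spans first..last date only
in_session = per_minute.between_time("8:30", "16:00")

# Row counts after each step: unstack, asfreq, between_time.
print(len(wide), len(per_minute), len(in_session))
```

The only timestamp left after asfreq is midnight of 2016-11-28, which lies outside 8:30 to 16:00, so nothing remains for stack.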
A possible solution is to add the next day as a helper row after reshaping, fill it with the data of the last row, apply the original solution, and as a last step remove the helper row by position with iloc:
df2=df.iloc[0:2]
print (df2)
                   returns
date       ticker
2016-11-28 jpm         0.2
           ge          0.2
df3 = df2.unstack()
print (df3)
ticker      jpm   ge
date
2016-11-28  0.2  0.2
df3.loc[df3.index.max() + pd.Timedelta(1, unit='d')] = df3.iloc[-1]
print (df3)
           returns
ticker         jpm   ge
date
2016-11-28     0.2  0.2
2016-11-29     0.2  0.2  <- helper row
df_min = df3.asfreq('Min', method='ffill')
print (df_min.tail())
                    returns
ticker                  jpm   ge
date
2016-11-28 23:56:00     0.2  0.2
2016-11-28 23:57:00     0.2  0.2
2016-11-28 23:58:00     0.2  0.2
2016-11-28 23:59:00     0.2  0.2
2016-11-29 00:00:00     0.2  0.2  <- helper row
df_min = df_min.iloc[:-1].between_time('8:30','16:00').stack()
#print (df_min)
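If you need this in more than one place, the steps above could be wrapped in a small function. This is only a sketch: daily_to_minute is a hypothetical name (not a pandas function), it assumes a (date, ticker) MultiIndex like the one in the question, and it uses pd.Timedelta(days=1) and the lowercase 'min' alias, which I assume are equivalent to the spellings used above.

```python
import pandas as pd


def daily_to_minute(daily, start="8:30", end="16:00"):
    """Forward-fill a (date, ticker)-indexed daily frame to minute rows.

    Hypothetical helper that wraps the helper-row workaround.
    """
    wide = daily.unstack()
    # Helper row one day after the last date, copied from the last row,
    # so that asfreq has a range to fill even for a single date.
    wide.loc[wide.index.max() + pd.Timedelta(days=1)] = wide.iloc[-1]
    per_minute = wide.asfreq("min", method="ffill")
    # Drop the helper row (the final midnight), keep the session minutes.
    return per_minute.iloc[:-1].between_time(start, end).stack()


# Usage with the two-row case from the question.
df2 = pd.DataFrame(
    {
        "date": pd.to_datetime(["2016-11-28", "2016-11-28"]),
        "ticker": ["jpm", "ge"],
        "returns": [0.2, 0.2],
    }
).set_index(["date", "ticker"])

out = daily_to_minute(df2)
print(out.shape)
```

Note that because the helper row is always added, the last date in the input is expanded to minutes as well, so check that this is what you want when passing in several days.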
Source: https://stackoverflow.com/questions/56895049/conversion-of-daily-pandas-dataframe-to-minute-frequency-does-not-work-for-2-row