Question
I have a pandas df 'instr_bar' with tick data as follows:
time
2016-07-29 16:07:24 5.72
2016-07-29 16:07:24 5.72
2016-07-29 16:07:24 5.72
2016-07-29 16:07:58 5.72
2016-07-29 16:07:58 5.72
2016-07-29 16:09:49 5.70
2016-07-29 16:09:50 5.73
2016-07-29 16:11:14 5.73
2016-07-29 16:11:14 5.73
2016-07-29 16:14:53 5.77
2016-07-29 16:14:53 5.77
2016-07-29 16:17:27 5.75
2016-07-29 16:17:43 5.76
2016-07-29 16:17:43 5.76
I want to turn this into 5 minute OHLC. The index is not unique in many instances.
I then use the following code: instr_bar = instr_bar.resample('5Min').ohlc()
I then get the following df:
open high low close
time
2016-07-29 15:40:00 5.74 5.74 5.74 5.74
2016-07-29 15:45:00 NaN NaN NaN NaN
2016-07-29 15:50:00 5.75 5.75 5.75 5.75
2016-07-29 15:55:00 5.75 5.75 5.72 5.72
2016-07-29 16:00:00 5.72 5.72 5.72 5.72
2016-07-29 16:05:00 5.72 5.73 5.70 5.73
2016-07-29 16:10:00 5.73 5.77 5.73 5.77
2016-07-29 16:15:00 5.75 5.76 5.72 5.72
2016-07-29 16:20:00 NaN NaN NaN NaN
2016-07-29 16:25:00 5.72 5.72 5.72 5.72
Q1: How do I fill the NaNs with the last observed values?
Q2: I now also get NaNs outside the trading/opening hours (09:00 - 16:30). How do I get rid of them?
Answer 1:
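Use ffill() to carry the last observed bar forward into the empty intervals, which is what Q1 describes:
instr_bar = instr_bar.resample('5T').ohlc().ffill()
or bfill() to fill each gap from the next observed bar instead:
instr_bar = instr_bar.resample('5T').ohlc().bfill()
Which one is right depends on what you want to achieve. Note that '5T' is an older alias for five minutes; recent pandas versions deprecate it in favour of '5min'.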
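To make the difference between the two fills concrete, here is a minimal sketch on made-up data. The tick values and the variable names (ticks, bars, forward, backward) are assumptions for illustration, and the ticks are assumed to be stored as a Series, which matches the flat open/high/low/close columns shown in the question.

```python
import pandas as pd

# Hypothetical tick prices; duplicate timestamps are fine for resample().
ticks = pd.Series(
    [5.72, 5.72, 5.70, 5.73, 5.77, 5.75],
    index=pd.to_datetime([
        "2016-07-29 16:02:24",
        "2016-07-29 16:02:24",
        "2016-07-29 16:03:49",
        "2016-07-29 16:04:50",
        "2016-07-29 16:11:53",
        "2016-07-29 16:12:27",
    ]),
    name="price",
)

# The 16:05 bin contains no ticks, so its row is all NaN.
bars = ticks.resample("5min").ohlc()

# ffill: the empty 16:05 row copies the previous (16:00) bar.
forward = bars.ffill()

# bfill: the empty 16:05 row copies the next (16:10) bar.
backward = bars.bfill()

print(bars)
print(forward)
print(backward)
```

Forward filling is usually the safer choice for price data, since back filling puts information from later ticks into an earlier bar.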
To keep only the rows inside trading hours (Q2), filter by time of day with the between_time() method:
instr_bar.between_time('09:00', '16:30')
All together:
instr_bar = instr_bar.resample('5T').ohlc().ffill().between_time('09:00', '16:30')
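The combined one-liner can be tried end to end with a small self-contained sketch. The sample ticks below are invented, the result is stored under a new name (ohlc) rather than overwriting instr_bar, and '5min' is used in place of '5T' so the snippet also runs on newer pandas versions.

```python
import pandas as pd

# Hypothetical ticks around the 16:30 close, stored as a Series.
idx = pd.to_datetime([
    "2016-07-29 16:07:24",
    "2016-07-29 16:09:49",
    "2016-07-29 16:17:43",
    "2016-07-29 16:28:10",
    "2016-07-29 16:41:05",  # after trading hours
])
instr_bar = pd.Series([5.72, 5.70, 5.76, 5.74, 5.71], index=idx, name="price")

ohlc = (
    instr_bar.resample("5min").ohlc()  # 5-minute bars; empty bins are NaN
    .ffill()                           # fill empty bins from the last bar
    .between_time("09:00", "16:30")    # keep bars labelled 09:00 to 16:30
)

print(ohlc)
```

Both end points of between_time() are included by default, so the bar labelled 16:30 (which covers 16:30 to 16:35) is kept. Filling before filtering also means that, on multi-day data, the first empty bar of a day would inherit the last bar of the previous session.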
Source: https://stackoverflow.com/questions/38672880/how-to-resample-pandas-df-tick-data-to-5-min-ohlc-data