I have a pandas DataFrame with one year of minute-by-minute forex data (371635 rows):

                     O   H   L   C
datetime
...
First, you should avoid combining Python datetime objects with Pandas operations. There are many Pandas / NumPy friendly methods to create datetime objects for comparison, e.g. pd.Timestamp and pd.to_datetime. Your performance issues here are partly due to the behaviour described in the docs: pd.Series.dt.date returns an array of Python datetime.date objects.

Using object dtype in this way removes the benefits of vectorisation, as operations then require Python-level loops.
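To see the difference, compare the dtype produced by .dt.date with a vectorised alternative such as normalize() or floor('d'). This is just a minimal sketch; the index name and minute frequency below are assumptions for illustration:

import pandas as pd

# Hypothetical minute-level DatetimeIndex, purely for illustration.
idx = pd.date_range('2017-01-01', periods=5, freq='min', name='datetime')

print(pd.Series(idx).dt.date.dtype)  # object -> Python datetime.date, loops at Python level
print(idx.normalize().dtype)         # datetime64[ns] -> stays vectorised
print(idx.floor('d').dtype)          # datetime64[ns] -> stays vectorised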
groupby operations for aggregating by date

Pandas already has functionality to group by date via normalizing time:
for day, df_day in df.groupby(df.index.floor('d')):
    df_day_t = df_day.between_time('08:30', '09:30')
    # do something
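For instance, if the goal is a per-day aggregate over the 08:30-09:30 window, a minimal sketch might look like this (it assumes df has a DatetimeIndex and a close column named C, as in the preview above; the choice of mean is purely illustrative):

import pandas as pd

# Mean close price in the 08:30-09:30 window for each day.
results = {}
for day, df_day in df.groupby(df.index.floor('d')):
    window = df_day.between_time('08:30', '09:30')
    if not window.empty:                  # skip days with no rows in the window
        results[day] = window['C'].mean()

daily_window_mean = pd.Series(results).sort_index()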
As another example, you can access a slice for a particular day in this way:
g = df.groupby(df.index.floor('d'))
my_day = pd.Timestamp('2017-01-01')
df_slice = g.get_group(my_day)
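The same time-of-day filter can then be applied to that daily slice; a small follow-on sketch (the 08:30-09:30 window is carried over from the example above):

# Restrict the selected day to the 08:30-09:30 window.
df_slice_window = df_slice.between_time('08:30', '09:30')

Note that get_group raises a KeyError if the requested day is not present in the data.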