I'm new to both StackOverflow and pandas. I am trying to read in a large CSV file with stock market bin data in the following format:
date,time,open,high,low,cl
There are quite a few ways to do this. One way to do it during read_csv would be to use the parse_dates and date_parser arguments, telling parse_dates to combine the date and time columns and defining an inline function to parse the dates:
>>> df = pd.read_csv("bindat.csv", parse_dates=[["date", "time"]],
...                  date_parser=lambda x: pd.to_datetime(x, format="%Y%m%d %H%M"),
...                  index_col="date_time")
>>> df
                        open     high      low    close    volume  splits  earnings  dividends  sym
date_time
2013-06-25 07:15:00  49.2634  49.2634  49.2634  49.2634   156.293       1         0          0  JPM
2013-06-25 07:30:00  49.2730  49.2730  49.2730  49.2730   208.390       1         0          0  JPM
2013-06-25 07:40:00  49.1866  49.1866  49.1866  49.1866   224.019       1         0          0  JPM
2013-06-25 07:45:00  49.3210  49.3210  49.3210  49.3210   208.390       1         0          0  JPM
2013-06-25 07:50:00  49.3306  49.3690  49.3306  49.3690  4583.540       1         0          0  JPM
2013-06-25 07:55:00  49.3690  49.3690  49.3690  49.3690   416.780       1         0          0  JPM
2013-06-25 08:00:00  49.3690  49.3690  49.3594  49.3594  1715.050       1         0          0  JPM
2013-06-25 08:05:00  49.3690  49.3690  49.3306  49.3306  1333.700       1         0          0  JPM
2013-06-25 08:10:00  49.3306  49.3786  49.3306  49.3786  1567.090       1         0          0  JPM
2013-06-25 16:10:00  49.3306  49.3786  49.3306  49.3786  1567.090       1         0          0  JPM
where I added an extra row at the end to make sure that hours were behaving.
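
As far as I know, newer pandas releases (2.x) deprecate both date_parser and the nested-list form of parse_dates, so here is a rough sketch of another of those "quite a few ways": read the two columns as strings and combine them after loading. The inline sample data is made up for illustration, and it assumes the date column looks like 20130625 and the time column like 0715, which is what the format string above implies. Adjust the names and format to your actual file.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the assumed layout (not the real bindat.csv).
csv_text = """date,time,open,high,low,close
20130625,0715,49.2634,49.2634,49.2634,49.2634
20130625,1610,49.3306,49.3786,49.3306,49.3786
"""

# Read date and time as strings so leading zeros such as "0715" are preserved.
df = pd.read_csv(io.StringIO(csv_text), dtype={"date": str, "time": str})

# Join the two columns with a space and parse them with an explicit format.
df["date_time"] = pd.to_datetime(df["date"] + " " + df["time"],
                                 format="%Y%m%d %H%M")

# Drop the raw columns and use the combined timestamp as the index.
df = df.drop(columns=["date", "time"]).set_index("date_time")
```

Reading the columns as strings matters here: if pandas infers them as integers, a time like 0715 becomes 715 and no longer matches %H%M.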