Faster CSV loading with a datetime index in pandas

孤城傲影 2021-01-17 03:26

I am often iterating over financial price data stored in a CSV file. I like the convenience of using pandas datetime objects to subset and organize data when all of my analysis

1 Answer
  • 2021-01-17 04:06

    After testing a few options for loading and parsing a CSV file with 13,811,418 rows and 98 unique date values, we arrived at the snippet below. We found that passing the format parameter with a predefined date format ('%m/%d/%Y' in our case) brought the parsing time down to 2.52 s with Pandas 0.15.3.

    import pandas as pd

    def to_date(dates, lookup=False, **args):
        # With lookup=True, parse each unique value once and map the results back.
        if lookup:
            return dates.map({v: pd.to_datetime(v, **args) for v in dates.unique()})
        return pd.to_datetime(dates, **args)
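
As a rough illustration of how the helper might be used, the sketch below loads a small in-memory CSV, converts the date column with an explicit format and the unique-value lookup, and sets it as the index. The sample data and the column names `date` and `price` are assumptions made up for this example; they are not from the original post.

```python
import io
import pandas as pd

def to_date(dates, lookup=False, **args):
    # Same helper as above, repeated so this sketch runs on its own.
    if lookup:
        return dates.map({v: pd.to_datetime(v, **args) for v in dates.unique()})
    return pd.to_datetime(dates, **args)

# Hypothetical sample data standing in for a real price file.
csv_text = "date,price\n01/15/2021,10.5\n01/15/2021,10.7\n02/01/2021,11.0\n"

# Read the date column as plain strings so read_csv does not try to parse it.
df = pd.read_csv(io.StringIO(csv_text), dtype={"date": str})

# Parse with a fixed format; lookup=True parses each distinct date only once.
df["date"] = to_date(df["date"], lookup=True, format="%m/%d/%Y")
df = df.set_index("date")

# Partial-string indexing on the DatetimeIndex selects all January rows.
january = df.loc["2021-01"]
```

The lookup path is most likely to pay off when the column has few distinct dates relative to its length, as with the 98 unique values reported above.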
    
    • Also pass coerce=True (or errors='coerce' / errors='raise' in later versions) to enable date-format validation. Otherwise, invalid values are retained as strings and will cause an error when any other datetime operation is performed on the DataFrame column.
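
A minimal sketch of the validation behaviour described above, assuming a pandas version where `pd.to_datetime` accepts the `errors` keyword (0.17 or later). The sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical column containing one malformed entry.
raw = pd.Series(["01/15/2021", "not a date", "01/18/2021"])

# errors="coerce" turns values that do not match the format into NaT.
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# The NaT positions identify which original strings failed to parse.
bad_rows = raw[parsed.isna()]

# errors="raise" stops at the first invalid value instead.
try:
    pd.to_datetime(raw, format="%m/%d/%Y", errors="raise")
    raised = False
except ValueError:
    raised = True
```

Coercing keeps the column as a proper datetime dtype, so later datetime operations work and the bad rows can be inspected separately.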