Reading a csv with a timestamp column, with pandas

花落未央 2020-12-05 09:53

When doing:

import pandas
x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime',
                    names=['DateTime', '


        
3 Answers
  • 2020-12-05 10:42

    Read the column as plain numbers, then call to_datetime with unit='s' so the values are interpreted as Unix timestamps (seconds since the epoch). This is much faster than a custom parser:

    In [7]:
    pd.to_datetime(df.index, unit='s')
    
    Out[7]:
    DatetimeIndex(['2015-12-02 11:02:16.830000', '2015-12-02 11:02:17.430000',
                   '2015-12-02 11:02:18.040000', '2015-12-02 11:02:18.650000',
                   '2015-12-02 11:02:19.250000'],
                  dtype='datetime64[ns]', name=0, freq=None)
    

    Timings:

    In [9]:
    
    %%timeit
    import time
    def date_parser(string_list):
        return [time.ctime(float(x)) for x in string_list]
    
    df = pd.read_csv(io.StringIO(t), parse_dates=[0],  sep=';', 
                     date_parser=date_parser, 
                     index_col='DateTime', 
                     names=['DateTime', 'X'], header=None)
    100 loops, best of 3: 4.07 ms per loop
    

    and

    In [12]:
    %%timeit
    t="""1449054136.83;15.31
    1449054137.43;16.19
    1449054138.04;19.22
    1449054138.65;15.12
    1449054139.25;13.12"""
    df = pd.read_csv(io.StringIO(t), header=None, sep=';', index_col=[0])
    df.index = pd.to_datetime(df.index, unit='s')
    100 loops, best of 3: 1.69 ms per loop
    

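    So to_datetime is more than 2x faster on this small dataset (1.69 ms vs. 4.07 ms per loop). I expect it to scale much better than the other methods, because the conversion is a single vectorized call rather than a Python-level loop over every value.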
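
For readers who want to try this outside an IPython session, the approach above can be sketched as a standalone script. This is a minimal sketch, not the answerer's exact code: the sample rows are copied from the timing example, while the column names `DateTime` and `X` are borrowed from the question and the variable name `raw` is an assumption.

```python
import io

import pandas as pd

# Sample rows copied from the timing example: "<unix seconds>;<value>"
raw = """1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""

# Read the first column as plain floats and use it as the index.
df = pd.read_csv(io.StringIO(raw), sep=";", header=None,
                 names=["DateTime", "X"], index_col="DateTime")

# Convert the whole index in one vectorized call.
# The resulting timestamps are timezone-naive and expressed in UTC.
df.index = pd.to_datetime(df.index, unit="s")

print(df)
```

Because the input values are floats, the sub-second part may show small rounding differences depending on the pandas version.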

  • 2020-12-05 10:51

    You can parse the date yourself:

    import time
    import pandas as pd
    
    def date_parser(string_list):
        return [time.ctime(float(x)) for x in string_list]
    
    df = pd.read_csv('data.csv', parse_dates=[0],  sep=';', 
                     date_parser=date_parser, 
                     index_col='DateTime', 
                     names=['DateTime', 'X'], header=None)
    

    The result:

    >>> df
                            X
    DateTime                  
    2015-12-02 12:02:16  15.31
    2015-12-02 12:02:17  16.19
    2015-12-02 12:02:18  19.22
    2015-12-02 12:02:18  15.12
    2015-12-02 12:02:19  13.12
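
Note that this result shows 12:02:16 where the to_datetime answer shows 11:02:16. That is likely because `time.ctime` formats the timestamp in the machine's local time zone, whereas `unit='s'` yields UTC. The difference can be illustrated with the standard library alone; the variable names here are illustrative only.

```python
import time
from datetime import datetime, timezone

ts = 1449054136.83  # first timestamp from the sample data

# time.ctime returns a string in the machine's local time zone,
# so its value depends on where the script runs.
local_string = time.ctime(ts)

# Passing an explicit tz gives a result that is the same everywhere.
utc_dt = datetime.fromtimestamp(ts, tz=timezone.utc)

print("local:", local_string)
print("UTC:  ", utc_dt.isoformat())
```

If the parsed index is later compared against fixed cutoffs, it helps to decide up front whether those cutoffs are in local time or UTC.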
    
  • 2020-12-05 10:52

    My solution was similar to Mike's:

    import pandas
    import datetime
    def dateparse(time_in_secs):
        return datetime.datetime.fromtimestamp(float(time_in_secs))
    
    x = pandas.read_csv('data.csv', delimiter=';', parse_dates=True,
                        date_parser=dateparse, index_col='DateTime',
                        names=['DateTime', 'X'], header=None)
    
    out = x.truncate(before=datetime.datetime(2015,12,2,12,2,18))
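
Newer pandas releases deprecate the `date_parser` argument, so the same filtering can also be sketched by converting the index after reading. This is a hedged variant, not the answerer's code: it uses inline sample data instead of `data.csv`, and the cutoff is written in UTC (11:02:18), on the assumption that the 12:02:18 cutoff above was local time on a UTC+1 machine.

```python
import io

import pandas as pd

# Inline stand-in for data.csv, using the sample rows from this thread.
raw = """1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""

x = pd.read_csv(io.StringIO(raw), sep=";", header=None,
                names=["DateTime", "X"], index_col="DateTime")

# Interpret the float index as Unix seconds (timezone-naive UTC).
x.index = pd.to_datetime(x.index, unit="s")

# Keep only rows at or after the cutoff; truncate needs a sorted index,
# which holds here because the timestamps are increasing.
out = x.truncate(before=pd.Timestamp("2015-12-02 11:02:18"))

print(out)
```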
    