Reading CSV file in Pandas with historical dates

天命终不由人 2021-01-25 10:55

I'm trying to read in a file with dates in the (UK) format 13/01/1800; however, some of the dates are before 1667, which cannot be represented by the nanosecond timestamp (see h

1 answer
  •  说谎 (original poster)
    2021-01-25 11:07

    You can try it this way:

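    import pandas as pd

    fn = r'D:\temp\.data\36987699.csv'

    def dt_parse(s):
        # Split a UK-style DD/MM/YYYY string into its parts
        d, m, y = s.split('/')
        # A daily Period is not limited to the nanosecond Timestamp range
        return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')

    # Note: date_parser is deprecated since pandas 2.0
    df = pd.read_csv(fn, parse_dates=[0], date_parser=dt_parse)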
    

    Input file:

    Date,col1
    13/01/1800,aaa
    25/12/1001,bbb
    01/03/1267,ccc
    

    Test:

    In [16]: df
    Out[16]:
            Date col1
    0 1800-01-13  aaa
    1 1001-12-25  bbb
    2 1267-03-01  ccc
    
    In [17]: df.dtypes
    Out[17]:
    Date    object
    col1    object
    dtype: object
    
    In [18]: df['Date'].dt.year
    Out[18]:
    0    1800
    1    1001
    2    1267
    Name: Date, dtype: int64
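
Because `date_parser` is deprecated as of pandas 2.0, one possible alternative is to read the date column as plain strings and convert it afterwards. The following is only a minimal sketch, not the answer's original method: it assumes the same three-row input shown above, inlined through `io.StringIO` so the snippet runs without the file on disk, and the variable names `csv_text` and `years` are illustrative.

```python
import io

import pandas as pd

# Same sample rows as the input file above, inlined for a self-contained example
csv_text = """Date,col1
13/01/1800,aaa
25/12/1001,bbb
01/03/1267,ccc
"""

def dt_parse(s):
    # Split a UK-style DD/MM/YYYY string and build a daily Period from its parts
    d, m, y = s.split('/')
    return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')

# Read the Date column as strings so pandas does not attempt Timestamp parsing
df = pd.read_csv(io.StringIO(csv_text), dtype={'Date': str})

# Convert each string to a Period after loading
df['Date'] = df['Date'].map(dt_parse)

# Each element is a Period, so calendar fields such as .year are available
years = [p.year for p in df['Date']]
print(years)
```

Converting after the read keeps the parsing logic independent of `read_csv` keyword arguments, which have changed between pandas versions.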
    

    PS: you may want to add a try ... except block to the dt_parse() function to catch the ValueError that int() raises on malformed input.
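
The error handling suggested in the PS could look roughly like the sketch below. The name `dt_parse_safe` is hypothetical, and returning `pd.NaT` for unparseable values is an assumption about the desired behaviour; raising a clearer error or logging the bad value would be equally reasonable choices.

```python
import pandas as pd

def dt_parse_safe(s):
    """Parse 'DD/MM/YYYY' into a daily Period; return pd.NaT on malformed input."""
    try:
        # Unpacking raises ValueError if there are not exactly three parts
        d, m, y = s.split('/')
        # int() raises ValueError for non-numeric parts
        return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')
    except (ValueError, AttributeError):
        # AttributeError covers non-string input such as a float NaN
        # (assumption: missing or bad dates should become NaT)
        return pd.NaT

good = dt_parse_safe('25/12/1001')
bad = dt_parse_safe('not a date')
missing = dt_parse_safe(float('nan'))
```

Here `good` is a `Period` for 25 December 1001, while `bad` and `missing` both fall back to `pd.NaT`.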
