I'm trying to read a file in with dates in the (UK) format 13/01/1800; however, some of the dates are before 1677, which cannot be represented by the nanosecond timestamp (see h
You can try to do it this way:
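import pandas as pd

fn = r'D:\temp\.data\36987699.csv'

def dt_parse(s):
    # split 'DD/MM/YYYY' and build a daily Period, which is not limited to the Timestamp range
    d, m, y = s.split('/')
    return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')

df = pd.read_csv(fn, parse_dates=[0], date_parser=dt_parse)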
Input file:
Date,col1
13/01/1800,aaa
25/12/1001,bbb
01/03/1267,ccc
Test:
In [16]: df
Out[16]:
         Date col1
0  1800-01-13  aaa
1  1001-12-25  bbb
2  1267-03-01  ccc

In [17]: df.dtypes
Out[17]:
Date    object
col1    object
dtype: object

In [18]: df['Date'].dt.year
Out[18]:
0    1800
1    1001
2    1267
Name: Date, dtype: int64
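If your pandas version warns about date_parser (it is deprecated as of pandas 2.0), one possible alternative is to pass the same kind of parser through the converters argument of read_csv. This is only a sketch, not part of the answer above: the inline CSV text stands in for the file on disk, and the column name 'Date' is taken from the sample input.

```python
import io
import pandas as pd

# Inline copy of the sample input, used here instead of the file on disk
csv_text = """Date,col1
13/01/1800,aaa
25/12/1001,bbb
01/03/1267,ccc
"""

def dt_parse(s):
    # 'DD/MM/YYYY' -> daily Period
    d, m, y = s.split('/')
    return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')

# converters applies dt_parse to each raw string in the 'Date' column
df = pd.read_csv(io.StringIO(csv_text), converters={'Date': dt_parse})

# each element of the column is a pd.Period, so .year is available per value
years = [p.year for p in df['Date']]
print(years)
```

With converters the parsing happens per cell while the file is read, so no parse_dates argument is needed for that column.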
PS: you may want to add a try ... except block to the dt_parse() function to catch the ValueError exceptions that int() raises on malformed input
...
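A minimal sketch of what that error handling could look like. The name dt_parse_safe and the choice to return pd.NaT for unparseable values are assumptions made for illustration; you may prefer to re-raise or log instead.

```python
import pandas as pd

def dt_parse_safe(s):
    """Parse 'DD/MM/YYYY' into a daily Period; return pd.NaT if that fails."""
    try:
        d, m, y = s.split('/')  # ValueError if there are not exactly 3 parts
        return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')
    except (ValueError, AttributeError):
        # ValueError: wrong number of parts or non-numeric text passed to int()
        # AttributeError: s is not a string (e.g. a float NaN from an empty cell)
        return pd.NaT

good = dt_parse_safe('25/12/1001')
bad = dt_parse_safe('not a date')
print(good, bad)
```

Returning pd.NaT keeps the row in the frame so the bad values can be inspected afterwards, rather than aborting the whole read on the first malformed date.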