How to read timezone aware datetimes as a timezone naive local DatetimeIndex with read_csv in pandas?

前端 未结 4 1845
小蘑菇
小蘑菇 2021-01-03 01:21

When I use pandas read_csv to read a column with a timezone aware datetime (and specify this column to be the index), pandas converts it to a timezone naive utc

相关标签:
4条回答
  • 2021-01-03 01:46

    The answer of Alex leads to a timezone aware DatetimeIndex. To get a timezone naive local DatetimeIndex, as asked by the OP, inform dateutil.parser.parser to ignore the timezone information by setting ignoretz=True:

    import dateutil
    
    date_parser = lambda x: dateutil.parser.parse(x, ignoretz=True)
    df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=date_parser)
    
    print(df)
    

    outputs

                         Temperature
    DateTime                        
    2016-07-01 11:05:07       21.125
    2016-07-01 11:05:09       21.138
    2016-07-01 11:05:10       21.156
    2016-07-01 11:05:11       21.179
    2016-07-01 11:05:12       21.198
    2016-07-01 11:05:13       21.206
    2016-07-01 11:05:14       21.225
    2016-07-01 11:05:15       21.233
    
    0 讨论(0)
  • 2021-01-03 01:51

    According to the docs the default date_parser uses dateutil.parser.parser. According to the docs for that function, the default is to ignore timezones. So if you supply dateutil.parser.parser as the date_parser kwarg, timezones are not converted.

    import dateutil
    
    df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=dateutil.parser.parse)
    
    print(df)
    

    outputs

                               Temperature
    DateTime                              
    2016-07-01 11:05:07+02:00       21.125
    2016-07-01 11:05:09+02:00       21.138
    2016-07-01 11:05:10+02:00       21.156
    2016-07-01 11:05:11+02:00       21.179
    2016-07-01 11:05:12+02:00       21.198
    2016-07-01 11:05:13+02:00       21.206
    2016-07-01 11:05:14+02:00       21.225
    2016-07-01 11:05:15+02:00       21.233
    
    0 讨论(0)
  • 2021-01-03 01:57

    I adopted the dateutil technique earlier today but have since switched to a faster alternative:

    date_parser = lambda ts: pd.to_datetime([s[:-5] for s in ts]))
    

    Edit: s[:-5] is correct (screenshot has error)

    In the screenshot below, I import ~55MB of tab-separated files. The dateutil method works, but takes orders of magnitude longer.

    This was using pandas 0.18.1 and dateutil 2.5.3.


    EDIT This lambda function will work even if Z-0000 suffix is missing...

    date_parser = lambda ts: pd.to_datetime([s[:-5] if 'Z' in s else s for s in ts])
    
    0 讨论(0)
  • 2021-01-03 01:57

    You can even try :

    date_parser = lambda x : pd.to_datetime(x.str[:-6])
    
    0 讨论(0)
提交回复
热议问题