Creating pandas DatetimeIndex in Dataframe from DST aware datetime objects

Submitted by 微笑、不失礼 on 2021-01-28 06:28:28

Question


From an online API I gather a series of data points, each with a value and an ISO timestamp. Unfortunately I need to loop over them, so I store them in a temporary dict and then create a pandas dataframe from that and set the index to the timestamp column (simplified example):

from datetime import datetime
import pandas


input_data = [
    '2019-09-16T06:44:01+02:00',
    '2019-11-11T09:13:01+01:00',
]

data = []
for timestamp in input_data:
    _date = datetime.fromisoformat(timestamp)

    data.append({'time': _date})

pd_data = pandas.DataFrame(data).set_index('time')

As long as all timestamps share the same UTC offset (same time zone and the same DST/non-DST state), everything works fine and I get a DataFrame with a DatetimeIndex that I can work with later. However, once two different UTC offsets appear in one dataset (as in the example above), I only get a plain Index in my DataFrame, which does not support any time-based methods.

Is there any way to make pandas accept timezone-aware datetimes with differing offsets as a DatetimeIndex?


Answer 1:


  • A pandas datetime column requires every value to have the same UTC offset. A column with mixed offsets will not be converted to a datetime dtype.
  • I suggest not converting the data to datetime until it is in pandas.
  • Separate the UTC offset and treat it as a timedelta.
  • to_timedelta requires the format 'hh:mm:ss', so append ':00' to the offset.
  • See Pandas: Time deltas for all the available timedelta operations.
  • pandas.Series.dt.tz_convert
  • pandas.Series.tz_localize
  • Convert to a specific time zone as follows:
    • If the column is timezone-naive (datetime64[ns] rather than datetime64[ns, UTC]) and holds UTC times, first use .dt.tz_localize('UTC'), then .dt.tz_convert('US/Pacific').
    • Otherwise, use df.datetime_utc.dt.tz_convert('US/Pacific') directly.

import pandas as pd

# sample data
input_data = ['2019-09-16T06:44:01+02:00', '2019-11-11T09:13:01+01:00']

# dataframe
df = pd.DataFrame(input_data, columns=['datetime'])

# separate the offset from the datetime and convert it to a timedelta
df['offset'] = pd.to_timedelta(df.datetime.str[-6:] + ':00')

# if desired, keep the local wall-clock time (without the offset) as a str
# localizing these naive times to a DST-observing time zone can raise AmbiguousTimeError
# for times that occur twice when DST ends (the overlap around 2 AM, per the OP)
df['datetime_str'] = df.datetime.str[:-6]

# parse the datetime column as timezone-aware UTC; each row's offset is applied during conversion
df['datetime_utc'] = pd.to_datetime(df.datetime, utc=True)

# display(df)
                    datetime          offset        datetime_str              datetime_utc
0  2019-09-16T06:44:01+02:00 0 days 02:00:00 2019-09-16 06:44:01 2019-09-16 04:44:01+00:00
1  2019-11-11T09:13:01+01:00 0 days 01:00:00 2019-11-11 09:13:01 2019-11-11 08:13:01+00:00

print(df.info())
[out]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype              
---  ------        --------------  -----              
 0   datetime      2 non-null      object             
 1   offset        2 non-null      timedelta64[ns]    
 2   datetime_str  2 non-null      object             
 3   datetime_utc  2 non-null      datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), object(2), timedelta64[ns](1)
memory usage: 192.0+ bytes

# convert to local timezone
df.datetime_utc.dt.tz_convert('US/Pacific')

[out]:
0   2019-09-15 21:44:01-07:00
1   2019-11-11 00:13:01-08:00
Name: datetime_utc, dtype: datetime64[ns, US/Pacific]
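
As a minimal sketch of what the separate offset column can be used for, the original local wall-clock time can be rebuilt from the UTC column plus the stored timedelta. This assumes every timestamp ends in a '+hh:mm' or '-hh:mm' offset (not 'Z'); the column name datetime_local_naive is hypothetical and only used for illustration.

```python
import pandas as pd

input_data = ['2019-09-16T06:44:01+02:00', '2019-11-11T09:13:01+01:00']
df = pd.DataFrame(input_data, columns=['datetime'])

# same two columns as built above
df['offset'] = pd.to_timedelta(df['datetime'].str[-6:] + ':00')
df['datetime_utc'] = pd.to_datetime(df['datetime'], utc=True)

# drop the UTC marker (this keeps the UTC wall-clock time), then shift each row by its own offset
# 'datetime_local_naive' is a hypothetical column name, not part of the original answer
df['datetime_local_naive'] = df['datetime_utc'].dt.tz_localize(None) + df['offset']

print(df[['datetime', 'datetime_local_naive']])
```

The result is a timezone-naive datetime64 column, so it no longer records which offset each row came from; the offset column has to be kept alongside it if that matters.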

Other Resources

  • Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes.
  • Talk Python to Me: Episode #271: Unlock the mysteries of time, Python's datetime that is!
  • Real Python: Using Python datetime to Work With Dates and Times
  • The dateutil module provides powerful extensions to the standard datetime module.



Answer 2:


A minor but, I think, important correction to the question's wording: what you have are UTC offsets. Distinguishing DST from non-DST would require more information than that, namely a time zone. This matters here because timestamps with UTC offsets, even differing ones, can easily be parsed to UTC:

import pandas as pd

input_data = [
    '2019-09-16T06:44:01+02:00',
    '2019-11-11T09:13:01+01:00',
]

dti = pd.to_datetime(input_data, utc=True)
# dti
# DatetimeIndex(['2019-09-16 04:44:01+00:00', '2019-11-11 08:13:01+00:00'], dtype='datetime64[ns, UTC]', freq=None)

I always prefer to work with UTC, so I'd be fine with that. If, however, you really need datetimes in a certain time zone, you can convert them, e.g.:

dti = dti.tz_convert('Europe/Berlin')
# dti
# DatetimeIndex(['2019-09-16 06:44:01+02:00', '2019-11-11 09:13:01+01:00'], dtype='datetime64[ns, Europe/Berlin]', freq=None)
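
Applied to the loop from the question, this could look roughly like the sketch below. It assumes the datetime objects are still collected in a list of dicts as in the question; the 'value' field is a hypothetical placeholder for the real data, and Europe/Berlin is only an example zone.

```python
from datetime import datetime
import pandas as pd

input_data = [
    '2019-09-16T06:44:01+02:00',
    '2019-11-11T09:13:01+01:00',
]

data = []
for i, timestamp in enumerate(input_data):
    # 'value' is a hypothetical placeholder for whatever the API returns
    data.append({'time': datetime.fromisoformat(timestamp), 'value': i})

pd_data = pd.DataFrame(data)

# normalize the mixed-offset datetime objects to UTC before using them as the index
pd_data['time'] = pd.to_datetime(pd_data['time'], utc=True)
pd_data = pd_data.set_index('time')
print(type(pd_data.index))

# optional: view the same instants in one specific time zone
berlin = pd_data.tz_convert('Europe/Berlin')
print(berlin.index)
```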



Answer 3:


I'm unaware of a way to use timezone-aware datetimes with differing offsets as the index and still get a DatetimeIndex in pandas. I do have a suggestion that might help, though, depending on what you need from your data.

Would it be acceptable to convert the datetime objects to the same time zone, or must the original timezone information be retained? If you need the time zone but not necessarily in the index, you can, while looping through the dates, store the original offset in a new column, or keep a copy of the original local time in a new column, so that it can still be accessed.
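
A minimal sketch of that idea, assuming UTC is an acceptable common time zone; the column name 'utc_offset' is hypothetical:

```python
from datetime import datetime, timezone
import pandas as pd

input_data = [
    '2019-09-16T06:44:01+02:00',
    '2019-11-11T09:13:01+01:00',
]

rows = []
for timestamp in input_data:
    _date = datetime.fromisoformat(timestamp)
    rows.append({
        # same instant, expressed in UTC, so every row shares one time zone
        'time': _date.astimezone(timezone.utc),
        # keep the original UTC offset in its own column ('utc_offset' is a made-up name)
        'utc_offset': _date.utcoffset(),
    })

pd_data = pd.DataFrame(rows)
# make sure the column has a UTC-aware datetime dtype, however the constructor inferred it
pd_data['time'] = pd.to_datetime(pd_data['time'], utc=True)
pd_data = pd_data.set_index('time')

print(type(pd_data.index))
print(pd_data)
```

The index then supports time-based methods, while the original offset of each row stays available in the extra column.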



Source: https://stackoverflow.com/questions/63495502/creating-pandas-datetimeindex-in-dataframe-from-dst-aware-datetime-objects
