Pandas convert UNIX time to multiple different timezones depending on column value

Question

I have a pandas dataframe with UNIX timestamps (these are integers and not time objects). The observations occur in multiple geographic locations, and therefore multiple timezones. I'd like to convert the UNIX timestamp into local time (in a new column) for each of these timezones, based on the geography of the observation (this information is in a column of the dataframe).

Simple working example:

Creating the dataframe:

import pandas as pd

c1=[1546555701, 1546378818, 1546574677, 1546399159, 1546572278]
c2=['America/Detroit','America/Chicago','America/Los_Angeles','America/Los_Angeles','America/Detroit']

df3=pd.DataFrame(list(zip(c1,c2)),columns=['utc','tz'])

print(df3)

Output:

          utc                   tz
0  1546555701      America/Detroit
1  1546378818      America/Chicago
2  1546574677  America/Los_Angeles
3  1546399159  America/Los_Angeles
4  1546572278      America/Detroit

Current attempt:

df3['date_time']=pd.to_datetime(df3['utc'],unit='s')
print(df3)

Returns:

          utc                   tz           date_time
0  1546555701      America/Detroit 2019-01-03 22:48:21
1  1546378818      America/Chicago 2019-01-01 21:40:18
2  1546574677  America/Los_Angeles 2019-01-04 04:04:37
3  1546399159  America/Los_Angeles 2019-01-02 03:19:19
4  1546572278      America/Detroit 2019-01-04 03:24:38

This converts to a datetime object, but I am unsure how to control the timezone (I presume it gives me the time in my local timezone). It is certainly not based off the 'tz' column.

I have looked at pandas' tz_convert() method and the arrow package, but have not been able to figure out how to make these work. I am open to other solutions as well. I am concerned not only with the timezone, but also with making sure that daylight saving time is handled properly.

Answer 1:

Assuming POSIX timestamps (seconds since 1970-01-01 UTC), you can convert directly to time-zone-aware UTC datetimes by passing utc=True to pd.to_datetime.

import pandas as pd

c1=[1546555701, 1546378818, 1546574677, 1546399159, 1546572278]
c2=['America/Detroit','America/Chicago','America/Los_Angeles','America/Los_Angeles','America/Detroit']

df3=pd.DataFrame(list(zip(c1,c2)),columns=['utc','tz'])
df3['date_time']=pd.to_datetime(df3['utc'], unit='s', utc=True)

# df3['date_time']
# 0   2019-01-03 22:48:21+00:00
# 1   2019-01-01 21:40:18+00:00
# 2   2019-01-04 04:04:37+00:00
# 3   2019-01-02 03:19:19+00:00
# 4   2019-01-04 03:24:38+00:00
# Name: date_time, dtype: datetime64[ns, UTC]
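On the question's concern that the naive result might be in the machine's local time: as far as I know, pd.to_datetime with unit='s' counts from the UTC epoch regardless of the system time zone, so the values in the question's output are already UTC, just without any time zone attached. A minimal check (the names utc_seconds, naive and aware are only for illustration):

```python
import pandas as pd

utc_seconds = pd.Series([1546555701, 1546378818])

naive = pd.to_datetime(utc_seconds, unit='s')            # no tz attached
aware = pd.to_datetime(utc_seconds, unit='s', utc=True)  # tz-aware, UTC

print(naive.dt.tz)   # None
print(aware.dt.tz)   # UTC

# Labelling the naive values as UTC reproduces the aware ones
print((naive.dt.tz_localize('UTC') == aware).all())  # True
```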

You can then convert each value to the time zone named in its row using DataFrame.apply, e.g.

def setTZ(row):
    return row['date_time'].tz_convert(row['tz'])

df3['date_time']=df3.apply(setTZ, axis=1)

# df3
#           utc                   tz                  date_time
# 0  1546555701      America/Detroit  2019-01-03 17:48:21-05:00
# 1  1546378818      America/Chicago  2019-01-01 15:40:18-06:00
# 2  1546574677  America/Los_Angeles  2019-01-03 20:04:37-08:00
# 3  1546399159  America/Los_Angeles  2019-01-01 19:19:19-08:00
# 4  1546572278      America/Detroit  2019-01-03 22:24:38-05:00
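Regarding daylight saving time: tz_convert with IANA zone names such as 'America/Detroit' follows the time zone database rules, so the UTC offset changes with the season. A small sketch to check this, using two arbitrary dates that are not from the question:

```python
import pandas as pd

# Noon UTC on a winter day and on a summer day (illustrative dates)
winter = pd.Timestamp('2019-01-15 12:00', tz='UTC')
summer = pd.Timestamp('2019-07-15 12:00', tz='UTC')

winter_local = winter.tz_convert('America/Detroit')
summer_local = summer.tz_convert('America/Detroit')

print(winter_local)  # 2019-01-15 07:00:00-05:00 (standard time)
print(summer_local)  # 2019-07-15 08:00:00-04:00 (daylight saving time)
```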

Note that with mixed time zones the column falls back to object dtype, so you can't use the dt accessor on the Series. You need element-wise code instead, e.g.

df3['date_time'].apply(lambda t: t.hour)

to get the hour for each datetime. A way around this would be to create a column that has local time but is not time zone aware:

def toLocalTime(row):
    return row['date_time'].tz_convert(row['tz']).replace(tzinfo=None)

df3['local_time'] = df3.apply(toLocalTime, axis=1)
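Because the naive local_time column has a single datetime dtype, the dt accessor works on it again. On larger frames a row-wise apply can be slow; one possible alternative, sketched here rather than taken from the answer, is to convert the rows of each time zone in one vectorized call. The names df, utc and parts are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'utc': [1546555701, 1546378818, 1546574677, 1546399159, 1546572278],
    'tz': ['America/Detroit', 'America/Chicago', 'America/Los_Angeles',
           'America/Los_Angeles', 'America/Detroit'],
})
utc = pd.to_datetime(df['utc'], unit='s', utc=True)

# Convert each zone's rows in one vectorized call, then drop the tz info
parts = [
    utc[df['tz'] == zone].dt.tz_convert(zone).dt.tz_localize(None)
    for zone in df['tz'].unique()
]

# pd.concat keeps the original index labels; assignment aligns on them
df['local_time'] = pd.concat(parts)

print(df['local_time'].dt.hour.tolist())  # [17, 15, 20, 19, 22]
```

This does one conversion per distinct time zone instead of one per row, which tends to help when there are many rows but only a few zones.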

Source: https://stackoverflow.com/questions/64771881/pandas-convert-unix-time-to-multiple-different-timezones-depending-on-column-val

Tags

python

pandas

datetime

timezone

dst