Question
I have a pandas dataframe with UNIX timestamps (these are integers and not time objects). The observations occur in multiple geographic locations, and therefore multiple timezones. I'd like to convert the UNIX timestamp into local time (in a new column) for each of these timezones, based on the geography of the observation (this information is in a column of the dataframe).
Simple working example:
Creating the dataframe:
import pandas as pd

c1=[1546555701, 1546378818, 1546574677, 1546399159, 1546572278]
c2=['America/Detroit','America/Chicago','America/Los_Angeles','America/Los_Angeles','America/Detroit']
df3=pd.DataFrame(list(zip(c1,c2)),columns=['utc','tz'])
print(df3)
Output:
          utc                   tz
0  1546555701      America/Detroit
1  1546378818      America/Chicago
2  1546574677  America/Los_Angeles
3  1546399159  America/Los_Angeles
4  1546572278      America/Detroit
Current attempt:
df3['date_time']=pd.to_datetime(df3['utc'],unit='s')
print(df3)
Returns:
          utc                   tz            date_time
0  1546555701      America/Detroit  2019-01-03 22:48:21
1  1546378818      America/Chicago  2019-01-01 21:40:18
2  1546574677  America/Los_Angeles  2019-01-04 04:04:37
3  1546399159  America/Los_Angeles  2019-01-02 03:19:19
4  1546572278      America/Detroit  2019-01-04 03:24:38
This converts to a datetime object, but I am unsure how to control the timezone (I presume it gives me the time in my local timezone). It is certainly not based on the 'tz' column.
I have looked at pandas' tz_convert() function and the arrow package, but have not been able to figure out how to make these work. I am open to other solutions as well. I am concerned not only with the timezone, but also with making sure that daylight saving time is handled properly.
Answer 1:
Assuming POSIX timestamps (seconds since 1970-01-01 UTC), you can convert them directly to timezone-aware UTC datetimes with the keyword utc=True.
import pandas as pd
c1=[1546555701, 1546378818, 1546574677, 1546399159, 1546572278]
c2=['America/Detroit','America/Chicago','America/Los_Angeles','America/Los_Angeles','America/Detroit']
df3=pd.DataFrame(list(zip(c1,c2)),columns=['utc','tz'])
df3['date_time']=pd.to_datetime(df3['utc'], unit='s', utc=True)
# df3['date_time']
# 0 2019-01-03 22:48:21+00:00
# 1 2019-01-01 21:40:18+00:00
# 2 2019-01-04 04:04:37+00:00
# 3 2019-01-02 03:19:19+00:00
# 4 2019-01-04 03:24:38+00:00
# Name: date_time, dtype: datetime64[ns, UTC]
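For comparison, here is a small sketch of what utc=True changes. As far as I know, pd.to_datetime(..., unit='s') always interprets the integers as seconds since the epoch in UTC; without utc=True it just returns naive (timezone-unaware) timestamps. So the times printed in the question are UTC wall-clock times, not the machine's local time. The only input assumed here is one timestamp taken from the question's data.

```python
import pandas as pd

# One of the timestamps from the question's data.
ts = 1546555701

naive = pd.to_datetime(ts, unit='s')            # UTC wall-clock time, no tzinfo attached
aware = pd.to_datetime(ts, unit='s', utc=True)  # same instant, explicitly tagged as UTC

print(naive)  # 2019-01-03 22:48:21
print(aware)  # 2019-01-03 22:48:21+00:00

# Dropping the tz from the aware value gives back exactly the naive value,
# i.e. the naive result was already UTC, just without the label.
print(aware.tz_localize(None) == naive)  # True
```

Because the naive values carry no zone, tz_convert would refuse them. The explicit utc=True step is what makes the per-row conversion below possible.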
You can then convert each value to the time zone given in its row using apply, e.g.
def setTZ(row):
    # Convert this row's UTC timestamp to the zone named in its 'tz' column
    return row['date_time'].tz_convert(row['tz'])

df3['date_time']=df3.apply(setTZ, axis=1)
# df3
#           utc                   tz                  date_time
# 0  1546555701      America/Detroit  2019-01-03 17:48:21-05:00
# 1  1546378818      America/Chicago  2019-01-01 15:40:18-06:00
# 2  1546574677  America/Los_Angeles  2019-01-03 20:04:37-08:00
# 3  1546399159  America/Los_Angeles  2019-01-01 19:19:19-08:00
# 4  1546572278      America/Detroit  2019-01-03 22:24:38-05:00
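On the daylight saving concern: tz_convert looks up the offset that applies at each instant in the IANA time zone database, so the same zone gets a different UTC offset in winter and summer. All the sample data above is from January, so here is a minimal sketch that adds one summer instant. The value 1562241600 (2019-07-04 12:00:00 UTC) is a made-up example, not from the question.

```python
import pandas as pd

# 1546555701 is from the question (January); 1562241600 is a hypothetical
# mid-summer instant (2019-07-04 12:00:00 UTC) added for comparison.
times = pd.to_datetime(pd.Series([1546555701, 1562241600]), unit='s', utc=True)

# Vectorized conversion of a single-zone Series.
detroit = times.dt.tz_convert('America/Detroit')

for ts in detroit:
    # tzname() reports the abbreviation in effect at that instant.
    print(ts, ts.tzname())
# 2019-01-03 17:48:21-05:00 EST
# 2019-07-04 08:00:00-04:00 EDT
```

No manual DST logic is needed; the offset switches from -05:00 to -04:00 on its own.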
Note that with mixed time zones you can't use the .dt accessor on the Series, because pandas stores it with object dtype rather than a datetime64 dtype. You need iterative code instead, e.g.
df3['date_time'].apply(lambda t: t.hour)
to get the hour of each datetime. One way around this is to create a column that holds the local time but is not time zone aware:
def toLocalTime(row):
    # Convert to the row's zone, then drop tzinfo so only the local wall-clock time remains
    return row['date_time'].tz_convert(row['tz']).replace(tzinfo=None)

df3['local_time'] = df3.apply(toLocalTime, axis=1)
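If the row-wise apply is too slow on a large frame, here is a sketch of an alternative that stays vectorized. It groups the rows by their 'tz' value, converts each group with .dt.tz_convert, strips the zone with .dt.tz_localize(None), and stitches the pieces back together. This groupby approach is a substitute technique, not the method used above. It assumes you only need the naive local time (the same thing toLocalTime produces). The 'local_hour' column is a hypothetical addition to show that .dt works again on the result.

```python
import pandas as pd

c1 = [1546555701, 1546378818, 1546574677, 1546399159, 1546572278]
c2 = ['America/Detroit', 'America/Chicago', 'America/Los_Angeles',
      'America/Los_Angeles', 'America/Detroit']
df3 = pd.DataFrame({'utc': c1, 'tz': c2})

utc_times = pd.to_datetime(df3['utc'], unit='s', utc=True)

# One vectorized tz_convert per time zone. Removing the tz afterwards lets
# all pieces share a single naive datetime64 column.
parts = [
    utc_times.loc[idx].dt.tz_convert(tz).dt.tz_localize(None)
    for tz, idx in df3.groupby('tz').groups.items()
]

# concat keeps the original row labels, so assignment realigns by index.
df3['local_time'] = pd.concat(parts)

# A naive datetime64 column supports the .dt accessor again.
df3['local_hour'] = df3['local_time'].dt.hour
print(df3)
```

The loop runs once per distinct zone rather than once per row, which is usually far fewer iterations.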
Source: https://stackoverflow.com/questions/64771881/pandas-convert-unix-time-to-multiple-different-timezones-depending-on-column-val