Question
I am trying to set different timezones for different rows in a pandas DataFrame based on a criterion. As an MWE, here is what I have tried:
import pandas as pd

test = pd.DataFrame(data=pd.to_datetime(['2015-03-30 20:12:32', '2015-03-12 00:11:11']), columns=['time'])
test['new_col'] = ['new', 'old']
test.time = test.set_index('time').index.tz_localize('UTC')
test.loc[test.new_col == 'new', 'time'] = test[test.new_col == 'new'].set_index('time').index.tz_convert('US/Pacific')
print(test)
The output of this:
                        time  new_col
0        1427746352000000000      new
1  2015-03-12 00:11:11+00:00      old
As you can see, the row with the updated timezone is converted to an integer. How can I do this properly so that the updated entry is a datetime?
Answer 1:
Using pandas 0.17.0rc2 (0.17.0 is to be released on Oct 9), you can do this.
In [43]: test['new_col2'] = [pd.Timestamp('2015-03-30 20:12:32', tz='US/Eastern'), pd.Timestamp('2015-03-30 20:12:32', tz='US/Pacific')]
In [44]: test
Out[44]:
                        time  new_col                   new_col2
0  2015-03-30 20:12:32+00:00      new  2015-03-30 20:12:32-04:00
1  2015-03-12 00:11:11+00:00      old  2015-03-30 20:12:32-07:00
In [45]: test.dtypes
Out[45]:
time        datetime64[ns, UTC]
new_col                  object
new_col2                 object
dtype: object
Note that mixed timezones within a column will force object dtype. So it can be done, but it is generally not recommended. You would need to change entries individually.
You almost always want a single-dtyped column of a single timezone.
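As a minimal sketch of what "changing entries individually" could look like on the question's data: the column name `local_time` is made up for this illustration, and the approach is only one possible way to do it. The original UTC column is left untouched, and the mixed-timezone values go into a separate column that is explicitly object dtype.

```python
import pandas as pd

test = pd.DataFrame({
    "time": pd.to_datetime(["2015-03-30 20:12:32", "2015-03-12 00:11:11"]).tz_localize("UTC"),
    "new_col": ["new", "old"],
})

# Convert entries one by one: rows flagged 'new' go to US/Pacific,
# all other rows keep their UTC timestamp. The values now carry
# different timezones, so the Series is built with object dtype.
mixed = pd.Series(
    [ts.tz_convert("US/Pacific") if flag == "new" else ts
     for ts, flag in zip(test["time"], test["new_col"])],
    index=test.index,
    dtype=object,
)
test["local_time"] = mixed  # hypothetical column name, not from the original post

print(test.dtypes)
```

The `time` column stays a timezone-aware datetime column, so vectorised datetime operations remain available on it, while `local_time` is only convenient for display or element-wise access.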
Answer 2:
Here's a solution that works once you add a column that specifies the timezone to convert to.
import pandas as pd

utc_df = pd.DataFrame({"timestamp": [pd.Timestamp("2019-09-01 12:00:00+0000", tz="UTC"),
                                     pd.Timestamp("2019-11-01 12:00:00+0000", tz="UTC")],
                       "timezone": ["Europe/Brussels", "Europe/London"]})
This sample still has the time in UTC and looks like:
                   timestamp         timezone
0  2019-09-01 12:00:00+00:00  Europe/Brussels
1  2019-11-01 12:00:00+00:00    Europe/London
We then group by timezone and apply the conversion.
def localize_time(df):
    def convert_tz(tz_df):
        # Every row in tz_df shares one timezone, so the whole group is converted at once.
        return tz_df.set_index('timestamp').tz_convert(tz_df.timezone.values[0]).reset_index()
    return df.groupby('timezone').apply(convert_tz).reset_index(drop=True)

localize_time(utc_df)
Which returns:
                   timestamp         timezone
0  2019-09-01 14:00:00+02:00  Europe/Brussels
1  2019-11-01 12:00:00+00:00    Europe/London
Do note that the dtype of the timestamp column will change to object.
utc_df.dtypes
timestamp    datetime64[ns, UTC]
timezone                  object
localize_time(utc_df).dtypes
timestamp    object
timezone     object
However, you can still use the datetime functionality of this column as long as you keep grouping by timezone and then apply the function you want within each group (as in the example shown here).
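To illustrate the "group by timezone, then apply" idea without going through an object column, here is a rough sketch that derives a local hour per row. The helper `local_hour` and the column name `local_hour` are hypothetical names introduced for this example, and the sample frame is rebuilt from UTC strings so the snippet stands on its own.

```python
import pandas as pd

utc_df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2019-09-01 12:00:00", "2019-11-01 12:00:00"]).tz_localize("UTC"),
    "timezone": ["Europe/Brussels", "Europe/London"],
})

def local_hour(utc_times, tz_name):
    """Return the local hour of each UTC timestamp in the given timezone."""
    return utc_times.dt.tz_convert(tz_name).dt.hour

# Handle one timezone at a time: within a group the converted values
# share a single timezone, so the .dt accessor can be used on them.
pieces = [
    local_hour(group["timestamp"], tz_name)
    for tz_name, group in utc_df.groupby("timezone")
]

# concat keeps the original row labels, so assignment aligns on the index.
utc_df["local_hour"] = pd.concat(pieces)

print(utc_df)
```

Here the `timestamp` column stays in UTC with its datetime dtype, and only the derived values (plain integers) are written back, which avoids storing mixed-timezone timestamps at all.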
Source: https://stackoverflow.com/questions/32984303/how-to-apply-tz-convert-with-different-timezones-to-different-rows-in-pandas-dat