How to apply tz_convert with different timezones to different rows in pandas dataframe

Submitted by 拈花ヽ惹草 on 2019-12-22 10:56:25

Question


I am trying to set different timezones for various rows in a Pandas dataframe based on a criterion. As a MWE, here is what I have tried:

import pandas as pd

test = pd.DataFrame( data = pd.to_datetime(['2015-03-30 20:12:32','2015-03-12 00:11:11']) ,columns=['time'] )
test['new_col']=['new','old']
# Localize the naive timestamps to UTC
test.time=test.set_index('time').index.tz_localize('UTC')
# Try to convert only the 'new' rows to US/Pacific (this is the step that misbehaves)
test.loc[test.new_col=='new','time']=test[test.new_col=='new'].set_index('time').index.tz_convert('US/Pacific')
print(test)

The output of this:

                        time new_col
0        1427746352000000000     new
1  2015-03-12 00:11:11+00:00     old

As you can see, the row with the updated timezone is converted to an integer. How can I do this properly so that the updated entry is a datetime?


Answer 1:


Using pandas 0.17.0rc2 (0.17.0 is scheduled for release on Oct 9), you can do this.

In [43]: test['new_col2'] = [pd.Timestamp('2015-03-30 20:12:32',tz='US/Eastern'),pd.Timestamp('2015-03-30 20:12:32',tz='US/Pacific')]

In [44]: test
Out[44]: 
                       time new_col                   new_col2
0 2015-03-30 20:12:32+00:00     new  2015-03-30 20:12:32-04:00
1 2015-03-12 00:11:11+00:00     old  2015-03-30 20:12:32-07:00

In [45]: test.dtypes
Out[45]: 
time        datetime64[ns, UTC]
new_col                  object
new_col2                 object
dtype: object

Note that mixed timezones within a column will force object dtype. So it can be done, but it is generally not recommended. You would need to change entries individually.

You almost always want a single-dtype column with a single timezone.
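As a rough sketch of what "changing entries individually" could look like with the question's data: the column name `time_mixed` is made up for illustration, and the original UTC column is left untouched so the single-timezone dtype is not lost.

```python
import pandas as pd

# Rebuild the question's frame with a tz-aware UTC column
test = pd.DataFrame({
    "time": pd.to_datetime(
        ["2015-03-30 20:12:32", "2015-03-12 00:11:11"]
    ).tz_localize("UTC"),
    "new_col": ["new", "old"],
})

# Convert each entry on its own; rows flagged 'new' go to US/Pacific,
# the others stay in UTC
converted = [
    ts.tz_convert("US/Pacific") if flag == "new" else ts
    for ts, flag in zip(test["time"], test["new_col"])
]

# 'time_mixed' is a hypothetical column name; dtype=object is explicit
# because one column cannot hold two timezones in a datetime64 dtype
test["time_mixed"] = pd.Series(converted, index=test.index, dtype=object)

print(test.dtypes)
```

Each element of `time_mixed` is still a `Timestamp`, but vectorized `.dt` accessors will not work on the object column as a whole.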




Answer 2:


Here's a solution that works once you add a column that specifies the timezone to convert to.

import pandas as pd

utc_df = pd.DataFrame({"timestamp": [pd.Timestamp("2019-09-01 12:00:00+0000", tz="UTC"),
                                     pd.Timestamp("2019-11-01 12:00:00+0000", tz="UTC")],
                        "timezone": ["Europe/Brussels", "Europe/London"]})

This sample still has the time in UTC and looks like:

                  timestamp         timezone 
0 2019-09-01 12:00:00+00:00  Europe/Brussels 
1 2019-11-01 12:00:00+00:00    Europe/London

We then group by timezone and apply the conversion.

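def localize_time(df):
    def convert_tz(tz_df):
        # Every row in tz_df shares one timezone, so read it from the first row
        return tz_df.set_index('timestamp').tz_convert(tz_df.timezone.values[0]).reset_index()

    # Convert each timezone group separately, then stitch the groups back together
    return df.groupby('timezone').apply(convert_tz).reset_index(drop=True)

localize_time(utc_df)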

Which returns:

                   timestamp         timezone
0  2019-09-01 14:00:00+02:00  Europe/Brussels
1  2019-11-01 12:00:00+00:00    Europe/London

Do note that the dtype of the timestamp column will change to object.

utc_df.dtypes
timestamp    datetime64[ns, UTC]
timezone                  object

localize_time(utc_df).dtypes
timestamp    object
timezone     object

However, you can still use the datetime functionality of this column as long as you keep grouping by timezone and then apply the function you want within each group (as in the example shown here).
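If all you need is a derived local field rather than the converted timestamps themselves, one possible variant is to keep the UTC column as it is and group by timezone only to compute that field. The sketch below assumes the local hour is what you are after; `local_hour` is a hypothetical column name, and a plain loop over the groups is used here instead of `groupby(...).apply`.

```python
import pandas as pd

utc_df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-09-01 12:00:00", "2019-11-01 12:00:00"], utc=True
    ),
    "timezone": ["Europe/Brussels", "Europe/London"],
})

pieces = []
# Group the UTC timestamps by their target timezone; within one group
# a single tz_convert call applies, so the .dt accessor works
for tz, stamps in utc_df["timestamp"].groupby(utc_df["timezone"]):
    pieces.append(stamps.dt.tz_convert(tz).dt.hour)

# The pieces keep the original row index, so assignment aligns them
utc_df["local_hour"] = pd.concat(pieces)

print(utc_df)
```

The `timestamp` column stays a single-timezone UTC column throughout, so nothing falls back to object dtype.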



Source: https://stackoverflow.com/questions/32984303/how-to-apply-tz-convert-with-different-timezones-to-different-rows-in-pandas-dat
