Timezone conversion with pyspark from timestamp and country

Submitted by 爱⌒轻易说出口 on 2021-01-28 18:44:31

Question


I'm trying to convert a UTC date to a date in the local timezone (derived from the country) with PySpark. I have the country as a string and the date as a timestamp.

So the input is:

date = Timestamp('2016-11-18 01:45:55') # type is pandas._libs.tslibs.timestamps.Timestamp

country = "FR" # Type is string

import pytz
import pandas as pd

def convert_date_spark(date, country):
    timezone = pytz.country_timezones(country)[0]

    local_time = date.replace(tzinfo = pytz.utc).astimezone(timezone)
    date, time = local_time.date(), local_time.time()

    return pd.Timestamp.combine(date, time)

# Then I'm creating a UDF to pass to Spark

from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType

convert_date_udf = udf(lambda x, y: convert_date_spark(x, y), TimestampType())

Then I apply it to the Spark DataFrame:

data = data.withColumn("date", convert_date_udf(data["date"], data["country"]))

I get the following error:

TypeError: tzinfo argument must be None or of a tzinfo subclass, not type 'str'

The expected output is the date in the same format.

When tested in plain Python, the convert_date_spark function works, but it does not work in PySpark.
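For example, calling it directly with the sample values above returns the shifted timestamp (Paris is UTC+1 in November, so the hour moves from 01 to 02):

print(convert_date_spark(pd.Timestamp('2016-11-18 01:45:55'), "FR"))
# 2016-11-18 02:45:55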

Could you please help me find a solution for this?

Thanks


Answer 1:


Use a tzinfo instance, not a string, as the timezone. pytz.country_timezones returns timezone names; build the actual tzinfo object with pytz.timezone:

>>> timezone_name = pytz.country_timezones(country)[0]
>>> timezone_name
'Europe/Paris'
>>> timezone = pytz.timezone(timezone_name)
>>> timezone
<DstTzInfo 'Europe/Paris' LMT+0:09:00 STD>
>>> 
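Applying that to the UDF from the question, a minimal sketch of the fix (assuming, as the question states, that the incoming timestamps represent UTC, and keeping the same DataFrame and column names; the tzinfo is dropped at the end so Spark stores the shifted local time as a plain timestamp):

import pytz
from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType

def convert_date_spark(date, country):
    # Map the country code to a timezone name, e.g. "FR" -> "Europe/Paris"
    timezone_name = pytz.country_timezones(country)[0]
    # Build a real tzinfo instance instead of passing the name string around
    timezone = pytz.timezone(timezone_name)
    # The question's timestamps are UTC: attach UTC, convert to local time,
    # then drop the tzinfo so the result fits a plain TimestampType column
    local_time = date.replace(tzinfo=pytz.utc).astimezone(timezone)
    return local_time.replace(tzinfo=None)

convert_date_udf = udf(convert_date_spark, TimestampType())
data = data.withColumn("date", convert_date_udf(data["date"], data["country"]))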


Source: https://stackoverflow.com/questions/53763643/timezone-conversion-with-pyspark-from-timestamp-and-country
