Question
I'm trying to convert a UTC date to a date in the local timezone (derived from the country) with PySpark. I have the country as a string and the date as a timestamp.
So the input is:
date = Timestamp('2016-11-18 01:45:55') # type is pandas._libs.tslibs.timestamps.Timestamp
country = "FR" # Type is string
from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType
import pytz
import pandas as pd

def convert_date_spark(date, country):
    timezone = pytz.country_timezones(country)[0]
    local_time = date.replace(tzinfo=pytz.utc).astimezone(timezone)
    date, time = local_time.date(), local_time.time()
    return pd.Timestamp.combine(date, time)

# Then I'm creating a UDF to give to Spark
convert_date_udf = udf(lambda x, y: convert_date_spark(x, y), TimestampType())
Then I use it in the function that feeds Spark:
data = data.withColumn("date", convert_date_udf(data["date"], data["country"]))
I get the following error:
TypeError: tzinfo argument must be None or of a tzinfo subclass, not type 'str'
The expected output is the date in the same format.
The convert_date_spark function works when tested in plain Python, but it does not work in PySpark.
Could you please help me find a solution for this?
Thanks
Answer 1:
Use a tzinfo instance, not a string, as the timezone. pytz.country_timezones() returns zone names; pass the name to pytz.timezone() to get an actual tzinfo object:
>>> timezone_name = pytz.country_timezones(country)[0]
>>> timezone_name
'Europe/Paris'
>>> timezone = pytz.timezone(timezone_name)
>>> timezone
<DstTzInfo 'Europe/Paris' LMT+0:09:00 STD>
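Putting it together, here is a minimal sketch of the corrected conversion function (the UDF registration from the question is unchanged; only the string-to-tzinfo fix is applied):

```python
import pytz
import pandas as pd

def convert_date_spark(date, country):
    # Look up the zone name for the country, then build a tzinfo instance
    timezone_name = pytz.country_timezones(country)[0]
    timezone = pytz.timezone(timezone_name)
    # Attach UTC to the naive timestamp, then convert to the local zone
    local_time = date.replace(tzinfo=pytz.utc).astimezone(timezone)
    # Recombine into a naive Timestamp so Spark's TimestampType accepts it
    return pd.Timestamp.combine(local_time.date(), local_time.time())

print(convert_date_spark(pd.Timestamp('2016-11-18 01:45:55'), "FR"))
# 2016-11-18 02:45:55  (Paris is UTC+1 in November)
```

With the tzinfo instance in place, the original udf(..., TimestampType()) wrapper and withColumn call work as intended.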
Source: https://stackoverflow.com/questions/53763643/timezone-conversion-with-pyspark-from-timestamp-and-country