Question
My question is whether keyword arguments can be used with a UDF in PySpark, as I tried below. The conv function has a keyword argument conv_type, which defaults to a specific formatter, but in some places I want to specify a different one. That does not get through the udf because of the keyword argument. Is there a different approach to using a keyword argument here?
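from datetime import datetime as dt, timedelta as td, date

from pyspark.sql import functions
from pyspark.sql.types import StringType

# Maps an id to the date format used for that id
tpid_date_dict = {'69': '%d/%m/%Y', '62': '%Y/%m/%d', '70201': '%m/%d/%y', '66': '%d.%m.%Y', '11': '%d-%m-%Y', '65': '%Y-%m-%d'}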

def date_formatter_based_on_id(column, date_format):
    val = dt.strptime(str(column), '%Y-%m-%d').strftime(date_format)
    return val

def generic_date_formatter(column, date_format):
    val = dt.strptime(str(column), date_format).strftime('%Y-%m-%d')
    return val

def conv(column, id, conv_type=date_formatter_based_on_id):
    try:
        date_format = tpid_date_dict[id]
    except KeyError as e:
        print("Key value not found!")
    val = None
    if column:
        try:
            val = conv_type(column, date_format)
        except Exception as err:
            val = column
    return val

conv_func = functions.udf(conv, StringType())
date_formatted = renamed_cols.withColumn(
    "check_in_std",
    conv_func(functions.col("check_in"), functions.col("id"),
              generic_date_formatter))
So the problem is with the last statement, the call to conv_func inside withColumn, because the third argument, generic_date_formatter, is meant for the keyword parameter conv_type.
When I try this, I get the following error: AttributeError: 'function' object has no attribute '_get_object_id'
Answer 1:
Unfortunately you cannot use udf with keyword arguments. UserDefinedFunction.__call__ is defined with positional arguments only:
def __call__(self, *cols):
    judf = self._judf
    sc = SparkContext._active_spark_context
    return Column(judf.apply(_to_seq(sc, cols, _to_java_column)))
But the problem you have is not really related to keyword arguments. You get the exception because generic_date_formatter is not a Column object but a function.
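The positional-only signature can be illustrated without Spark. The sketch below uses a hypothetical stand-in class, FakeUDF, which is not part of PySpark and only mimics the `*cols` signature quoted above. It suggests that a keyword argument would be rejected by Python itself, before any column conversion happens.

```python
class FakeUDF:
    """Hypothetical stand-in for UserDefinedFunction; not part of PySpark."""

    def __call__(self, *cols):
        # Like the signature quoted above, only positional arguments are accepted.
        return cols


fake_udf = FakeUDF()

# Positional arguments are collected into a tuple.
positional = fake_udf("check_in", "id")

# A keyword argument never reaches the method body: Python raises TypeError.
try:
    fake_udf("check_in", "id", conv_type=len)
    keyword_error = None
except TypeError as err:
    keyword_error = err
```

In the question, generic_date_formatter was passed positionally, so this TypeError is not the one raised there. The call got as far as the column conversion, which is where the AttributeError comes from.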
You can create the udf dynamically:
def conv(conv_type=date_formatter_based_on_id):
    # conv_type is captured by the closure, so it never has to be a Column
    def _(column, id):
        try:
            date_format = tpid_date_dict[id]
        except KeyError as e:
            print("Key value not found!")
        val = None
        if column:
            try:
                val = conv_type(column, date_format)
            except Exception as err:
                val = column
        return val
    return functions.udf(_, StringType())
which can be called:
conv(generic_date_formatter)(functions.col("check_in"), functions.col("id"))
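The closure itself can be checked in plain Python before it is wrapped in a udf. The following is a minimal sketch under some assumptions: make_converter is a hypothetical name, the dictionary is shortened to two entries, and the body is a simplified variant of the inner function above, with the same fallbacks but without the print.

```python
from datetime import datetime as dt

# Shortened copy of the id-to-format mapping, for illustration only
tpid_date_dict = {'69': '%d/%m/%Y', '65': '%Y-%m-%d'}


def date_formatter_based_on_id(column, date_format):
    return dt.strptime(str(column), '%Y-%m-%d').strftime(date_format)


def generic_date_formatter(column, date_format):
    return dt.strptime(str(column), date_format).strftime('%Y-%m-%d')


def make_converter(conv_type=date_formatter_based_on_id):
    """Hypothetical helper: returns the plain function, without the udf wrapper."""
    def _(column, id):
        if not column:
            return None
        try:
            return conv_type(column, tpid_date_dict[id])
        except Exception:
            # Unknown id or unparsable date: fall back to the raw value
            return column
    return _


to_std = make_converter(generic_date_formatter)  # id-specific format -> ISO
to_local = make_converter()                      # ISO -> id-specific format

print(to_std('25/12/2018', '69'))
print(to_local('2018-12-25', '69'))
```

Once the plain function behaves as expected, it could be wrapped with functions.udf(to_std, StringType()) and applied to columns in the same way as the call above.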
See "Passing a data frame column and external list to udf under withColumn" for details.
Source: https://stackoverflow.com/questions/50240597/can-we-use-keyword-arguments-in-udf