Question
I have a UDF:
@staticmethod
@F.udf("array<int>")
def create_users_array(val):
    """ Takes column of ints, returns column of arrays containing ints. """
    return [val for _ in range(val)]
I call it like so:
df.withColumn("myArray", create_users_array(df["myNumber"]))
I pass it a DataFrame column of integers, and for each row it returns an array containing that integer repeated that many times.
E.g.
4 --> [4,4,4,4]
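It was working until we upgraded from Python 2.7 and upgraded our EMR version (which I believe uses PySpark 2.3). Now it fails with the error in the question title, AttributeError: 'NoneType' object has no attribute '_jvm'.
Does anyone know what is causing this?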
Answer 1:
It looks like this had something to do with the improvements made to UDFs in the newer version (or rather, the deprecation of the old syntax). Changing the udf decorator worked for me (ArrayType and IntegerType are imported from pyspark.sql.types):
@F.udf("array<int>")
--> @F.udf(ArrayType(IntegerType()))
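For reference, here is a minimal sketch of the same fix written out in full, with the imports it needs. It is not the original poster's code: the names `repeat_value`, `build_repeat_udf` and `demo` are hypothetical, and the `None` check is an added assumption about how null rows should be handled. The UDF is also built inside a function so that it is only created once a SparkSession exists. That ordering is my own precaution (this kind of `_jvm` error generally points to Spark code running without an active SparkContext), not something the answer above states.

```python
def repeat_value(val):
    """Return a list holding `val` repeated `val` times, or None for a null input."""
    if val is None:  # assumption: null rows should stay null instead of raising
        return None
    return [val for _ in range(val)]


def build_repeat_udf():
    """Wrap repeat_value as a Spark UDF with an explicit ArrayType return type."""
    # Imported here so the plain function above stays usable without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, IntegerType

    return F.udf(repeat_value, ArrayType(IntegerType()))


def demo():
    """Hypothetical usage: requires PySpark and starts a local session."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("udf-sketch").getOrCreate()
    df = spark.createDataFrame([(4,), (2,)], ["myNumber"])
    create_users_array = build_repeat_udf()  # built after the session exists
    df.withColumn("myArray", create_users_array(df["myNumber"])).show()
    spark.stop()
```

Calling `demo()` on a machine with PySpark installed should show the `myArray` column next to `myNumber`. Keeping the row logic in a plain function also lets it be unit-tested without a cluster.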
Source: https://stackoverflow.com/questions/49821328/pyspark-udf-attributeerror-nonetype-object-has-no-attribute-jvm