Pyspark UDF AttributeError: 'NoneType' object has no attribute '_jvm'

Question

I have a UDF:

@staticmethod
@F.udf("array<int>")
def create_users_array(val):
    """Takes a column of ints, returns a column of arrays containing ints."""
    return [val for _ in range(val)]

I call it like so:

df.withColumn("myArray", create_users_array(df["myNumber"]))

I pass it a DataFrame column of integers, and for each integer it returns an array holding that integer repeated that many times.

E.g. 4 --> [4,4,4,4]

It was working until we moved off Python 2.7 and upgraded our EMR version (which I believe uses PySpark 2.3). Now it fails with the AttributeError in the title.

Does anyone know what is causing this?

Answer 1:

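Looks like this had something to do with the changes made to UDFs in the newer version (or rather, deprecation of the old syntax). Changing the udf decorator so that it takes a type object instead of a type string worked for me:

@F.udf("array<int>") --> @F.udf(ArrayType(IntegerType()))

ArrayType and IntegerType are imported from pyspark.sql.types.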
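For reference, here is a minimal sketch of that fix written out in full. My understanding, which the answer does not confirm, is that a string return type has to be parsed through the JVM, so it fails when no SparkContext is active at the moment the decorator runs, whereas a DataType object needs no parsing. The names repeat_value and make_create_users_array are hypothetical, and the None check is my own addition rather than part of the original function.

```python
def repeat_value(val):
    """Return a list holding val repeated val times, e.g. 4 -> [4, 4, 4, 4]."""
    # The None check is an addition; the original UDF does not handle nulls.
    if val is None:
        return None
    return [val] * val


def make_create_users_array():
    """Wrap repeat_value as a Spark UDF whose return type is array<int>."""
    # Imported here so repeat_value can be unit-tested without pyspark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, IntegerType

    # Pass a DataType object rather than the string "array<int>".
    return F.udf(repeat_value, ArrayType(IntegerType()))
```

Once a SparkSession exists, something like create_users_array = make_create_users_array() followed by the same df.withColumn("myArray", create_users_array(df["myNumber"])) call should behave as before. Keeping the plain function separate from the UDF wrapper also makes the logic easy to test on its own.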

Source: https://stackoverflow.com/questions/49821328/pyspark-udf-attributeerror-nonetype-object-has-no-attribute-jvm

Tags

python-3.x

python-2.7

apache-spark

pyspark

user-defined-functions
