How to create a udf in PySpark which returns an array of strings?

抹茶落季 2021-02-07 04:45

I have a udf which returns a list of strings, so this should not be too hard. Since it returns an array of strings, I pass in the return data type when creating the udf: ArrayType

1 answer
  • 2021-02-07 05:29

    You need to pass an initialized StringType instance, i.e. StringType() with parentheses rather than the bare class:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    label_udf = udf(my_udf, ArrayType(StringType()))
    #                                           ^^ 
    df.withColumn('subset', label_udf(df.col1)).show()
    +------------+------+
    |        col1|subset|
    +------------+------+
    |     oculunt|[s, n]|
    |predistposed|[s, n]|
    | incredulous|[s, n]|
    +------------+------+
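
For anyone who wants to try this end to end, here is a minimal sketch. The body of `my_udf` is not shown in the question, so `first_last_letters` below is a hypothetical stand-in that simply returns the first and last character of each word. The `run_demo` helper, the local SparkSession settings, and the app name are likewise assumptions made for illustration.

```python
def first_last_letters(word):
    """Hypothetical stand-in for my_udf: returns a list of strings."""
    # Null or empty input yields an empty list instead of raising.
    if not word:
        return []
    return [word[0], word[-1]]


def run_demo():
    # Imported here so the plain Python function above works without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    # Assumed local session, for illustration only.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("udf-array-demo")
        .getOrCreate()
    )
    df = spark.createDataFrame(
        [("oculunt",), ("predistposed",), ("incredulous",)], ["col1"]
    )

    # The element type must be an instance: StringType(), not StringType.
    label_udf = udf(first_last_letters, ArrayType(StringType()))
    df.withColumn("subset", label_udf(df.col1)).show()

    spark.stop()
```

Calling `run_demo()` needs a working PySpark installation. As far as I know, `udf` also accepts a DDL-formatted type string as the return type, so `udf(first_last_letters, "array<string>")` should be an equivalent way to avoid the instance-versus-class mistake altogether.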
    