I have a UDF which returns a list of strings. This should not be too hard. I pass in the data type when executing the UDF, since it returns an array of strings: ArrayType
You need to pass an initialized StringType instance as the element type, not the StringType class itself. Note the parentheses in StringType():
label_udf = udf(my_udf, ArrayType(StringType()))
#                                           ^^
df.withColumn('subset', label_udf(df.col1)).show()
+------------+------+
|        col1|subset|
+------------+------+
|     oculunt|[s, n]|
|predistposed|[s, n]|
| incredulous|[s, n]|
+------------+------+
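For completeness, here is a minimal end-to-end sketch of the same pattern. The body of my_udf is a hypothetical stand-in, since the original function is not shown. The demo function, the local SparkSession settings, and the sample rows are also assumptions for illustration only.

```python
def my_udf(word):
    """Hypothetical stand-in: return the first and last character of a word.

    Any plain Python function that returns a list of str can be used the same way.
    """
    if not word:  # covers both None (SQL NULL) and the empty string
        return []
    return [word[0], word[-1]]


def demo():
    # PySpark is imported here so that my_udf can be tried without a Spark install.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    # Assumed local session; adjust master/appName for a real cluster.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("array-udf-sketch")
        .getOrCreate()
    )
    df = spark.createDataFrame(
        [("oculunt",), ("predistposed",), ("incredulous",)], ["col1"]
    )

    # The element type is an instance, StringType(), not the class StringType.
    label_udf = udf(my_udf, ArrayType(StringType()))

    df.withColumn("subset", label_udf(df.col1)).show()
    spark.stop()
```

Calling demo() in an environment with PySpark installed should print col1 next to a subset column of string arrays. The exact values depend on what my_udf returns, so they will differ from the [s, n] output shown above.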