Given a DataFrame in which one column is a sequence of structs, generated by the following code:
val df = spark
.range(10)
.map((i) => (i % 2, util.
You cannot use a case class as the input argument of your UDF (but you can return case classes from the UDF). To map over an array of structs, pass a Seq[Row]
to your UDF:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

val my_udf = udf((data: Seq[Row]) => {
  // This is an example. I don't actually want the sum
  data.map { case Row(x: Int, y: Int) => x + y }.sum
})
df.withColumn("result", my_udf($"my_list")).show
+---+--------------------+------+
|  a|             my_list|result|
+---+--------------------+------+
|  0|[[0,3], [5,5], [3...|    41|
|  1|[[0,9], [4,9], [6...|    54|
+---+--------------------+------+
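The point about return types can be illustrated with a small sketch in which the UDF takes a Seq[Row] and returns a case class, which Spark should expose as a struct column. The names Stats, summarize and UdfSketch, as well as the two sample rows, are hypothetical and only serve the illustration; the sketch assumes Spark 2.x or later running in local mode.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.udf

// Hypothetical result type; defined at top level so Spark can derive its schema.
case class Stats(total: Int, count: Int)

object UdfSketch {
  // Plain function holding the per-row logic: sum each (x, y) pair,
  // then report the overall total and the number of pairs.
  def summarize(data: Seq[Row]): Stats = {
    val sums = data.map { case Row(x: Int, y: Int) => x + y }
    Stats(sums.sum, sums.size)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: a Seq of tuples becomes an array of structs.
    val df = Seq(
      (0L, Seq((0, 3), (5, 5))),
      (1L, Seq((0, 9), (4, 9)))
    ).toDF("a", "my_list")

    // Input is Seq[Row]; the returned case class becomes a struct column.
    val summarizeUdf = udf((data: Seq[Row]) => summarize(data))

    df.withColumn("result", summarizeUdf($"my_list")).show(false)
    spark.stop()
  }
}
```

Keeping the logic in a plain function makes it possible to check it without a Spark session, and the fields of the returned struct can then be selected as $"result.total" and $"result.count".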