Using Spark UDFs with struct sequences

Asked by 暗喜 on 2020-12-15 09:05

Given a dataframe in which one column is a sequence of structs, generated by the following snippet, how can a UDF be applied to that column?

val df = spark
  .range(10)
  .map((i) => (i % 2, util.


        
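The snippet above is cut off after util., so the exact generator is not shown. A plausible reconstruction that yields a key column a and an array-of-structs column my_list of the shape seen in the answer below is sketched here; the use of util.Random, the intermediate column names b and c, and the groupBy/collect_list step are assumptions, not the original poster's code:

import org.apache.spark.sql.functions.{collect_list, struct}
import spark.implicits._

// Assumed reconstruction: random (b, c) pairs per id, then collect them
// into an array of structs for each value of a (= i % 2).
val df = spark
  .range(10)
  .map((i) => (i % 2, util.Random.nextInt(10), util.Random.nextInt(10)))
  .toDF("a", "b", "c")
  .groupBy("a")
  .agg(collect_list(struct($"b", $"c")).as("my_list"))
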
1 Answer
  • 2020-12-15 09:31

    You cannot use a case class as the input argument of your UDF (but you can return case classes from the UDF). To map over an array of structs, pass a Seq[Row] to your UDF:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.udf
    import spark.implicits._ // for the $"..." column syntax

    // Sum the two Int fields of every struct in the array column
    val my_udf = udf((data: Seq[Row]) => {
      // This is an example; I don't actually want the sum
      data.map { case Row(x: Int, y: Int) => x + y }.sum
    })

    df.withColumn("result", my_udf($"my_list")).show
    
    +---+--------------------+------+
    |  a|             my_list|result|
    +---+--------------------+------+
    |  0|[[0,3], [5,5], [3...|    41|
    |  1|[[0,9], [4,9], [6...|    54|
    +---+--------------------+------+
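
    As mentioned above, a UDF can also return a case class, which Spark encodes back into a struct column. A minimal sketch, using a hypothetical Point case class (not from the original post) that sums the first and second struct fields separately:

    case class Point(x: Int, y: Int)

    // Return a struct instead of a scalar: one sum per field
    val sum_structs = udf((data: Seq[Row]) =>
      Point(data.map(_.getInt(0)).sum, data.map(_.getInt(1)).sum))

    df.withColumn("total", sum_structs($"my_list")).show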
    