Returning Multiple Arrays from User-Defined Aggregate Function (UDAF) in Apache Spark SQL

Asked by 予麋鹿 on 2020-12-31 14:17 · 1 answer · 1012 views

I am trying to create a user-defined aggregate function (UDAF) in Java using Apache Spark SQL that returns multiple arrays on completion. I have searched online and cannot find a working example.

1 Answer
  • 2020-12-31 14:42

    As far as I can tell, returning a tuple should be enough. In Scala:

    import org.apache.spark.sql.expressions._
    import org.apache.spark.sql.types._
    import org.apache.spark.sql.Row
    
    object DummyUDAF extends UserDefinedAggregateFunction {
      def inputSchema = new StructType().add("x", StringType)
      // The buffer carries both arrays while rows are being aggregated
      def bufferSchema = new StructType()
        .add("buff", ArrayType(LongType))
        .add("buff2", ArrayType(DoubleType))
      // The result type is a struct with two array fields
      def dataType = new StructType()
        .add("xs", ArrayType(LongType))
        .add("ys", ArrayType(DoubleType))
      def deterministic = true
      // No-op placeholders -- a real UDAF would accumulate state here
      def initialize(buffer: MutableAggregationBuffer): Unit = {}
      def update(buffer: MutableAggregationBuffer, input: Row): Unit = {}
      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {}
      // Returning a tuple is enough; Spark maps it to the struct above
      def evaluate(buffer: Row) = (Array(1L, 2L, 3L), Array(1.0, 2.0, 3.0))
    }
    
    val df = sc.parallelize(Seq(("a", 1), ("b", 2))).toDF("k", "v")
    df.select(DummyUDAF($"k")).show(1, false)
    
    // +---------------------------------------------------+
    // |(DummyUDAF$(k),mode=Complete,isDistinct=false)     |
    // +---------------------------------------------------+
    // |[WrappedArray(1, 2, 3),WrappedArray(1.0, 2.0, 3.0)]|
    // +---------------------------------------------------+
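
    Since the result is a single struct column, the two arrays can be pulled out as top-level columns afterwards. A minimal sketch, continuing from the `df` defined above (the aliases `r`, `xs`, and `ys` are just illustrative names):

```scala
// Alias the struct result, then select its fields as separate columns
val res = df.select(DummyUDAF($"k").alias("r"))
  .select($"r.xs".alias("xs"), $"r.ys".alias("ys"))

res.show(false)
```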
    
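    The question asked for Java; the same pattern ports directly, except that `evaluate` must build the struct explicitly with `RowFactory` instead of returning a tuple. A sketch against the same `UserDefinedAggregateFunction` API (deprecated in Spark 3.0 in favor of `Aggregator`), with the same no-op placeholders as the Scala version:

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.expressions.MutableAggregationBuffer;
import org.apache.spark.sql.expressions.UserDefinedAggregateFunction;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class DummyUDAF extends UserDefinedAggregateFunction {
  @Override
  public StructType inputSchema() {
    return new StructType().add("x", DataTypes.StringType);
  }

  @Override
  public StructType bufferSchema() {
    return new StructType()
        .add("buff", DataTypes.createArrayType(DataTypes.LongType))
        .add("buff2", DataTypes.createArrayType(DataTypes.DoubleType));
  }

  @Override
  public StructType dataType() {
    return new StructType()
        .add("xs", DataTypes.createArrayType(DataTypes.LongType))
        .add("ys", DataTypes.createArrayType(DataTypes.DoubleType));
  }

  @Override
  public boolean deterministic() { return true; }

  // No-op placeholders -- a real UDAF would accumulate state here
  @Override
  public void initialize(MutableAggregationBuffer buffer) {}

  @Override
  public void update(MutableAggregationBuffer buffer, Row input) {}

  @Override
  public void merge(MutableAggregationBuffer buffer1, Row buffer2) {}

  // Build the two-array struct explicitly
  @Override
  public Object evaluate(Row buffer) {
    return RowFactory.create(
        new Long[]{1L, 2L, 3L},
        new Double[]{1.0, 2.0, 3.0});
  }
}
```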