How to pass Array[Seq[String]] to an Apache Spark UDF? (Error: Not Applicable)

Asked by 生来不讨喜 on 2021-01-20 08:30

I have the following Apache Spark UDF in Scala:

val myFunc = udf {
  (userBias: Float, otherBiases: Map[Long, Float],
   userFactors: Seq[Float], context: Seq[Seq[String]]) =>
    // NOTE: the declaration was cut off after "context: S" in the original
    // post; Seq[Seq[String]] is a reconstruction based on the question title.
    // The UDF body was not included in the post.
    ???
}
1 Answer

  • Answered 2021-01-20 09:00

    Spark 2.2+

    You can use the typedLit function:

    import org.apache.spark.sql.functions.typedLit
    
    myFunc(..., typedLit(context))
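
    A minimal end-to-end sketch of this approach, assuming a local SparkSession; the object name, DataFrame, column names, and UDF body are invented for illustration and do not come from the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{typedLit, udf}

object TypedLitDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("typedLit-demo").getOrCreate()
    import spark.implicits._

    // The constant we want every invocation of the UDF to see.
    val context: Array[Seq[String]] = Array(Seq("a", "b"), Seq("c"))

    // Inside the UDF the literal arrives as Seq[Seq[String]].
    val countAll = udf { (ctx: Seq[Seq[String]]) => ctx.map(_.size).sum }

    val df = Seq(1, 2).toDF("id")
    // typedLit preserves the full Scala type (array<array<string>>),
    // which plain lit cannot do for nested collections.
    df.withColumn("n", countAll(typedLit(context))).show()

    spark.stop()
  }
}
```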
    

    Spark < 2.2

    Any argument that is passed directly to the UDF has to be a Column, so if you want to pass a constant array you'll have to convert it to a Column literal:

    import org.apache.spark.sql.functions.{array, lit}
    
    val myFunc: org.apache.spark.sql.UserDefinedFunction = ???
    
    myFunc(
      userBias("bias"),
      otherBias("biases"),
      userFactors("features"),
      // org.apache.spark.sql.Column
      array(context.map(xs => array(xs.map(lit _): _*)): _*)  
    )
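
    To unpack that last expression: the inner array(xs.map(lit _): _*) turns each Seq[String] into an array column of string literals, and the outer array(...: _*) combines those into a single array<array<string>> column. A standalone sketch (the value of context here is made up):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{array, lit}

val context: Array[Seq[String]] = Array(Seq("a", "b"), Seq("c"))

// Inner map: Seq("a", "b") becomes array(lit("a"), lit("b")).
// Outer array: combine the per-element arrays into one
// array<array<string>> Column that can be passed to the UDF.
val contextCol: Column =
  array(context.map(xs => array(xs.map(lit _): _*)): _*)
```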
    

    Non-Column objects can be passed only indirectly, using a closure, for example:

    def myFunc(context: Array[Seq[String]]) = udf {
      (userBias: Float, otherBiases: Map[Long, Float], userFactors: Seq[Float]) =>
        ???
    }
    
    myFunc(context)(userBias("bias"), otherBias("biases"), userFactors("features"))
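
    Filled in as a self-contained sketch, assuming an invented body (the question elides the real logic): the constant is captured by the closure when the UDF is created, so it never has to become a Column at all.

```scala
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

// context is captured by the lambda, not passed per row.
def myFunc(context: Array[Seq[String]]): UserDefinedFunction = udf {
  (userBias: Float, otherBiases: Map[Long, Float], userFactors: Seq[Float]) =>
    // Hypothetical body: count non-empty context rows, scaled by the bias.
    userBias * context.count(_.nonEmpty) + userFactors.sum
}
```

    The trade-off versus typedLit is that a new UDF object is built for each context value, but this works on any Spark version and with types that literals cannot express.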
    