How to deal with Spark UDF input/output of primitive nullable type

走了就别回头了 2021-01-03 02:38

The issues:

1) Spark doesn't call the UDF if the input is a column of primitive type that contains null:

inputDF.show()

+-----+
|  x  |
+-----+
         


        
3 Answers
  •  说谎 (original poster)
    2021-01-03 03:03

    I would also use Artur's solution, but there is another way that avoids Java's wrapper classes: wrap the column in a struct and have the UDF take a Row:

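    import org.apache.spark.sql.functions.{struct, udf}
    import org.apache.spark.sql.Row
    // $"x" needs spark.implicits._ (already in scope in spark-shell)

    inputDF
      .withColumn("y",
        // The UDF receives a Row, so it is invoked even when x is null.
        udf { (r: Row) =>
          // Returning an Option lets the result column hold nulls.
          if (r.isNullAt(0)) Some(1) else None
        }.apply(struct($"x"))
      )
      .show()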
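
For comparison, here is a minimal sketch of what the wrapper-class approach mentioned above might look like. Artur's actual code is not shown in this thread, so the object name `NullableUdfSketch`, the UDF name `flagNull`, and the sample data are assumptions made for illustration. The idea is that `java.lang.Double` is a reference type, so the UDF should be invoked for null inputs too, and returning an `Option` lets the output column hold nulls.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object NullableUdfSketch {
  // Boxed input type: a null cell arrives as a null reference.
  // Option return type: None becomes null in the result column.
  val flagNull = udf { (x: java.lang.Double) =>
    if (x == null) Some(1.0) else None
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullable-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: a nullable double column named "x".
    val inputDF = Seq[Option[Double]](Some(1.0), None, Some(3.0)).toDF("x")

    // y is 1.0 where x is null, and null everywhere else.
    inputDF.withColumn("y", flagNull($"x")).show()

    spark.stop()
  }
}
```

Compared with the struct version, this keeps the UDF signature close to the column type, at the cost of boxing and an explicit null check.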
    
