Subtract two columns containing nulls in a Spark DataFrame

耶瑟儿~ 2021-01-14 23:19

I am new to Spark. I have a DataFrame df:

+----------+------------+-----------+
| Column1  | Column2    | Sub       |                          
+----------+------         


        
2 answers
  • 2021-01-14 23:59

    You can use the when function to replace nulls with 0 before subtracting:

    import org.apache.spark.sql.functions._
    df.withColumn("Sub", when(col("Column1").isNull, lit(0)).otherwise(col("Column1")) - when(col("Column2").isNull, lit(0)).otherwise(col("Column2")))
    

    You should get the following result:

    +-------+-------+----+
    |Column1|Column2| Sub|
    +-------+-------+----+
    |      1|      2|-1.0|
    |      4|   null| 4.0|
    |      5|   null| 5.0|
    |      6|      8|-2.0|
    +-------+-------+----+
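
    For reference, the when/otherwise expression above can be placed in a self-contained program. This is only a sketch under assumptions: the object name NullSafeSubtract, the helper withSub, the local SparkSession, and the Option[Int] column types are all illustrative and not part of the original answer; only the sample values are taken from the result table.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, lit, when}

    object NullSafeSubtract {
      // Hypothetical helper: treats a null in either column as 0 before subtracting.
      def withSub(df: DataFrame): DataFrame =
        df.withColumn(
          "Sub",
          when(col("Column1").isNull, lit(0)).otherwise(col("Column1")) -
            when(col("Column2").isNull, lit(0)).otherwise(col("Column2"))
        )

      def main(args: Array[String]): Unit = {
        // Assumption: Spark is on the classpath and a local master is acceptable.
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("null-safe-subtract")
          .getOrCreate()
        import spark.implicits._

        // Assumption: integer columns; None becomes null in the DataFrame.
        val df = Seq[(Option[Int], Option[Int])](
          (Some(1), Some(2)),
          (Some(4), None),
          (Some(5), None),
          (Some(6), Some(8))
        ).toDF("Column1", "Column2")

        withSub(df).show()
        spark.stop()
      }
    }
    ```

    Without the when guards, Column1 - Column2 evaluates to null for any row where either input is null, which is why the replacement is needed.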
    
  • 2021-01-15 00:08

    You can coalesce nulls to zero on both columns and then do the subtraction (abs is applied here, so Sub is the absolute difference):

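    import org.apache.spark.sql.functions.{abs, coalesce, lit}
    import spark.implicits._  // assumes a SparkSession named spark, as in spark-shell

    // None marks a missing value; the explicit element type keeps both columns Option[Int]
    val df = Seq[(Option[Int], Option[Int])](
                 (Some(1), Some(2)),
                 (Some(4), None),
                 (Some(5), None),
                 (Some(6), Some(8))
                ).toDF("A", "B")

    // Replace nulls with 0, subtract, then take the absolute value
    df.withColumn("Sub", abs(coalesce($"A", lit(0)) - coalesce($"B", lit(0)))).show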
    +---+----+---+
    |  A|   B|Sub|
    +---+----+---+
    |  1|   2|  1|
    |  4|null|  4|
    |  5|null|  5|
    |  6|   8|  2|
    +---+----+---+
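
    If the signed difference is wanted, as in the other answer's result, the same coalesce expression can be used without abs. A minimal sketch, assuming integer columns named A and B; the object SignedNullSafeDiff and the helper diff are hypothetical names introduced only for illustration:

    ```scala
    import org.apache.spark.sql.{Column, SparkSession}
    import org.apache.spark.sql.functions.{coalesce, col, lit}

    object SignedNullSafeDiff {
      // Hypothetical helper: a - b with nulls on either side replaced by 0; the sign is kept.
      def diff(a: String, b: String): Column =
        coalesce(col(a), lit(0)) - coalesce(col(b), lit(0))

      def main(args: Array[String]): Unit = {
        // Assumption: Spark is on the classpath and a local master is acceptable.
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("signed-null-safe-diff")
          .getOrCreate()
        import spark.implicits._

        val df = Seq[(Option[Int], Option[Int])](
          (Some(1), Some(2)),
          (Some(4), None),
          (Some(5), None),
          (Some(6), Some(8))
        ).toDF("A", "B")

        df.withColumn("Sub", diff("A", "B")).show()
        spark.stop()
      }
    }
    ```

    Returning a Column rather than a DataFrame keeps the helper usable inside select or withColumn alike.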
    