Spark “replacing null with 0” performance comparison

Asked by 上瘾入骨i on 2021-02-14 11:27

Spark 1.6.1, Scala API.

For a dataframe, I need to replace all null values of a certain column with 0. I have two ways to do this:

    myDF.withColumn("pipC…
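The snippet above is cut off. For context, here is a minimal sketch of the two approaches being compared, assuming a hypothetical column name pipColumn in place of the truncated one (this sketch is not part of the original question):

    import org.apache.spark.sql.functions.{lit, when}
    // assumes import sqlContext.implicits._ (Spark 1.6) for the $"..." syntax

    // Option 1: explicit when/otherwise over the column
    // ("pipColumn" is a hypothetical stand-in for the truncated name above)
    val viaWhen = myDF.withColumn("pipColumn",
      when($"pipColumn".isNull, lit(0)).otherwise($"pipColumn"))

    // Option 2: DataFrameNaFunctions.fill on the same column
    val viaNaFill = myDF.na.fill(0.0, Seq("pipColumn"))

Which of these is faster is the question below.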
1 Answer
  • 2021-02-14 12:07

    They are not the same, but performance should be similar. na.fill uses coalesce, but it replaces both NaN and NULL values, not only NULLs.

    import org.apache.spark.sql.functions.{lit, when}

    // Build a test column containing a regular value, a NULL and a NaN
    val y = when($"x" === 0, $"x".cast("double")).when($"x" === 1, lit(null)).otherwise(lit("NaN").cast("double"))
    val df = spark.range(0, 3).toDF("x").withColumn("y", y)

    // Explicit replacement of NULLs only
    df.withColumn("y", when($"y".isNull, 0.0).otherwise($"y")).show()
    // na.fill also replaces the NaN row
    df.na.fill(0.0, Seq("y")).show()
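
    On the sample data (y is 0.0, NULL, NaN for x = 0, 1, 2), the when/isNull version keeps the NaN row, while na.fill replaces it with 0.0 as well. As a hypothetical check that is not part of the original answer, comparing the plans shows the expression na.fill generates for the "y" column:

    // inspect the physical plans of the two variants
    df.withColumn("y", when($"y".isNull, 0.0).otherwise($"y")).explain()
    df.na.fill(0.0, Seq("y")).explain()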
    