Count empty values in dataframe column in Spark (Scala)

不知归路 2021-01-16 09:07

I'm trying to count empty values in a DataFrame column like this:

df.filter((df(colname) === null) || (df(colname) === "")).count()


2 Answers
  • 2021-01-16 09:28

    As mentioned in the question, df.filter((df(colname) === null) || (df(colname) === "")).count() catches empty strings, but testing shows that nulls are not handled: comparing a column to null with === evaluates to null rather than true, so null rows never pass the filter.

    @Psidom's answer handles both null and empty strings, but does not handle NaN.

    Adding a check with .isNaN should handle all three cases:

    df.filter(df(colName).isNull || df(colName) === "" || df(colName).isNaN).count()
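
    For a concrete check, here is a minimal, self-contained sketch (the column names and values are illustrative, not from the original post). Note that .isNaN is only meaningful for float/double columns, so the string and numeric cases are counted separately:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.master("local[*]").appName("count-empties").getOrCreate()
    import spark.implicits._

    // "s" is a string column (null / "" possible); "d" is a double column (null / NaN possible)
    val df = Seq(
      (Some("a"), Some(1.0)),
      (None, None),
      (Some(""), Some(Double.NaN))
    ).toDF("s", "d")

    // String column: null or empty string
    df.filter(df("s").isNull || df("s") === "").count()  // 2

    // Numeric column: null or NaN
    df.filter(df("d").isNull || df("d").isNaN).count()   // 2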
    
  • 2021-01-16 09:47

    You can use isNull to test the null condition:

    val df = Seq((Some("a"), Some(1)), (null, null), (Some(""), Some(2))).toDF("A", "B")
    // df: org.apache.spark.sql.DataFrame = [A: string, B: int]
    
    df.filter(df("A").isNull || df("A") === "").count
    // res7: Long = 2
    
    df.filter(df("B").isNull || df("B") === "").count
    // res8: Long = 1
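
    If you need these counts for every column at once, one common pattern (a sketch, not from the original answer) is to aggregate count over a conditional when expression; count ignores nulls, and when without otherwise yields null when the condition is false, so each aggregate counts only the matching rows:

    import org.apache.spark.sql.functions.{col, count, when}

    // One aggregate per column: number of rows that are null or empty string
    val emptyCounts = df.select(df.columns.map { c =>
      count(when(col(c).isNull || col(c) === "", c)).alias(c)
    }: _*)
    emptyCounts.show()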
    