I'm trying to count empty values in a column of a DataFrame like this:

df.filter((df(colname) === null) || (df(colname) === "")).count()
As mentioned in the question, df.filter((df(colname) === null) || (df(colname) === "")).count() works for String data types, but testing shows that nulls are not handled. @Psidom's answer handles both null and empty, but does not handle NaN.

Adding a check with .isNaN should handle all three cases:

df.filter(df(colName).isNull || df(colName) === "" || df(colName).isNaN).count()
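
A minimal self-contained check of that three-way filter (a sketch, assuming a spark-shell session where spark.implicits._ is available; the column name x and the sample values are invented for illustration):

import spark.implicits._

// one ordinary value, one null, one NaN
val nums = Seq(Some(1.0), None, Some(Double.NaN)).toDF("x")

// isNull catches the null row and isNaN the NaN row; the empty-string check
// is kept only to mirror the expression above and, on a numeric column under
// default (non-ANSI) settings, simply never matches
nums.filter(nums("x").isNull || nums("x") === "" || nums("x").isNaN).count()
// expected: 2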
You can use isNull to test the null condition:
val df = Seq((Some("a"), Some(1)), (null, null), (Some(""), Some(2))).toDF("A", "B")
// df: org.apache.spark.sql.DataFrame = [A: string, B: int]
df.filter(df("A").isNull || df("A") === "").count
// res7: Long = 2
df.filter(df("B").isNull || df("B") === "").count
// res8: Long = 1
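
For contrast, the === null test from the question matches nothing: comparing a column against a SQL NULL evaluates to NULL rather than true or false, and filter drops rows whose predicate is NULL. A quick check against the same df (the result shown is what a spark-shell run should print, not output copied from this thread):

df.filter(df("A") === null).count
// expected: 0, even though one row of A is null

This is why isNull (and isNaN for numeric columns) is the right way to test.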