Spark DataFrame: select rows with at least one null or blank value in any column

-上瘾入骨i 2021-02-10 18:18

From one DataFrame I want to create a new DataFrame that contains only the rows where at least one column value is null or blank, in Spark 1.5 / Scala.

I am trying to write a general

1 Answer
  •  星月不相逢
    2021-02-10 18:55

    Sample Data:

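    // toDF needs the SQL implicits; spark-shell imports them automatically.
    // Use None rather than null for missing Option values.
    val df = Seq[(Option[String], Option[Int])](
      (None, Some(2)), (Some("a"), Some(4)), (Some(""), Some(5)), (Some("b"), None)
    ).toDF("A", "B")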
    
    df.show
    +----+----+
    |   A|   B|
    +----+----+
    |null|   2|
    |   a|   4|
    |    |   5|
    |   b|null|
    +----+----+  
    

    Assuming "blank" means an empty string, you can build one null-or-empty check per column and combine them with OR:

    import org.apache.spark.sql.functions.col
    val cond = df.columns.map(x => col(x).isNull || col(x) === "").reduce(_ || _)
    
    df.filter(cond).show
    +----+----+
    |   A|   B|
    +----+----+
    |null|   2|
    |    |   5|
    |   b|null|
    +----+----+
    
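    If "blank" should also cover whitespace-only strings, the same per-column condition can be extended with trim. The following is only a sketch under some assumptions: the helper name anyNullOrBlank is hypothetical, column names are assumed to contain no dots or other special characters, and SparkSession is a Spark 2.x entry point (on Spark 1.5 you would use a SQLContext and its implicits instead). It applies the blank check only to string columns and a plain null check to everything else.

    ```scala
    import org.apache.spark.sql.{Column, DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, trim}
    import org.apache.spark.sql.types.StringType

    object NullOrBlankRows {
      // Hypothetical helper: true when any column is null, or when any
      // string column is empty after trimming surrounding whitespace.
      def anyNullOrBlank(df: DataFrame): Column =
        df.schema.fields.map { f =>
          val c = col(f.name)
          if (f.dataType == StringType) c.isNull || trim(c) === "" else c.isNull
        }.reduce(_ || _)

      def main(args: Array[String]): Unit = {
        // Assumption: a local Spark 2.x session, used only for this demo.
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("null-or-blank")
          .getOrCreate()
        import spark.implicits._

        val df = Seq[(Option[String], Option[Int])](
          (None, Some(2)), (Some("a"), Some(4)), (Some("  "), Some(5)), (Some("b"), None)
        ).toDF("A", "B")

        // Keeps every row except ("a", 4).
        df.filter(anyNullOrBlank(df)).show()
        spark.stop()
      }
    }
    ```

    Negating the condition, df.filter(!anyNullOrBlank(df)), gives the complementary set: rows with no null or blank value in any column.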
