multiple conditions for filter in spark data frames

醉酒成梦 2020-12-03 04:41

I have a DataFrame with four fields. One of the fields is named Status, and I am trying to use an OR condition in .filter for the DataFrame. I tried the queries below, but no luck.

11 Answers
  • 2020-12-03 05:15

    Another way is to use the expr function with a where clause:

    import org.apache.spark.sql.functions.expr

    val df2 = df1.where(expr("col1 = 'value1' and col2 = 'value2'"))

    It works the same. For the question's OR condition, the expression string would be "Status = 2 or Status = 3".

  • 2020-12-03 05:20
    // chaining .filter combines the conditions with AND; use one SQL string for OR
    df2 = df1.filter("Status = 2 or Status = 3");
    
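    Chained .filter calls combine their conditions with AND, which is why a chained version of this query returns no rows (Status cannot equal 2 and 3 at once). A minimal sketch of that behavior with plain Scala collections, no Spark required:

    ```scala
    // Illustrative sketch (plain Scala collections, no Spark): chaining
    // .filter ANDs the conditions, so both can never hold for one row;
    // an OR needs a single predicate.
    val rows = Seq(1, 2, 3, 4)
    val chained = rows.filter(_ == 2).filter(_ == 3)   // AND of both: empty
    val orKept  = rows.filter(r => r == 2 || r == 3)   // OR: keeps 2 and 3
    println(chained)  // List()
    println(orKept)   // List(2, 3)
    ```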
  • 2020-12-03 05:21

    You can try filtering with a collection, such as a list or a set of values:

    ds = ds.filter(functions.col(COL_NAME).isin(myList));
    

    or, as @Tony Fraser suggested, with a Seq of objects:

    ds = ds.filter(functions.col(COL_NAME).isin(mySeq));
    

    All of the answers are correct, but most of them do not represent good coding style. You should also design for a variable number of filter values, even if they are static at a certain point in time.

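    Under the hood, isin is just set membership, which is what makes it robust to a changing number of values. A sketch of the same idea with plain Scala (the Status values here are hypothetical):

    ```scala
    // Illustrative sketch (plain Scala, no Spark): isin is set membership,
    // so the same call keeps working when the value list changes length.
    val allowedStatuses = Set(2, 3)        // hypothetical Status values
    val statuses = Seq(1, 2, 3, 4)
    val kept = statuses.filter(allowedStatuses.contains)
    println(kept)  // List(2, 3)
    ```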
  • 2020-12-03 05:23

    If we want a partial match, like contains, we can combine the contains calls like this:

    import org.apache.spark.sql.functions.{col, lit, lower}

    def getSelectedTablesRows2(allTablesInfoDF: DataFrame, tableNames: Seq[String]): DataFrame = {
      // one partial-match filter per table name, case-insensitive
      val tableFilters = tableNames.map(_.toLowerCase()).map(name => lower(col("table_name")).contains(name))
      // OR all the filters together, starting from a false literal
      val finalFilter = tableFilters.fold(lit(false))((accu, newTableFilter) => accu or newTableFilter)
      allTablesInfoDF.where(finalFilter)
    }
    
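    The fold above ORs the per-table filters together starting from a false literal. The same shape with ordinary Boolean predicates, as a runnable plain-Scala sketch (the table names are hypothetical):

    ```scala
    // Illustrative sketch (plain Scala, no Spark): fold a sequence of
    // "contains" predicates into one OR-combined predicate, mirroring
    // the Column fold above.
    val tableNames = Seq("users", "orders")  // hypothetical table names
    val filters: Seq[String => Boolean] =
      tableNames.map(n => (row: String) => row.toLowerCase.contains(n))
    val finalFilter: String => Boolean =
      row => filters.foldLeft(false)((accu, f) => accu || f(row))
    println(finalFilter("USERS_archive"))  // true
    println(finalFilter("inventory"))      // false
    ```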
  • 2020-12-03 05:24

    Instead of:

    df2 = df1.filter("Status=2" || "Status =3")
    

    Try:

    df2 = df1.filter($"Status" === 2 || $"Status" === 3)

    (The $"Status" column syntax requires import spark.implicits._.)
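    The reason Spark uses === rather than == is that in Scala, == must return a Boolean, while a custom === can return an expression object describing the comparison for later execution. A minimal sketch of that design; Expr and Col here are hypothetical stand-ins, not Spark classes:

    ```scala
    // Illustrative sketch: === returns an expression object instead of a
    // Boolean, so conditions can be composed and inspected before running.
    case class Expr(sql: String) {
      def ||(other: Expr): Expr = Expr(s"(${sql} OR ${other.sql})")
    }
    case class Col(name: String) {
      def ===(value: Any): Expr = Expr(s"$name = $value")
    }
    val cond = Col("Status") === 2 || Col("Status") === 3
    println(cond.sql)  // (Status = 2 OR Status = 3)
    ```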