Filter spark/scala dataframe if column is present in set

耶瑟儿~ 2021-01-14 02:55

I'm using Spark 1.4.0, this is what I have so far:

data.filter($"myColumn".in(lit("A"), lit("B"), lit("C"), ...))

The function lit

2 Answers
  • 2021-01-14 03:12

    This PR has been merged into Spark 2.4, so you can now use isInCollection, which takes a Scala collection directly:

    // toDF and the $"..." syntax require `import spark.implicits._`
    val profileDF = Seq(
      Some(1), Some(2), Some(3), Some(4),
      Some(5), Some(6), Some(7), None
    ).toDF("profileID")
    
    val validUsers: Set[Any] = Set(6, 7.toShort, 8L, "3")
    
    val result = profileDF.withColumn("isValid", $"profileID".isInCollection(validUsers))
    
    result.show(10)
    // +---------+-------+
    // |profileID|isValid|
    // +---------+-------+
    // |        1|  false|
    // |        2|  false|
    // |        3|   true|
    // |        4|  false|
    // |        5|  false|
    // |        6|   true|
    // |        7|   true|
    // |     null|   null|
    // +---------+-------+
    
  • 2021-01-14 03:19

    Spark 1.4 or older:

    // lit comes from org.apache.spark.sql.functions; `in` expects Column arguments
    val validValues = Set("A", "B", "C").map(lit(_))
    data.filter($"myColumn".in(validValues.toSeq: _*))
    

    Spark 1.5 or newer:

    // isin takes plain values (Any*), so wrapping each one in lit is not needed
    val validValues = Set("A", "B", "C")
    data.filter($"myColumn".isin(validValues.toSeq: _*))
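
    For reference, the Spark 1.5+ variant can be put into a small self-contained program. This is only a sketch under some assumptions: it uses SparkSession (available from Spark 2.0), runs in local mode, and the names FilterBySet and keepIfIn are hypothetical helpers made up for illustration, not part of Spark.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.col

    object FilterBySet {
      // Hypothetical helper: keep only rows whose `column` value is in `allowed`.
      // The Set is expanded into the varargs that Column.isin(list: Any*) expects.
      def keepIfIn(df: DataFrame, column: String, allowed: Set[String]): DataFrame =
        df.filter(col(column).isin(allowed.toSeq: _*))

      def main(args: Array[String]): Unit = {
        // Assumption: a local session is enough for this illustration.
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("filter-by-set")
          .getOrCreate()
        import spark.implicits._

        // Sample data standing in for the `data` DataFrame from the question.
        val data = Seq("A", "B", "D", "E").toDF("myColumn")
        val validValues = Set("A", "B", "C")

        // Only the rows "A" and "B" pass the filter.
        keepIfIn(data, "myColumn", validValues).show()

        spark.stop()
      }
    }
    ```

    Using col(column) instead of $"..." keeps the helper independent of the implicits import, so it can live outside the scope where the session is created.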
    