How to update a column based on a condition (a value in a group)?

猫巷女王i 2021-02-07 18:16

I have the following df:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  fn|  red|
|  2|  fn| blue|
|  3|  fn|green|
+---+----+-----+
5 Answers
  •  死守一世寂寞
    2021-02-07 18:33

    Since the filtered DataFrame is likely to contain only a few rows, here is a solution that combines isin() with .withColumn().

    Sample DataFrame

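    // Needed outside spark-shell; harmless inside it
    import org.apache.spark.sql.functions._  // collect_set, when, lit
    import spark.implicits._                 // toDF and the $"..." column syntax

    val df = Seq(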
      (1, "fn", "red"),
      (2, "fn", "blue"),
      (3, "fn", "green"),
      (4, "aa", "blue"),
      (5, "aa", "green"),
      (6, "bb", "red"),
      (7, "bb", "red"),
      (8, "aa", "blue")
    ).toDF("id", "dept", "color")
    

    Now pick only the depts that have at least one red row, and put them in a broadcast variable as shown below.

    val depts = sc.broadcast(df.filter($"color" === "red").select(collect_set("dept")).first.getSeq[String](0))
    

    Set the color to red for every record in the filtered depts.

    isin() takes varargs, so expand the list into individual arguments with depts.value:_*
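
    As a quick aside on the `:_*` syntax, here is a minimal plain-Scala sketch with no Spark involved. `isOneOf` is a hypothetical stand-in for a varargs method such as isin(); it is not part of any library.

    ```scala
    object VarargsSketch {
      // Hypothetical stand-in for a varargs method like Column.isin:
      // accepts any number of candidate values.
      def isOneOf(value: String, candidates: String*): Boolean =
        candidates.contains(value)

      def main(args: Array[String]): Unit = {
        val depts = Seq("fn", "bb")

        // `depts: _*` expands the Seq into individual arguments,
        // i.e. the same call as isOneOf("fn", "fn", "bb").
        println(isOneOf("fn", depts: _*)) // "fn" is in the list
        println(isOneOf("aa", depts: _*)) // "aa" is not
      }
    }
    ```

    Passing `depts` without `: _*` would not compile here, because a Seq[String] is not a String.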

    // Write to a new column (clr) so the change is easy to compare against color
    val result = df.withColumn("clr", when($"dept".isin(depts.value:_*),lit("red"))
                        .otherwise($"color"))
    
    result.show()
    
    +---+----+-----+-----+
    | id|dept|color|  clr|
    +---+----+-----+-----+
    |  1|  fn|  red|  red|
    |  2|  fn| blue|  red|
    |  3|  fn|green|  red|
    |  4|  aa| blue| blue|
    |  5|  aa|green|green|
    |  6|  bb|  red|  red|
    |  7|  bb|  red|  red|
    |  8|  aa| blue| blue|
    +---+----+-----+-----+
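
    If the list of red depts might be too large to collect to the driver, one possible alternative is a window function that flags each dept without collecting anything. This is only a sketch under assumptions: `WindowSketch` and `markRedDepts` are hypothetical names, and the local SparkSession setup is for illustration, not part of the answer above.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, lit, max, when}

    object WindowSketch {
      // Adds a "clr" column: "red" for every row of a dept that has at least
      // one red row, otherwise the row's original color.
      def markRedDepts(df: DataFrame): DataFrame = {
        // 1 if the row is red, else 0; the max over the dept is 1 when any row is red.
        val deptHasRed =
          max(when(col("color") === "red", 1).otherwise(0))
            .over(Window.partitionBy(col("dept")))

        df.withColumn("clr", when(deptHasRed === 1, lit("red")).otherwise(col("color")))
      }

      def main(args: Array[String]): Unit = {
        // Local session for illustration only (assumed setup).
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("window-sketch")
          .getOrCreate()
        import spark.implicits._

        val df = Seq(
          (1, "fn", "red"), (2, "fn", "blue"), (3, "fn", "green"),
          (4, "aa", "blue"), (5, "aa", "green"), (6, "bb", "red")
        ).toDF("id", "dept", "color")

        markRedDepts(df).orderBy("id").show()
        spark.stop()
      }
    }
    ```

    The trade-off is a shuffle by dept in place of a driver-side collect, so the broadcast version may still be the better fit when only a handful of depts qualify.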
    
