I have the following df:
|  1|  fn|  red|
|  2|  fn| blue|
|  3|  fn|green|
Since the filtered DataFrame is likely to contain only a few rows, here is a solution using isin() and withColumn().
Sample DataFrame
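// spark-shell already has these in scope; in a standalone app, import them
// yourself (spark is your SparkSession)
import spark.implicits._
import org.apache.spark.sql.functions._

val df = Seq(
  (1, "fn", "red"),
  (2, "fn", "blue"),
  (3, "fn", "green"),
  (4, "aa", "blue"),
  (5, "aa", "green"),
  (6, "bb", "red"),
  (7, "bb", "red"),
  (8, "aa", "blue")
).toDF("id", "dept", "color")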
Now let's pick only the depts that have at least one row with color red and place them in a broadcast variable, as below.
val depts = sc.broadcast(df.filter($"color" === "red").select(collect_set("dept")).first.getSeq[String](0))
Update the color to red for the filtered depts.
isin() takes a vararg, so convert the list to varargs with depts.value:_*
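// create a new column under a different name (clr) so the difference is visible;
// rows whose dept is not in the list keep their original color
val result = df.withColumn("clr", when($"dept".isin(depts.value:_*), lit("red")).otherwise($"color"))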
+---+----+-----+-----+
| id|dept|color|  clr|
+---+----+-----+-----+
|  1|  fn|  red|  red|
|  2|  fn| blue|  red|
|  3|  fn|green|  red|
|  4|  aa| blue| blue|
|  5|  aa|green|green|
|  6|  bb|  red|  red|
|  7|  bb|  red|  red|
|  8|  aa| blue| blue|
+---+----+-----+-----+
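
For anyone running this outside spark-shell, here is a minimal self-contained sketch of the same steps. The object name RedDepts, the helper markRedDepts and the local[*] master are hypothetical choices of mine, not part of the answer above. The sketch also passes the collected list straight to isin() instead of going through a broadcast variable. That is a simplification on my part: isin() turns its arguments into literals when the expression is built on the driver, so it should behave the same as long as the list is small.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set, lit, when}

object RedDepts {

  // Hypothetical helper: adds a "clr" column that is "red" for every row
  // whose dept has at least one red row, and the original color otherwise.
  def markRedDepts(df: DataFrame): DataFrame = {
    // Collect the distinct depts that contain a red row onto the driver.
    // Assumes this list is small enough to hold in driver memory.
    val redDepts: Seq[String] = df
      .filter(col("color") === "red")
      .select(collect_set("dept"))
      .first
      .getSeq[String](0)

    // isin takes varargs, so expand the Seq with : _*
    df.withColumn(
      "clr",
      when(col("dept").isin(redDepts: _*), lit("red")).otherwise(col("color"))
    )
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; adjust master for a real cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("red-depts")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a Seq of tuples

    val df = Seq(
      (1, "fn", "red"),
      (2, "fn", "blue"),
      (3, "fn", "green"),
      (4, "aa", "blue"),
      (5, "aa", "green"),
      (6, "bb", "red"),
      (7, "bb", "red"),
      (8, "aa", "blue")
    ).toDF("id", "dept", "color")

    markRedDepts(df).orderBy("id").show()

    spark.stop()
  }
}
```

With the sample data, the clr column should come out as in the table above: every fn and bb row becomes red, while the aa rows keep their own color because that dept has no red row.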