How to update column based on a condition (a value in a group)?

猫巷女王i 2021-02-07 18:16

I have the following df:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  fn|  red|
|  2|  fn| blue|
|  3|  fn|green|
+---+----+-----+
5 Answers
  •  广开言路
    2021-02-07 18:45

    An efficient solution that doesn't require an expensive grouping:

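    import org.apache.spark.sql.functions.{lit, when}
    import spark.implicits._  // assumes a SparkSession named `spark`; enables $"..."

    // All groups (values of `dept`) that contain at least one `red` row
    df.where($"color" === "red").select($"dept".alias("dept_")).distinct
      // Right outer join with the input keeps every original row
      .join(df.as("df"), $"dept" === $"dept_", "rightouter")
      // Set `color` to red where the group matched; otherwise keep it
      .withColumn("color", when($"dept_".isNull, $"color").otherwise(lit("red")))
      .drop("dept_")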
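
    To try the join-based approach end to end, here is a minimal self-contained sketch. It assumes `dept` (the column shown in the question's table) is the grouping key and runs Spark in local mode. The object name `PropagateRed`, the helper `propagateRed`, and the two `hr` rows are hypothetical, added only for illustration. It uses a left outer join with the input on the left, which should be equivalent to the right outer join above.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, lit, when}

    object PropagateRed {
      // Hypothetical helper: sets `color` to "red" for every row whose `dept`
      // has at least one "red" row; all other rows keep their colour.
      def propagateRed(df: DataFrame): DataFrame = {
        // Distinct departments that contain at least one red row
        val redDepts = df
          .where(col("color") === "red")
          .select(col("dept").alias("dept_"))
          .distinct()

        // Left outer join keeps every input row; `dept_` is null when the
        // department has no red row
        df.join(redDepts, col("dept") === col("dept_"), "left_outer")
          .withColumn("color",
            when(col("dept_").isNull, col("color")).otherwise(lit("red")))
          .drop("dept_")
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("propagate-red")
          .getOrCreate()
        import spark.implicits._

        // The three `fn` rows come from the question; the `hr` rows are made up
        val df = Seq(
          (1, "fn", "red"), (2, "fn", "blue"), (3, "fn", "green"),
          (4, "hr", "blue"), (5, "hr", "green")
        ).toDF("sno", "dept", "color")

        propagateRed(df).orderBy("sno").show()
        spark.stop()
      }
    }
    ```

    With this input, the three `fn` rows should all come out as `red`, while the `hr` rows keep `blue` and `green` because that department has no red row.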
    
