How to update column based on a condition (a value in a group)?

猫巷女王i 2021-02-07 18:16

I have the following df:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  fn|  red|
|  2|  fn| blue|
|  3|  fn|green|
+---+----+-----+
5 answers
  •  执念已碎
    2021-02-07 18:43

    You are conditionally updating the DataFrame if it satisfies a certain property. In this case the property is "the color column contains 'red'". The idiomatic way to express this is to filter with the desired predicate and then determine whether any rows satisfy it. There is no need for a join.

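    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.lit

    def makeAllRedIfAnyAreRed(df: DataFrame): DataFrame = {
        // True if at least one row has color == "red".
        val containsRed = df.filter(df("color") === "red").count() > 0
        // Overwrite the whole column only when a red row exists; otherwise return df unchanged.
        if (containsRed) df.withColumn("color", lit("red")) else df
    }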
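
The function above treats the whole DataFrame as a single group. If the condition is meant to apply per dept, as the "a value in a group" in the title suggests, a window function is one possible way to do it without a join. The following is only a sketch under that assumption: the helper name makeGroupRedIfAnyAreRed and the extra "hr" row are hypothetical and not part of the question, and the code is written to be pasted into spark-shell.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lit, max, when}

// Hypothetical helper (not from the answer above): set color to "red" for
// every row of a dept when at least one row in that dept is already "red".
def makeGroupRedIfAnyAreRed(df: DataFrame): DataFrame = {
  val byDept = Window.partitionBy("dept")
  // Flag each row as 1 (red) or 0 (not red); the max over the dept is 1
  // exactly when the dept contains at least one red row.
  val groupHasRed =
    max(when(col("color") === "red", 1).otherwise(0)).over(byDept) === 1
  df.withColumn("color", when(groupHasRed, lit("red")).otherwise(col("color")))
}

// Local session for trying it out; in spark-shell this returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("group-red").getOrCreate()
import spark.implicits._

// The three rows from the question plus one made-up "hr" row for contrast.
val df = Seq(
  (1, "fn", "red"),
  (2, "fn", "blue"),
  (3, "fn", "green"),
  (4, "hr", "blue") // hypothetical row, not in the original df
).toDF("sno", "dept", "color")

// Every "fn" row becomes red; the "hr" row keeps its color.
makeGroupRedIfAnyAreRed(df).orderBy("sno").show()
```

Unlike the count-based version, this stays a single lazy transformation: no action runs until the result is actually used, and each dept is decided independently.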
    
