How to update column based on a condition (a value in a group)?

猫巷女王i 2021-02-07 18:16

I have the following df:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  fn|  red|
|  2|  fn| blue|
|  3|  fn|green|
+---+----+-----+
5 Answers
  •  [愿得一人]
    2021-02-07 18:51

    Spark 2.2.0: sample DataFrame (taken from the examples above)

    val df = Seq(
      (1, "fn", "red"),
      (2, "fn", "blue"),
      (3, "fn", "green"),
      (4, "aa", "blue"),
      (5, "aa", "green"),
      (6, "bb", "red"),
      (7, "bb", "red"),
      (8, "aa", "blue")
    ).toDF("id", "dept", "color")
    

    Create a UDF that performs the update by checking the condition: rows whose dept is "fn" get the color "red", all other rows keep their color.

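    import org.apache.spark.sql.functions.udf

    // Set color to "red" when dept is "fn" (case-insensitive); otherwise keep it.
    // Both comparisons are written so that a null dept or color does not throw.
    val replace_val = udf((x: String, y: String) =>
      if ("fn".equalsIgnoreCase(x) && !"red".equalsIgnoreCase(y)) "red" else y)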
    
    val final_df = df.withColumn("color", replace_val($"dept",$"color"))
    final_df.show()
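
    For comparison, the same rule can probably be written without a UDF by using the built-in `when`/`otherwise` column functions. This is only a sketch, assuming Spark 2.x and a spark-shell or notebook session; the names `sample` and `updated` and the three-row dataset are made up for illustration.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, lower, when}

    // In spark-shell this returns the session that already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("when-otherwise-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical small sample with the same columns as the answer's df.
    val sample = Seq(
      (1, "fn", "red"),
      (2, "fn", "blue"),
      (4, "aa", "blue")
    ).toDF("id", "dept", "color")

    // Same rule as the UDF: dept is "fn" and color is not already "red"
    // (both case-insensitive) => "red"; every other row keeps its color.
    // A null dept or color makes the condition null, so the row is left as-is.
    val updated = sample.withColumn(
      "color",
      when(lower(col("dept")) === "fn" && lower(col("color")) =!= "red", "red")
        .otherwise(col("color"))
    )
    ```

    Built-in column expressions stay visible to the query optimizer, which a UDF does not, so this form is usually worth trying first when the condition is simple.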
    

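    Output of final_df.show():

    +---+----+-----+
    | id|dept|color|
    +---+----+-----+
    |  1|  fn|  red|
    |  2|  fn|  red|
    |  3|  fn|  red|
    |  4|  aa| blue|
    |  5|  aa|green|
    |  6|  bb|  red|
    |  7|  bb|  red|
    |  8|  aa| blue|
    +---+----+-----+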

    Spark 1.6:

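    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.udf

    val conf = new SparkConf().setMaster("local").setAppName("My app")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // For implicit conversions like converting RDDs to DataFrames
    import sqlContext.implicits._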
    val df = sc.parallelize(Seq(
      (1, "fn", "red"),
      (2, "fn", "blue"),
      (3, "fn", "green"),
      (4, "aa", "blue"),
      (5, "aa", "green"),
      (6, "bb", "red"),
      (7, "bb", "red"),
      (8, "aa", "blue")
    )).toDF("id", "dept", "color")
    
    
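    // Set color to "red" when dept is "fn" (case-insensitive); otherwise keep it.
    // Both comparisons are written so that a null dept or color does not throw.
    val replace_val = udf((x: String, y: String) =>
      if ("fn".equalsIgnoreCase(x) && !"red".equalsIgnoreCase(y)) "red" else y)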
    val final_df = df.withColumn("color", replace_val($"dept",$"color"))
    
    final_df.show()
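
    The title talks about a condition on "a value in a group", but the rule itself is not spelled out in the question text shown here. Assuming the intent is "if any row of a dept has the color red, set every row of that dept to red" (this is my assumption, not something the question confirms), a window function over dept might be a better fit than hard-coding "fn". A rough sketch for Spark 2.x, with hypothetical names `rows`, `byDept`, `groupHasRed` and `result`:

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, lit, max, when}

    // In spark-shell this returns the session that already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-condition-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample: dept "fn" contains a red row, dept "aa" does not.
    val rows = Seq(
      (1, "fn", "red"),
      (2, "fn", "blue"),
      (3, "fn", "green"),
      (4, "aa", "blue"),
      (5, "aa", "green")
    ).toDF("id", "dept", "color")

    // One window partition per dept.
    val byDept = Window.partitionBy("dept")

    // Flag each row with 1 if it is red, else 0; the max over the dept
    // is 1 exactly when at least one row in that dept is red.
    val groupHasRed = max(when(col("color") === "red", 1).otherwise(0)).over(byDept)

    // Assumed rule: a dept that contains red becomes all red,
    // other depts keep their original colors.
    val result = rows.withColumn(
      "color",
      when(groupHasRed === 1, lit("red")).otherwise(col("color"))
    )
    ```

    With this version no dept name appears in the code, so the same expression applies to every group. If the actual rule is different, only the expression inside `max(...)` should need to change.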
    
