Spark Equivalent of IF Then ELSE

梦如初夏 2020-11-22 08:37

I have seen this question asked here before and have learned from the answers. However, I am not sure why I am getting an error when I feel it should work.

I want to crea

4 answers
  •  盖世英雄少女心
    2020-11-22 09:16

    Conditional statements in Spark

    • Using “when otherwise” on DataFrame
    • Using “case when” on DataFrame
    • Using the && and || operators

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Create the session first; the implicits import below depends on it.
    val spark: SparkSession = SparkSession.builder().master("local[1]").appName("SparkByExamples.com").getOrCreate()
    import spark.implicits._

    // Sample rows: first name, middle name, last name, dob, gender, salary.
    val data = List(("James ","","Smith","36636","M",60000),
            ("Michael ","Rose","","40288","M",70000),
            ("Robert ","","Williams","42114","",400000),
            ("Maria ","Anne","Jones","39192","F",500000),
            ("Jen","Mary","Brown","","F",0))

    val cols = Seq("first_name","middle_name","last_name","dob","gender","salary")
    val df = spark.createDataFrame(data).toDF(cols:_*)
    

    1. Using “when otherwise” on DataFrame

    Add a new_gender column that maps each gender value to a readable label. Both forms below give the same result.

    val df1 = df.withColumn("new_gender", when(col("gender") === "M","Male")
          .when(col("gender") === "F","Female")
          .otherwise("Unknown"))
    
    val df2 = df.select(col("*"), when(col("gender") === "M","Male")
          .when(col("gender") === "F","Female")
          .otherwise("Unknown").alias("new_gender"))
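
One detail the snippets above rely on: the when branches are tried in order, and a row that matches none of them gets null unless otherwise supplies a default. A minimal sketch of that behaviour, assuming a local SparkSession; the names demo, labelled, and label are hypothetical and not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

val spark = SparkSession.builder().master("local[1]").appName("when-without-otherwise").getOrCreate()
import spark.implicits._

// Hypothetical three-row frame: one row per branch, plus one unmatched value.
val demo = Seq("M", "F", "").toDF("gender")

// No .otherwise here, so the unmatched row gets null instead of a default label.
val labelled = demo.withColumn("label",
  when(col("gender") === "M", "Male")
    .when(col("gender") === "F", "Female"))

labelled.show()
```

If null is not acceptable downstream, ending the chain with otherwise, as in df1 and df2, is the simplest fix.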
    

    2. Using “case when” on DataFrame

    val df3 = df.withColumn("new_gender",
      expr("case when gender = 'M' then 'Male' " +
                       "when gender = 'F' then 'Female' " +
                       "else 'Unknown' end"))
    

    Alternatively,

    val df4 = df.select(col("*"),
          expr("case when gender = 'M' then 'Male' " +
                           "when gender = 'F' then 'Female' " +
                           "else 'Unknown' end").alias("new_gender"))
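
The same CASE WHEN expression can also be run as a plain SQL query once the DataFrame is registered as a temporary view. This is only a sketch under assumptions: the two-column people frame, the view name people, and the value viaSql are hypothetical stand-ins for the df used above.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("case-when-sql").getOrCreate()
import spark.implicits._

// Hypothetical, smaller stand-in for the df defined earlier.
val people = Seq(("James", "M"), ("Maria", "F"), ("Robert", "")).toDF("first_name", "gender")

// Register the frame so it can be referenced by name in SQL.
people.createOrReplaceTempView("people")

val viaSql = spark.sql(
  """SELECT *,
    |       CASE WHEN gender = 'M' THEN 'Male'
    |            WHEN gender = 'F' THEN 'Female'
    |            ELSE 'Unknown' END AS new_gender
    |FROM people""".stripMargin)

viaSql.show()
```

This form may be easier to read when the conditions are already written in SQL; the expr variants keep everything inside the DataFrame API.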
    

    3. Using the && and || operators

    val dataDF = Seq(
          (66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")
          ).toDF("id", "code", "amt")
    dataDF.withColumn("new_column",
           when(col("code") === "a" || col("code") === "d", "A")
          .when(col("code") === "b" && col("amt") === "4", "B")
          .otherwise("A1"))
          .show()
    

    Output:

    +---+----+---+----------+
    | id|code|amt|new_column|
    +---+----+---+----------+
    | 66|   a|  4|         A|
    | 67|   a|  0|         A|
    | 70|   b|  4|         B|
    | 71|   d|  4|         A|
    +---+----+---+----------+
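
When && and || are mixed in one condition, ordinary Scala operator precedence applies: === binds tighter than &&, which binds tighter than ||. Explicit parentheses make the grouping obvious, and isin is a shorter way to write a chain of || tests on one column. A sketch on the same data, assuming a local SparkSession; the name result is hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

val spark = SparkSession.builder().master("local[1]").appName("and-or-precedence").getOrCreate()
import spark.implicits._

val dataDF = Seq(
  (66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")
).toDF("id", "code", "amt")

val result = dataDF.withColumn("new_column",
  // isin replaces: col("code") === "a" || col("code") === "d"
  when(col("code").isin("a", "d"), "A")
    // Parentheses are optional here, but they spell out the intended grouping.
    .when((col("code") === "b") && (col("amt") === "4"), "B")
    .otherwise("A1"))

result.show()
```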
    
