Factorize Spark column

抹茶落季 2021-01-07 02:31

Is it possible to factorize a Spark DataFrame column? By factorizing, I mean mapping each unique value in the column to its own integer ID, so that identical values always get the same ID.
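To pin down the semantics being asked for, here is a minimal sketch on plain Scala collections (no Spark involved). The name `factorize` and the choice to number IDs in order of first appearance are assumptions for illustration, loosely modeled on pandas' `factorize`, which returns both the codes and the unique values.

```scala
// Plain Scala collections, no Spark: illustrates what "factorize" means.
// IDs are assigned in order of first appearance (an assumption; any
// consistent one-to-one assignment would also count as factorizing).
object Factorize {
  def factorize[A](values: Seq[A]): (Seq[Int], Seq[A]) = {
    // Distinct values, in the order they first appear
    val uniques = values.distinct
    // Lookup table: value -> its position in `uniques`
    val ids: Map[A, Int] = uniques.zipWithIndex.toMap
    // Replace each value by its ID; equal values get equal IDs
    (values.map(ids), uniques)
  }

  def main(args: Array[String]): Unit = {
    val (codes, uniques) = factorize(Seq("B", "A", "B", "C", "A"))
    println(codes.mkString(", "))   // 0, 1, 0, 2, 1
    println(uniques.mkString(", ")) // B, A, C
  }
}
```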

Example, the orig

2 Answers
  •  被撕碎了的回忆
    2021-01-07 02:51

    You can use a user-defined function (UDF).

    First, create the mapping you need:

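    import org.apache.spark.sql.functions.udf

    // Map each known category to a fixed integer ID; anything else gets 3
    val updateFunction = udf { (x: String) =>
      x match {
        case "A" => 0
        case "B" => 1
        case "C" => 2
        case _   => 3
      }
    }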
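The `match` above hardcodes the categories. If they are not known up front, a minimal sketch of building the same kind of lookup from the column's own distinct values might look like this. The object and method names are hypothetical, and it assumes a string column with few enough distinct values to collect to the driver.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object UdfFactorize {
  // Hypothetical helper: derives the value -> ID mapping from the column's
  // distinct values instead of hardcoding it in a `match`.
  // Assumes a string column with few enough distinct values to collect.
  def factorizeWithUdf(df: DataFrame, inCol: String, outCol: String): DataFrame = {
    // Collect distinct non-null values to the driver and sort them,
    // so the same data always yields the same IDs
    val values: Array[String] = df.select(inCol).distinct()
      .collect()
      .flatMap(r => Option(r.getString(0)))
      .sorted
    val mapping: Map[String, Int] = values.zipWithIndex.toMap
    // Nulls and values missing from the mapping get mapping.size,
    // playing the role of `case _ => 3` in the hardcoded version
    val fallback = mapping.size
    val encode = udf((x: String) => mapping.getOrElse(x, fallback))
    df.withColumn(outCol, encode(col(inCol)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-factorize").getOrCreate()
    import spark.implicits._
    val df = Seq("B", "A", "B", "C").toDF("col3")
    factorizeWithUdf(df, "col3", "col3_id").show()
    spark.stop()
  }
}
```

The mapping is captured in the UDF's closure and shipped to the executors with each task; for a very large number of categories, a join against a small lookup DataFrame would likely be a better fit.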
    

    Then apply it to your DataFrame:

    // withColumn returns a new DataFrame in which col3 holds the integer IDs
    val encoded = df.withColumn("col3", updateFunction(df.col("col3")))
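A UDF is not the only option. As a sketch of a UDF-free alternative (a different technique from the answer's, swapped in for illustration), Spark's built-in `dense_rank` window function assigns consecutive ranks to distinct values, which yields 0-based IDs after subtracting 1. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank}

object RankFactorize {
  // Hypothetical helper: assigns each distinct value of `inCol` a 0-based ID
  // using dense_rank, so no UDF and no hardcoded mapping is needed.
  def factorizeWithRank(df: DataFrame, inCol: String, outCol: String): DataFrame = {
    // No partitionBy: Spark moves all rows into one partition to compute
    // a global ranking, which is fine for small data but does not scale.
    // Ascending order puts nulls first, so a null would receive ID 0.
    val w = Window.orderBy(col(inCol))
    // dense_rank is 1-based and leaves no gaps; subtract 1 for 0-based IDs
    df.withColumn(outCol, dense_rank().over(w) - 1)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rank-factorize").getOrCreate()
    import spark.implicits._
    val df = Seq("B", "A", "B", "C").toDF("col3")
    factorizeWithRank(df, "col3", "col3_id").show()
    spark.stop()
  }
}
```

Spark ML's `StringIndexer` (in `org.apache.spark.ml.feature`) is another built-in route; by default it orders labels by descending frequency and produces a double-typed index column.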
    
