Factorize Spark column

抹茶落季 2021-01-07 02:31

Is it possible to factorize a Spark DataFrame column? By factorizing I mean mapping each unique value in the column to an integer ID, so that equal values always get the same ID.

Example, the orig

2 Answers

被撕碎了的回忆 (original poster)

2021-01-07 02:51
You can use a user-defined function (UDF).

First you create the mapping you need:
```
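import org.apache.spark.sql.functions.udf

// Map each known value to a fixed integer ID; any other value gets 3.
val updateFunction = udf { (x: String) =>
  x match {
    case "A" => 0
    case "B" => 1
    case "C" => 2
    case _ => 3
  }
}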
```
And now you only have to apply it to your DataFrame:
```
df.withColumn("col3", updateFunction(df.col("col3")))
```
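The hard-coded `match` above only works when the values are known in advance. As a rough sketch of how the mapping could instead be derived from the data, the snippet below collects the distinct values and numbers them in sorted order. The sample data, the output column name `col3_id`, and the names `mapping`, `factorize`, and `factorized` are illustrative assumptions, and it also assumes the column contains no nulls and has few enough distinct values to fit in driver memory.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Local session for illustration; in spark-shell a `spark` session already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("factorize-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data; "col3" matches the column name used above.
val df = Seq("A", "B", "A", "C", "B").toDF("col3")

// Collect the distinct values to the driver and number them in sorted order.
// Assumption: no nulls, and the distinct values fit in driver memory.
val mapping: Map[String, Int] =
  df.select("col3").distinct().as[String].collect().sorted.zipWithIndex.toMap

// Look each value up in the map; the map is shipped to the executors with the closure.
val factorize = udf((x: String) => mapping(x))

// Add the IDs as a new column, leaving the original values in place.
val factorized = df.withColumn("col3_id", factorize(col("col3")))
factorized.show()
```

With this sample data the mapping is `A -> 0`, `B -> 1`, `C -> 2`. For columns with many distinct values, Spark ML's `StringIndexer` may be a better fit; by default it orders labels by frequency and writes the index as a `Double`.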