Spark DataFrame handling empty String in OneHotEncoder

醉梦人生 2021-01-18 03:54

I am importing a CSV file (using spark-csv) into a DataFrame that has empty String values. When I apply the OneHotEncoder, the applic

3 Answers

遥遥无期 (OP)

2021-01-18 04:22
Yep, it's a little thorny, but you can replace the empty string with a placeholder value that is guaranteed not to collide with any existing category. Note that I am using the PySpark DataFrameNaFunctions API, but the Scala API should be similar.
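```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(0, "a"), (1, "b"), (2, "c"), (3, ""), (4, "a"), (5, "c")],
    ["id", "category"],
)
# Replace empty strings in the "category" column with a sentinel value
df = df.na.replace("", "EMPTY", "category")
df.show()

# +---+--------+
# | id|category|
# +---+--------+
# |  0|       a|
# |  1|       b|
# |  2|       c|
# |  3|   EMPTY|
# |  4|       a|
# |  5|       c|
# +---+--------+
```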