Spark DataFrame handling empty String in OneHotEncoder

醉梦人生 2021-01-18 03:54

I am importing a CSV file (using spark-csv) into a DataFrame that has empty String values. When I apply the OneHotEncoder, the applic

3 Answers
  •  遥遥无期
    2021-01-18 04:22

    Yep, it's a little thorny, but you can replace the empty string with a value that is sure to differ from every other category. Note that I am using the PySpark DataFrameNaFunctions API (df.na), but the Scala API should be similar.

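    # sqlContext is the SQLContext instance, e.g. the one the pyspark shell provides
    df = sqlContext.createDataFrame([(0, 'a'), (1, 'b'), (2, 'c'), (3, ''), (4, 'a'), (5, 'c')], ['id', 'category'])
    # Replace empty strings in the 'category' column with the placeholder 'EMPTY'
    df = df.na.replace('', 'EMPTY', 'category')
    df.show()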
    
    +---+--------+
    | id|category|
    +---+--------+
    |  0|       a|
    |  1|       b|
    |  2|       c|
    |  3|   EMPTY|
    |  4|       a|
    |  5|       c|
    +---+--------+
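
The replacement only helps if the placeholder really is different from every existing category. As a minimal sketch of that check in plain Python (no Spark needed), the helper names `pick_placeholder` and `fill_empty` below are hypothetical and not part of any Spark API; they only illustrate choosing a label that does not collide with the data.

```python
def pick_placeholder(values, base="EMPTY"):
    """Return a label that does not occur in `values`, starting from `base`."""
    existing = set(values)
    candidate = base
    suffix = 0
    # Append a numeric suffix until the candidate no longer collides.
    while candidate in existing:
        suffix += 1
        candidate = f"{base}_{suffix}"
    return candidate


def fill_empty(values, placeholder):
    """Replace empty strings with `placeholder`, leaving other values untouched."""
    return [placeholder if v == "" else v for v in values]


# Same category values as in the DataFrame above.
categories = ["a", "b", "c", "", "a", "c"]
placeholder = pick_placeholder(categories)
cleaned = fill_empty(categories, placeholder)
print(placeholder, cleaned)
```

With a real DataFrame, one option is to collect the distinct values of the column (for example with `df.select('category').distinct().collect()`), pass them to the helper, and then use the returned label in `df.na.replace('', placeholder, 'category')` as in the answer.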
    
