Why does reading a CSV file with empty values lead to an IndexOutOfBoundsException?

無奈伤痛 2021-01-19 19:10

I have a CSV file with the following structure:

Name | Val1 | Val2 | Val3 | Val4 | Val5
John     1      2
Joe      1      2
David    1      2            10    11
         


        
4 Answers
  •  醉梦人生
    2021-01-19 19:54

    One possible solution is to replace each missing value with Double.NaN. Suppose the file example.csv contains rows such as:

    David,1,2,10,,11

    You can read the CSV file as a text file as follows:

    val fileRDD = sc.textFile("example.csv").map { line =>
      val f = line.split(",", -1) // limit -1 keeps trailing empty fields
      (f.head, f.tail.map(k => if (k.isEmpty) Double.NaN else k.toDouble)) // (name, values)
    }
    

    You can then use your existing code to create a DataFrame from it.
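The question's own code is not shown, so the following is only a guess at the cause. On the JVM, `String.split(",")` discards trailing empty strings, so a row whose last columns are empty produces a shorter array than the header, and indexing a later column then throws. The minimal sketch below uses plain Scala, without Spark, to show the difference and the NaN substitution. `CsvRowSketch` and `parseRow` are hypothetical names introduced for illustration, and the sample row assumes the empty columns are written as consecutive commas.

```scala
// Hypothetical helper, not from the original post.
object CsvRowSketch {
  // Parse one CSV row into (name, numeric values), mapping empty fields to NaN.
  def parseRow(line: String): (String, Array[Double]) = {
    // limit = -1 keeps trailing empty strings; the default split(",") drops them
    val fields = line.split(",", -1)
    val values = fields.tail.map { k =>
      if (k.trim.isEmpty) Double.NaN else k.trim.toDouble
    }
    (fields.head, values)
  }

  def main(args: Array[String]): Unit = {
    val row = "John,1,2,,,"             // assumed encoding of a row with Val3..Val5 empty
    println(row.split(",").length)      // 3: trailing empty fields are dropped
    println(row.split(",", -1).length)  // 6: every column is kept
    val (name, values) = parseRow(row)
    println(s"$name has ${values.count(_.isNaN)} missing values")
  }
}
```

With the default split, reading index 3 or higher on the `John` row fails because the array has only three elements. With the limit of -1, every row has six fields and the missing ones become `Double.NaN`.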
