How to read .csv file using spark-shell

Submitted by 。_饼干妹妹 on 2020-01-03 06:36:23

Question


I am using a standalone Spark installation (the build prebuilt for Hadoop). Which library should I import so that I can read a .csv file?

I found one library on GitHub: https://github.com/tototoshi/scala-csv. But when I typed import com.github.tototoshi.csv._ as shown in the README, it didn't work. Should I do something else before importing it, such as building it with sbt first? I tried building it with sbt and that didn't work either (I followed the steps in the last part of the README: cloned the code to my local machine, installed sbt, and ran ./sbt).


Answer 1:


Just enable the spark-csv package when starting the shell, e.g.:

spark-shell --packages com.databricks:spark-csv_2.10:1.4.0

This makes the csv format available, e.g.:

val df = sqlContext.read.format("csv").load("foo.csv")

and, if the file has a header row:

val df = sqlContext.read.format("csv").option("header", "true").load("foo.csv")

See the GitHub repo for all options: https://github.com/databricks/spark-csv
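
To make the options above concrete, here is a minimal sketch that also turns on schema inference. It assumes a spark-shell started with the --packages flag shown earlier, and that the driver and executors can see the same local filesystem (local mode or a single machine). The temporary file and its contents are hypothetical stand-ins for foo.csv; the format name is the fully qualified one used in the spark-csv README.

```scala
// Paste into a spark-shell started with:
//   spark-shell --packages com.databricks:spark-csv_2.10:1.4.0
// Use :paste so the multi-line chain below is read as one expression.
// `sqlContext` is created by the shell.
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Hypothetical sample data standing in for foo.csv
val tmp = Files.createTempFile("foo", ".csv")
Files.write(tmp, "id,name\n1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8))

val df = sqlContext.read
  .format("com.databricks.spark.csv")  // fully qualified data source name
  .option("header", "true")            // first line holds the column names
  .option("inferSchema", "true")       // guess column types (costs an extra pass over the data)
  .option("delimiter", ",")            // field separator; "," is the default
  .load(tmp.toUri.toString)            // file:// URI, so the local filesystem is used

df.printSchema()  // column names and inferred types
df.show()         // print the rows as a table
```

Without inferSchema every column is read as a string, which is usually fine for a first look at the data.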




Answer 2:


You should rephrase your question to explain what exactly is not working; otherwise people will keep down-voting.

If you want to use spark-shell, you can pass the list of packages to load with "--packages", as @the.malkolm shows. I still think that solution is incomplete, because you are asking not how to fix this in spark-shell but how to compile with sbt. I have used https://github.com/tototoshi/scala-csv with Maven before. I assume things are not much different in sbt: add the following line to your build.sbt and then run ./sbt.

libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.3.0"
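
Once that dependency resolves, the import from the question should compile. As a rough sketch of what reading a file with scala-csv then looks like (the temporary file and its contents are hypothetical stand-ins for your own .csv; for spark-shell the same artifact could presumably be supplied with `--packages com.github.tototoshi:scala-csv_2.10:1.3.0`, assuming a Scala 2.10 build):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import com.github.tototoshi.csv._

// Hypothetical sample data standing in for your own file
val tmp = Files.createTempFile("foo", ".csv")
Files.write(tmp, "id,name\n1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8))

val reader = CSVReader.open(tmp.toFile)

// allWithHeaders() returns one Map(columnName -> value) per data row
val rows: List[Map[String, String]] = try {
  reader.allWithHeaders()
} finally {
  reader.close()  // always release the file handle
}

rows.foreach(println)
```

Note that scala-csv reads the file into ordinary Scala collections on the driver; it does not produce a distributed DataFrame the way spark-csv does.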

You can also try the library provided by Databricks with sbt, using this line:

libraryDependencies += "com.databricks" %% "spark-csv" % "1.4.0"

If that does not work, I suggest taking a closer look at http://www.scala-sbt.org/documentation.html, since the problem is probably not which library to use but how to build an sbt project.
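
For orientation, a complete build.sbt might look roughly like the sketch below. The project name, the Scala version and the Spark version are illustrative assumptions, not taken from the question; adjust them to match your installation. With `%%`, sbt appends the Scala binary version (here `_2.10`) to the artifact name itself, so the suffix must not be written out by hand.

```scala
// build.sbt: minimal sketch; name and versions below are assumptions
name := "csv-example"

version := "0.1.0"

// Should match the Scala version your Spark build was compiled against
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // "provided": spark-shell / spark-submit supply Spark itself at runtime
  "org.apache.spark"     %% "spark-sql" % "1.6.1" % "provided",
  "com.databricks"       %% "spark-csv" % "1.4.0",
  "com.github.tototoshi" %% "scala-csv" % "1.3.0"
)
```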



Source: https://stackoverflow.com/questions/36205991/how-to-read-csv-file-using-spark-shell
