Question
I am using Spark standalone (the build prebuilt for Hadoop). Which library should I import to read a .csv file?
I found one library on GitHub: https://github.com/tototoshi/scala-csv. But when I type import com.github.tototoshi.csv._ as shown in the README, it doesn't work. Do I need to do something else before importing it, such as building it with sbt first? I tried building it with sbt and that didn't work either (I followed the steps at the end of the README: cloned the code to my local computer, installed sbt and ran ./sbt, but it still doesn't work).
Answer 1:
Just enable the spark-csv package, e.g.
spark-shell --packages com.databricks:spark-csv_2.10:1.4.0
This enables the csv format, e.g.
val df = sqlContext.read.format("csv").load("foo.csv")
and, if the file has a header row:
val df = sqlContext.read.format("csv").option("header", "true").load("foo.csv")
See the GitHub repo for all options: https://github.com/databricks/spark-csv
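For reference, here is a sketch of the same read outside spark-shell, as a standalone application. It assumes Spark 1.6.x built for Scala 2.10, with spark-csv 1.4.0 on the classpath. The object name SparkCsvSketch, the sample data and the temporary file are hypothetical. It uses the fully qualified source name com.databricks.spark.csv, as the spark-csv README does, plus the README's inferSchema option.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical standalone app; assumes Spark 1.6.x (Scala 2.10) plus spark-csv 1.4.0.
object SparkCsvSketch {
  // Reads a CSV file through the spark-csv data source (Spark 1.x DataFrameReader API).
  def readCsv(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read
      .format("com.databricks.spark.csv") // fully qualified name of the spark-csv source
      .option("header", "true")           // first line holds the column names
      .option("inferSchema", "true")      // one extra pass over the data to guess column types
      .load(path)

  // Writes a small hypothetical sample so the sketch does not depend on an existing foo.csv.
  def writeSample(): Path = {
    val path = Files.createTempFile("sample", ".csv")
    Files.write(path, "name,age\nalice,30\nbob,25\n".getBytes(StandardCharsets.UTF_8))
    path
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("read-csv").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc) // spark-shell creates this for you as `sqlContext`
    try {
      val df = readCsv(sqlContext, writeSample().toString)
      df.printSchema()
      df.show()
    } finally sc.stop()
  }
}
```

Once packaged into a jar, it could be launched with something like spark-submit --packages com.databricks:spark-csv_2.10:1.4.0 --class SparkCsvSketch your-app.jar, where the jar name is a placeholder.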
Answer 2:
You should rephrase your question to explain what exactly is not working; otherwise people will keep down-voting it.
If you want to use spark-shell, you can pass the list of packages to load dynamically with "--packages", as @the.malkolm suggested. Still, I think that answer is incomplete, because you are not asking how to fix it in spark-shell but how to build with sbt. I have used https://github.com/tototoshi/scala-csv before with Maven. I assume things are not very different in sbt, except that you have to add the following line to your build.sbt and then run ./sbt:
libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.3.0"
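Cloning the scala-csv repo and running ./sbt there only builds the library itself. What you likely need is your own project that declares the dependency above, after which import com.github.tototoshi.csv._ should resolve. A minimal usage sketch, assuming scala-csv 1.3.0: the object name ScalaCsvSketch and the sample rows are hypothetical, while CSVReader.open, all, allWithHeaders, CSVWriter.open and writeAll are the library's own calls.

```scala
import java.io.File
import com.github.tototoshi.csv._ // also brings the implicit default CSVFormat into scope

// Hypothetical example object; assumes scala-csv 1.3.0 on the classpath.
object ScalaCsvSketch {
  // Reads every row as a list of column values (the header row included).
  def readRows(file: File): List[List[String]] = {
    val reader = CSVReader.open(file)
    try reader.all()
    finally reader.close()
  }

  // Reads every data row as a map keyed by the header (first) row.
  def readWithHeaders(file: File): List[Map[String, String]] = {
    val reader = CSVReader.open(file)
    try reader.allWithHeaders()
    finally reader.close()
  }

  // Writes a small hypothetical sample file for the demo.
  def writeSample(): File = {
    val file = File.createTempFile("sample", ".csv")
    val writer = CSVWriter.open(file)
    try writer.writeAll(Seq(Seq("name", "age"), Seq("alice", "30"), Seq("bob", "25")))
    finally writer.close()
    file
  }

  def main(args: Array[String]): Unit = {
    val file = writeSample()
    readRows(file).foreach(println)
    readWithHeaders(file).foreach(println)
  }
}
```

Closing the reader and writer in finally keeps the file handle from leaking if parsing or writing throws.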
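You can also try the Databricks library with sbt by adding this line (the %% operator appends your project's Scala binary version, e.g. _2.10, to the artifact name, so the suffix must not be written out by hand):
libraryDependencies += "com.databricks" %% "spark-csv" % "1.4.0"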
If that does not work, I suggest taking a closer look at http://www.scala-sbt.org/documentation.html, since the problem is probably not which library to use but how to build an sbt project.
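A minimal sbt project for the sketches above could look like the following build.sbt, placed at the root of your own project, with sources under src/main/scala/. The project name, the Scala patch version and the Spark version 1.6.1 are assumptions; match them to what your cluster actually runs. From the project root, sbt compile builds the sources and sbt package produces a jar for spark-submit.

```scala
// build.sbt at the root of your own project (not inside the cloned scala-csv repo)
name := "csv-example" // hypothetical project name

// Assumption: Scala 2.10.x, to match the _2.10 Spark and spark-csv artifacts
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version (here _2.10) to each artifact name
  "com.github.tototoshi" %% "scala-csv" % "1.3.0",
  "com.databricks" %% "spark-csv" % "1.4.0",
  // Assumed Spark version; "provided" because spark-submit supplies Spark at runtime
  "org.apache.spark" %% "spark-sql" % "1.6.1" % "provided"
)
```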
Source: https://stackoverflow.com/questions/36205991/how-to-read-csv-file-using-spark-shell