Loading com.databricks.spark.csv via RStudio

Asked by 伪装坚强ぢ, 2020-12-30 16:31

I have installed Spark-1.4.0. I have also installed its R package SparkR and I am able to use it via Spark-shell and via RStudio, however, there is one difference I can not

4 Answers
  •  Answered by 离开以前, 2020-12-30 17:18

    My colleagues and I found a solution. We initialized the SparkContext like this:

    sc <- sparkR.init(appName = "SparkR-Example",
                      sparkEnvir = list(spark.executor.memory = "1g"),
                      sparkJars = "spark-csv-assembly-1.1.0.jar")
    

    We did not find a way to load a remote jar, so we downloaded spark-csv_2.11-1.0.3.jar. Passing that jar alone via sparkJars does not work, however, because its dependencies are not found locally. You can pass a list of jars instead, but we built an assembly jar containing all dependencies. With that jar loaded, the .csv file can be read as desired:

    flights <- read.df(sqlContext, "data/nycflights13.csv",
                       "com.databricks.spark.csv", header = "true")
    
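    As an alternative to building an assembly jar by hand, spark-submit (and the sparkR shell, which forwards its arguments to spark-submit) accepts a --packages flag that resolves a package and its transitive dependencies from Maven Central automatically. A sketch, assuming a Scala 2.10 build of Spark 1.4 (adjust the artifact suffix to _2.11 for a Scala 2.11 build):

    # Let Spark resolve spark-csv and its dependencies from Maven Central,
    # so no local assembly jar is needed. Run from the Spark home directory.
    ./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3

    With the package resolved this way, sparkR.init can be called without the sparkJars argument.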
