Loading com.databricks.spark.csv via RStudio

Asked by 伪装坚强ぢ, 2020-12-30 16:31

I have installed Spark-1.4.0. I have also installed its R package SparkR and I am able to use it via Spark-shell and via RStudio, however, there is one difference I can not

4 Answers
  •  Answered by 离开以前, 2020-12-30 17:18

    My colleagues and I found a solution. We initialized the SparkContext like this:

    sc <- sparkR.init(appName = "SparkR-Example",
                      sparkEnvir = list(spark.executor.memory = "1g"),
                      sparkJars = "spark-csv-assembly-1.1.0.jar")
    

    We did not find a way to load a remote jar, so we downloaded spark-csv_2.11-1.0.3.jar. Passing that jar alone via sparkJars does not work, however, because its dependencies are not found locally. You can pass a list of jars instead, but we built an assembly jar containing all dependencies. With that jar loaded, the .csv file can be read as desired:

    flights <- read.df(sqlContext, "data/nycflights13.csv",
                       "com.databricks.spark.csv", header = "true")
    
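    As an alternative to building an assembly jar by hand, spark-submit (and the sparkR shell, which forwards its arguments to spark-submit) accepts a --packages flag that resolves a package and its transitive dependencies from Maven Central automatically. A sketch, assuming a Scala 2.10 build of Spark 1.4 (adjust the artifact suffix to _2.11 for a Scala 2.11 build):

    # Let Spark resolve spark-csv and its dependencies from Maven Central,
    # so no local assembly jar is needed. Run from the Spark home directory.
    ./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3

    With the package resolved this way, sparkR.init can be called without the sparkJars argument.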
