How to use Jupyter + SparkR and custom R install

傲寒 2021-01-15 20:06

I am using a Dockerized image and Jupyter Notebook along with the SparkR kernel. When I create a SparkR notebook, it uses an install of Microsoft R (3.3.2) instead of my vanilla CRAN R install. How can I make the SparkR kernel use a custom R installation instead?

2 Answers
  • 2021-01-15 20:22

    To use a custom R environment, I believe you need to set the following Spark application properties when you start Spark:

        "spark.r.command": "/custom/path/bin/R",
        "spark.r.driver.command": "/custom/path/bin/Rscript",
        "spark.r.shell.command" : "/custom/path/bin/R"
    

    This is more completely documented here: https://spark.apache.org/docs/latest/configuration.html#sparkr
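
    For example (a sketch rather than part of the original answer - /custom/path is a placeholder for your own R install), you could pass these either on the command line when launching the SparkR shell, or persistently via $SPARK_HOME/conf/spark-defaults.conf:

        # one-off, when launching the SparkR shell:
        $SPARK_HOME/bin/sparkR \
          --conf spark.r.command=/custom/path/bin/Rscript \
          --conf spark.r.driver.command=/custom/path/bin/Rscript \
          --conf spark.r.shell.command=/custom/path/bin/R

        # or persistently, in $SPARK_HOME/conf/spark-defaults.conf:
        spark.r.command          /custom/path/bin/Rscript
        spark.r.driver.command   /custom/path/bin/Rscript
        spark.r.shell.command    /custom/path/bin/R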

  • 2021-01-15 20:31

    Docker-related issues aside, the settings for Jupyter kernels are configured in files named kernel.json, each residing in its own directory (one per kernel); these directories can be listed with the command jupyter kernelspec list. For example, here is the output on my (Linux) machine:

    $ jupyter kernelspec list
    Available kernels:
      python2       /usr/lib/python2.7/site-packages/ipykernel/resources
      caffe         /usr/local/share/jupyter/kernels/caffe
      ir            /usr/local/share/jupyter/kernels/ir
      pyspark       /usr/local/share/jupyter/kernels/pyspark
      pyspark2      /usr/local/share/jupyter/kernels/pyspark2
      tensorflow    /usr/local/share/jupyter/kernels/tensorflow
    
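    To inspect or edit any of these, open the kernel.json inside the directory listed, e.g. for the ir kernel above:

        $ cat /usr/local/share/jupyter/kernels/ir/kernel.json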

    Again, as an example, here are the contents of the kernel.json for my R kernel (ir):

    {
      "argv": ["/usr/lib64/R/bin/R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
      "display_name": "R 3.3.2",
      "language": "R"
    }
    

    And here is the respective file for my pyspark2 kernel:

    {
     "display_name": "PySpark (Spark 2.0)",
     "language": "python",
     "argv": [
      "/opt/intel/intelpython27/bin/python2",
      "-m",
      "ipykernel",
      "-f",
      "{connection_file}"
     ],
     "env": {
      "SPARK_HOME": "/home/ctsats/spark-2.0.0-bin-hadoop2.6",
      "PYTHONPATH": "/home/ctsats/spark-2.0.0-bin-hadoop2.6/python:/home/ctsats/spark-2.0.0-bin-hadoop2.6/python/lib/py4j-0.10.1-src.zip",
      "PYTHONSTARTUP": "/home/ctsats/spark-2.0.0-bin-hadoop2.6/python/pyspark/shell.py",
      "PYSPARK_PYTHON": "/opt/intel/intelpython27/bin/python2"
     }
    }
    

    As you can see, in both cases the first element of argv is the executable for the respective language - in my case, GNU R for the ir kernel and Intel Python 2.7 for the pyspark2 kernel. Changing this first element so that it points to your own R executable should resolve your issue.
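
    Putting the two answers together, a minimal kernel.json for a SparkR kernel backed by a custom R might look like the sketch below. This is only an illustration under assumptions: /custom/path/bin/R and /opt/spark are placeholders for your own R install and Spark home, and it presumes the IRkernel package is installed in that R.

        {
          "display_name": "SparkR (custom R)",
          "language": "R",
          "argv": ["/custom/path/bin/R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
          "env": {
            "SPARK_HOME": "/opt/spark"
          }
        }

    With SPARK_HOME exposed to the kernel this way, a notebook can then load SparkR from the Spark distribution itself, e.g. library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib")) followed by sparkR.session().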
