How to use Jupyter + SparkR and custom R install

傲寒 2021-01-15 20:06

I am using a Dockerized image and Jupyter Notebook along with the SparkR kernel. When I create a SparkR notebook, it uses an install of Microsoft R (3.3.2) instead of my vanilla CRAN R install. How can I make the SparkR kernel use a custom R installation instead?

2 Answers
  • 2021-01-15 20:22

    To use a custom R environment, I believe you need to set the following Spark application properties when you start Spark:

        "spark.r.command": "/custom/path/bin/R",
        "spark.r.driver.command": "/custom/path/bin/Rscript",
        "spark.r.shell.command" : "/custom/path/bin/R"
    

    This is more completely documented here: https://spark.apache.org/docs/latest/configuration.html#sparkr
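
    For example (a sketch rather than part of the original answer - /custom/path is a placeholder for your own R install), you could pass these either on the command line when launching the SparkR shell, or persistently via $SPARK_HOME/conf/spark-defaults.conf:

        # one-off, when launching the SparkR shell:
        $SPARK_HOME/bin/sparkR \
          --conf spark.r.command=/custom/path/bin/Rscript \
          --conf spark.r.driver.command=/custom/path/bin/Rscript \
          --conf spark.r.shell.command=/custom/path/bin/R

        # or persistently, in $SPARK_HOME/conf/spark-defaults.conf:
        spark.r.command          /custom/path/bin/Rscript
        spark.r.driver.command   /custom/path/bin/Rscript
        spark.r.shell.command    /custom/path/bin/R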

  • 2021-01-15 20:31

    Docker-related issues aside, the settings for Jupyter kernels are configured in files named kernel.json, each residing in its own directory (one per kernel); these directories can be listed with the command jupyter kernelspec list. For example, here is the output on my (Linux) machine:

    $ jupyter kernelspec list
    Available kernels:
      python2       /usr/lib/python2.7/site-packages/ipykernel/resources
      caffe         /usr/local/share/jupyter/kernels/caffe
      ir            /usr/local/share/jupyter/kernels/ir
      pyspark       /usr/local/share/jupyter/kernels/pyspark
      pyspark2      /usr/local/share/jupyter/kernels/pyspark2
      tensorflow    /usr/local/share/jupyter/kernels/tensorflow
    
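    To inspect or edit any of these, open the kernel.json inside the directory listed, e.g. for the ir kernel above:

        $ cat /usr/local/share/jupyter/kernels/ir/kernel.json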

    Again, as an example, here are the contents of the kernel.json for my R kernel (ir):

    {
      "argv": ["/usr/lib64/R/bin/R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
      "display_name": "R 3.3.2",
      "language": "R"
    }
    

    And here is the respective file for my pyspark2 kernel:

    {
     "display_name": "PySpark (Spark 2.0)",
     "language": "python",
     "argv": [
      "/opt/intel/intelpython27/bin/python2",
      "-m",
      "ipykernel",
      "-f",
      "{connection_file}"
     ],
     "env": {
      "SPARK_HOME": "/home/ctsats/spark-2.0.0-bin-hadoop2.6",
      "PYTHONPATH": "/home/ctsats/spark-2.0.0-bin-hadoop2.6/python:/home/ctsats/spark-2.0.0-bin-hadoop2.6/python/lib/py4j-0.10.1-src.zip",
      "PYTHONSTARTUP": "/home/ctsats/spark-2.0.0-bin-hadoop2.6/python/pyspark/shell.py",
      "PYSPARK_PYTHON": "/opt/intel/intelpython27/bin/python2"
     }
    }
    

    As you can see, in both cases the first element of argv is the executable for the respective language - in my case, GNU R for the ir kernel and Intel Python 2.7 for the pyspark2 kernel. Changing this first element so that it points to your own R executable should resolve your issue.
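
    Putting the two answers together, a minimal kernel.json for a SparkR kernel backed by a custom R might look like the sketch below. This is only an illustration under assumptions: /custom/path/bin/R and /opt/spark are placeholders for your own R install and Spark home, and it presumes the IRkernel package is installed in that R.

        {
          "display_name": "SparkR (custom R)",
          "language": "R",
          "argv": ["/custom/path/bin/R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
          "env": {
            "SPARK_HOME": "/opt/spark"
          }
        }

    With SPARK_HOME exposed to the kernel this way, a notebook can then load SparkR from the Spark distribution itself, e.g. library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib")) followed by sparkR.session().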
