Link Spark with IPython Notebook

轮回少年 2020-11-27 17:00

I have followed some tutorials online, but they do not work with Spark 1.5.1 on OS X El Capitan (10.11).

Basically, I have run these commands to download

4 Answers
  • 2020-11-27 17:15

    Spark with IPython/Jupyter notebook is great, and I'm pleased that Alberto was able to help you get it working.

    For reference, it's also worth considering two good alternatives that come prepackaged and can easily be integrated into a YARN cluster (if desired).

    Spark Notebook: https://github.com/andypetrella/spark-notebook

    Apache Zeppelin: https://zeppelin.incubator.apache.org/

    At the time of writing, Spark Notebook (v0.6.1) is more mature, and you can get a build for your specific Spark and Hadoop version here: http://spark-notebook.io/

    Zeppelin (v0.5) looks very promising but doesn't offer as much functionality as Spark Notebook or IPython with Spark right now.

  • 2020-11-27 17:16

    FYI, you can now run Scala, PySpark, SparkR, and SQL with Spark on top of Jupyter via https://github.com/ibm-et/spark-kernel. The new interpreters were added (and marked experimental) in pull request https://github.com/ibm-et/spark-kernel/pull/146.

    See the language support wiki page for more information.
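
    If you want to confirm that the Spark kernel registered correctly (an illustrative check, not from the original answer, using Jupyter's kernelspec API), you can list the installed kernel specs from Python:

    # List the kernel specs Jupyter knows about; the Spark kernel should
    # appear here once it has been installed and registered.
    from jupyter_client.kernelspec import KernelSpecManager

    for name, path in KernelSpecManager().find_kernel_specs().items():
        print(name, "->", path)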

  • 2020-11-27 17:28

    I have Jupyter installed, and it is indeed simpler than you might think:

    1. Install Anaconda for OS X.
    2. Install Jupyter by typing the following line in your terminal:

      ilovejobs@mymac:~$ conda install jupyter
      
    3. Update jupyter just in case.

      ilovejobs@mymac:~$ conda update jupyter
      
    4. Download Apache Spark and compile it, or download and uncompress Apache Spark 1.5.1 + Hadoop 2.6 (the tarball extracts to spark-1.5.1-bin-hadoop2.6; rename it to spark-1.5.1 so the later steps match):

      ilovejobs@mymac:~$ cd Downloads
      ilovejobs@mymac:~/Downloads$ wget http://www.apache.org/dyn/closer.lua/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz
      ilovejobs@mymac:~/Downloads$ tar -xzf spark-1.5.1-bin-hadoop2.6.tgz
      ilovejobs@mymac:~/Downloads$ mv spark-1.5.1-bin-hadoop2.6 spark-1.5.1
      
    5. Create an Apps folder in your home directory:

      ilovejobs@mymac:~/Downloads$ mkdir ~/Apps
      
    6. Move the uncompressed folder spark-1.5.1 to the ~/Apps directory.

      ilovejobs@mymac:~/Downloads$ mv spark-1.5.1/ ~/Apps
      
    7. Move to the ~/Apps directory and verify that spark is there.

      ilovejobs@mymac:~/Downloads$ cd ~/Apps
      ilovejobs@mymac:~/Apps$ ls -l
      drwxr-xr-x ?? ilovejobs ilovejobs 4096 ?? ?? ??:?? spark-1.5.1
      
    8. Here is the first tricky part. Add the Spark binaries to your $PATH:

      ilovejobs@mymac:~/Apps$ cd
      ilovejobs@mymac:~$ echo 'export PATH=$HOME/Apps/spark-1.5.1/bin:$PATH' >> .profile
      
    9. Here is the second tricky part. Add these environment variables as well:

      ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
      ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook'" >> .profile
      
    10. Source the profile to make these variables available in this terminal:

      ilovejobs@mymac:~$ source .profile
      ilovejobs@mymac:~$ echo $PYSPARK_DRIVER_PYTHON   # should print: ipython
      
    11. Create a ~/notebooks directory.

      ilovejobs@mymac:~$ mkdir notebooks
      
    12. Move to ~/notebooks and run pyspark:

      ilovejobs@mymac:~$ cd notebooks
      ilovejobs@mymac:~/notebooks$ pyspark
      

    Note that you can add those variables to the .bashrc in your home directory instead. Now be happy: you should be able to run Jupyter with a PySpark kernel (it will show up as Python 2, but it will actually use Spark).
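
    As a quick sanity check (an illustrative cell, not part of the original answer), you can run a small job in the first notebook cell; the pyspark launcher creates the SparkContext `sc` for you:

      # Run this in a notebook cell started via `pyspark`.
      # `sc` is created automatically by the launcher.
      rdd = sc.parallelize(range(100))
      print(rdd.sum())  # should print 4950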

  • 2020-11-27 17:36

    First, make sure you have a working Spark environment on your machine.

    Then, install the Python module findspark via pip:

    $ sudo pip install findspark
    

    Then, in the Python shell:

    import findspark
    findspark.init()  # locate the Spark installation; you can also pass the
                      # path explicitly, e.g. findspark.init("/path/to/spark")

    import pyspark
    sc = pyspark.SparkContext(appName="myAppName")
    

    Now you can do whatever you want with pyspark in the Python shell (or in IPython).
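
    As a quick smoke test (an illustrative snippet, not part of the original answer; the app name is made up), you can run a tiny job and then release the context:

    import findspark
    findspark.init()

    import pyspark

    # Create a context, run a trivial job, and shut the context down.
    sc = pyspark.SparkContext(appName="smokeTest")
    print(sc.parallelize([1, 2, 3, 4]).count())  # should print 4
    sc.stop()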

    Actually, in my view this is the easiest way to use a Spark kernel in Jupyter.
