I have followed some tutorials online, but they do not work with Spark 1.5.1
on OS X El Capitan (10.11).
Basically I have run these commands to download...
Spark with IPython/Jupyter notebook is great, and I'm pleased that Alberto was able to help you get it working.
For reference, it's also worth considering two good alternatives that come prepackaged and can easily be integrated into a YARN cluster (if desired):
Spark Notebook: https://github.com/andypetrella/spark-notebook
Apache Zeppelin: https://zeppelin.incubator.apache.org/
At the time of writing, Spark Notebook (v0.6.1) is more mature, and you can get a build against your Spark and Hadoop version here: http://spark-notebook.io/
Zeppelin (v0.5) looks very promising but doesn't offer as much functionality as Spark Notebook or IPython with Spark right now.
FYI, you can now run Scala, PySpark, SparkR, and SQL with Spark on top of Jupyter via https://github.com/ibm-et/spark-kernel. The new interpreters were added (and marked experimental) in pull request https://github.com/ibm-et/spark-kernel/pull/146.
See the language support wiki page for more information.
I have Jupyter installed, and indeed it is simpler than you think:
Install Jupyter by typing the next line in your terminal:
ilovejobs@mymac:~$ conda install jupyter
Update jupyter just in case.
ilovejobs@mymac:~$ conda update jupyter
Download Apache Spark and compile it, or download and uncompress Apache Spark 1.5.1 + Hadoop 2.6.
ilovejobs@mymac:~$ cd Downloads
ilovejobs@mymac:~/Downloads$ wget http://www.apache.org/dyn/closer.lua/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz
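The prebuilt tarball should extract to a folder named spark-1.5.1-bin-hadoop2.6; uncompress it and rename it to spark-1.5.1 so it matches the folder name used in the steps below:
ilovejobs@mymac:~/Downloads$ tar -xzf spark-1.5.1-bin-hadoop2.6.tgz
ilovejobs@mymac:~/Downloads$ mv spark-1.5.1-bin-hadoop2.6 spark-1.5.1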
Create an Apps folder in your home directory (i.e.):
ilovejobs@mymac:~/Downloads$ mkdir ~/Apps
Move the uncompressed folder spark-1.5.1 to the ~/Apps directory.
ilovejobs@mymac:~/Downloads$ mv spark-1.5.1/ ~/Apps
Move to the ~/Apps directory and verify that Spark is there.
ilovejobs@mymac:~/Downloads$ cd ~/Apps
ilovejobs@mymac:~/Apps$ ls -l
drwxr-xr-x ?? ilovejobs ilovejobs 4096 ?? ?? ??:?? spark-1.5.1
Here is the first tricky part. Add the Spark binaries to your $PATH:
ilovejobs@mymac:~/Apps$ cd
ilovejobs@mymac:~$ echo "export $HOME/apps/spark/bin:$PATH" >> .profile
Here is the second tricky part. Add these environment variables as well:
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark" >> .profile
Source the profile to make these variables available to this terminal:
ilovejobs@mymac:~$ source .profile
Create a ~/notebooks directory.
ilovejobs@mymac:~$ mkdir notebooks
Move to ~/notebooks and run pyspark:
ilovejobs@mymac:~$ cd notebooks
ilovejobs@mymac:~/notebooks$ pyspark
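Once the notebook opens, a quick sanity check is to run a tiny job in a new cell using the sc context that pyspark creates for you automatically (a minimal example; the numbers are arbitrary):
rdd = sc.parallelize(range(100))  # distribute a small dataset
print(rdd.sum())                  # should print 4950 if Spark is working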
Note that you can also add those variables to the .bashrc located in your home directory.
Now be happy. You should be able to run Jupyter with a PySpark kernel (it will show up as Python 2, but it will use Spark).
First, make sure you have a Spark environment on your machine.
Then, install the Python module findspark via pip:
$ sudo pip install findspark
And then, in the Python shell:
import findspark
findspark.init()  # locate the Spark installation and add it to sys.path

import pyspark
sc = pyspark.SparkContext(appName="myAppName")  # start a SparkContext
Now you can do whatever you want with pyspark in the Python shell (or in IPython).
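For example, here is a quick job to confirm the context works (a minimal sketch; the data is arbitrary):
rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * x).collect())  # [1, 4, 9, 16]
sc.stop()                                  # release the context when done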
In my view, this is actually the easiest way to use a Spark kernel in Jupyter.