Question
I have Anaconda installed and I have also downloaded Spark 1.6.2. I am using the instructions from this answer to configure Spark for Jupyter.
I have downloaded and unzipped the spark directory as
~/spark
Now when I cd into this directory and into bin I see the following
SFOM00618927A:spark $ cd bin
SFOM00618927A:bin $ ls
beeline pyspark run-example.cmd spark-class2.cmd spark-sql sparkR
beeline.cmd pyspark.cmd run-example2.cmd spark-shell spark-submit sparkR.cmd
load-spark-env.cmd pyspark2.cmd spark-class spark-shell.cmd spark-submit.cmd sparkR2.cmd
load-spark-env.sh run-example spark-class.cmd spark-shell2.cmd spark-submit2.cmd
I have also added the environment variables as mentioned in the above answer to my .bash_profile and .profile
Now, in the spark/bin directory, the first thing I want to check is whether the pyspark command works in the shell.
So I do this after doing cd spark/bin
SFOM00618927A:bin $ pyspark
-bash: pyspark: command not found
As per the answer, after following all the steps I should be able to just run
pyspark
in the terminal from any directory and it should start a Jupyter notebook with the Spark engine. But even pyspark in the shell is not working, let alone running it in a Jupyter notebook.
Please advise what is going wrong here.
Edit:
I did
open .profile
in my home directory, and this is what is stored in the path:
export PATH=/Users/854319/anaconda/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/Users/854319/spark/bin
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark
Answer 1:
1- You need to set JAVA_HOME and the Spark paths for the shell to find them. After setting them in your .profile you may want to run
source ~/.profile
to activate the settings in the current session. From your comment I can see you're already having the JAVA_HOME issue.
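For reference, a minimal sketch of what those .profile entries could look like on macOS (assuming Spark was unzipped to ~/spark as in the question; adjust the paths to your install):
# minimal sketch for ~/.profile -- paths are assumptions, adjust as needed
export JAVA_HOME=$(/usr/libexec/java_home)   # locate the installed JDK on macOS
export SPARK_HOME="$HOME/spark"              # where the Spark archive was unzipped
export PATH="$SPARK_HOME/bin:$PATH"          # makes pyspark, spark-submit, etc. findable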
Note that if you have a .bash_profile or .bash_login, .profile will not be read (for a login shell, bash uses only the first of these files that it finds).
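If you do keep the exports in .profile, one common workaround is to have .bash_profile load it, for example:
# sketch for ~/.bash_profile: delegate to ~/.profile so its exports still apply
if [ -f ~/.profile ]; then
  . ~/.profile
fi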
2- When you are in spark/bin you need to run
./pyspark
to tell the shell that the target is in the current folder.
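In other words, a quick sketch of the two cases:
cd ~/spark/bin
./pyspark       # works: explicit path to the script in the current directory
pyspark         # only resolves once spark/bin is on PATH and the profile has been sourced
which pyspark   # sanity check that the shell can now find it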
Answer 2:
Here are my environment variables, hope they help you:
# path to JAVA_HOME
export JAVA_HOME=$(/usr/libexec/java_home)
#Spark
export SPARK_HOME="/usr/local/spark" #version 1.6
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_SUBMIT_ARGS="--master local[2]"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
^^ Remove the PYSPARK_DRIVER_PYTHON_OPTS line if you don't want the notebook to launch every time; otherwise you can leave it out entirely and set it on the command line only when you need it, as in the example below.
I have the Anaconda variables on another line that appends to the PATH.
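As an example of the command-line alternative mentioned above (a sketch using the same variables as in the list):
# launch the notebook driver only for this one invocation
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark
# without the inline variables, this starts the plain PySpark shell
pyspark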
Answer 3:
For anyone who came here during or after macOS Catalina, make sure you're setting/sourcing the variables in ~/.zshrc and not in the bash files, since zsh is the default shell from Catalina onward.
$ nano ~/.zshrc
# Set Spark Path
export SPARK_HOME="YOUR_PATH/spark-3.0.1-bin-hadoop2.7"
export PATH="$SPARK_HOME/bin:$PATH"
# Set pyspark + jupyter commands
export PYSPARK_SUBMIT_ARGS="pyspark-shell"
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='lab'
$ source ~/.zshrc
$ pyspark
# Automatically opens Jupyter Lab w/ PySpark initialized.
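A quick way to verify the setup after sourcing ~/.zshrc (output will vary with your paths and versions):
echo $SPARK_HOME        # should print the path set above
which pyspark           # should resolve to $SPARK_HOME/bin/pyspark
spark-submit --version  # prints the Spark version banner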
Source: https://stackoverflow.com/questions/38798816/pyspark-command-not-recognised