I am running local PySpark code from the command line and it works:
/Users/edamame/local-lib/apache-spark/spark-1.5.1/bin/pyspark --jars myJar.jar --driver-cla
Considering the following prerequisites:
Here is what you'll need to do:
From the Eclipse IDE, check that you are in the PyDev perspective:
From the Preferences window, go to PyDev > Interpreters > Python Interpreter:
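In the interpreter's Libraries tab, you will also need to add Spark's Python sources to the PYTHONPATH. A sketch, reusing the Spark location from the command above (the py4j archive name depends on the version bundled with your Spark distribution, so check your python/lib directory):

/Users/edamame/local-lib/apache-spark/spark-1.5.1/python
/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/lib/py4j-0.8.2.1-src.zip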
I also recommend handling your own log4j.properties file in each of your projects.
To do so, you'll need to add the SPARK_CONF_DIR environment variable as done previously, for example:
Name: SPARK_CONF_DIR, Value: ${project_loc}/conf
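For example, a minimal ${project_loc}/conf/log4j.properties, adapted from the log4j.properties.template shipped in Spark's conf directory (here lowering the log level to WARN to reduce console noise):

log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n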
If you experience problems with the ${project_loc} variable (e.g., on Linux), specify an absolute path instead.
Or, if you want to keep ${project_loc}, right-click each Python source file, choose Run As > Run Configurations, then create your SPARK_CONF_DIR variable in the Environment tab as described previously.
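At this point you can verify the setup by running a small PySpark script directly from Eclipse. A minimal sketch (file name and app name are illustrative):

# verify_spark.py - smoke test to check that PySpark runs from Eclipse
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("pydev-smoke-test").setMaster("local[*]")
sc = SparkContext(conf=conf)

# Simple job: sum of the squares of the numbers 1..100
result = sc.parallelize(range(1, 101)).map(lambda x: x * x).sum()
print(result)  # expected output: 338350

sc.stop()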
Optionally, you can add other environment variables such as TERM, SPARK_LOCAL_IP, and so on:
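For instance (the values are only illustrative; SPARK_LOCAL_IP=127.0.0.1 is a common workaround when Spark binds to the wrong network interface):

Name: TERM, Value: xterm-256color
Name: SPARK_LOCAL_IP, Value: 127.0.0.1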
PS: I don't remember the source of this tutorial, so excuse me for not citing the author; I didn't come up with this myself.