I am using Spark on EMR and writing a PySpark script. I am getting an error when trying to run:
from pyspark import SparkContext
sc = SparkContext()
To put it simply, it's all about Python and Java not being able to talk, because the medium they have to speak through (Py4J) doesn't match on both sides; that's it. I had the same issue, and all the answers above are valid and will work if you use them correctly: either you define a system variable to tell both sides which Py4J they should use, or you uninstall and reinstall so that everything ends up on the same page.
Call stop() on the SparkContext (sc.stop()) at the end of the program to avoid this situation.
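For example, a minimal sketch of that pattern (the parallelize call is just a placeholder for whatever work the script actually does):

from pyspark import SparkContext

sc = SparkContext()                  # create the context once
try:
    rdd = sc.parallelize(range(10))  # placeholder job
    print(rdd.sum())
finally:
    sc.stop()                        # always stop it so the Py4J gateway shuts down cleanly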
I just had a fresh pyspark installation on my Windows device and was having the exact same issue. What seems to have helped is the following:
Go to your System Environment Variables and add a PYTHONPATH variable with the following value: %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH% (just check which py4j version you have in your spark/python/lib folder and use it in place of <version>).
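If you would rather not edit the system settings, a rough in-script equivalent is to put the same two locations on sys.path before pyspark is imported; the C:\spark fallback and the wildcard for the py4j zip are assumptions, so adjust them to your installation:

import glob
import os
import sys

spark_home = os.environ.get("SPARK_HOME", r"C:\spark")  # adjust to your install
sys.path.insert(0, os.path.join(spark_home, "python"))
# pick up whichever py4j-<version>-src.zip this Spark ships with
py4j_zip = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0]
sys.path.insert(0, py4j_zip)

from pyspark import SparkContext     # import only after the paths are in place
sc = SparkContext()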
The reason I think this works is that when I installed pyspark using conda, it also downloaded a py4j version that may not be compatible with that specific version of Spark, which packages its own copy of py4j.
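One way to confirm that kind of mismatch is to compare the py4j that conda/pip installed against the copy bundled with Spark; this little check assumes SPARK_HOME is set:

import glob
import os
import py4j

print("py4j installed in the Python environment:", py4j.__version__)
bundled = glob.glob(os.path.join(os.environ["SPARK_HOME"], "python", "lib", "py4j-*-src.zip"))
print("py4j bundled with Spark:", bundled)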
When I installed a new version with pip install from the Anaconda command prompt, I got the same issue.
When I put the following at the top of the code file:
import findspark
findspark.init("c:\spark")
it solved my problem.
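For reference, a minimal sketch of how that fix fits into a whole script (c:/spark stands in for wherever your Spark home actually is):

import findspark
findspark.init("c:/spark")      # must run before anything from pyspark is imported

from pyspark import SparkContext

sc = SparkContext()
print(sc.version)               # quick sanity check that the context came up
sc.stop()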