I am using Spark on EMR and writing a PySpark script. I am getting an error when trying to run:
from pyspark import SparkContext
sc = SparkContext()
I just had a fresh pyspark installation on my Windows device and was having the exact same issue. What seems to have helped is the following:
Go to your System Environment Variables and add PYTHONPATH with the following value: %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip, substituting whatever py4j version you find in your spark/python/lib folder.
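If you would rather not touch the system environment variables, a workaround is to put the same two entries on sys.path at the top of the script before importing pyspark. This is only a sketch of that idea; it assumes SPARK_HOME is already set and globs for whichever py4j source zip your Spark install bundles:

import glob
import os
import sys

# Prepend Spark's own python sources and its bundled py4j zip to sys.path
# (assumes the SPARK_HOME environment variable is already set).
spark_home = os.environ["SPARK_HOME"]
sys.path.insert(0, os.path.join(spark_home, "python"))

# The py4j zip name varies by Spark version (e.g. py4j-0.10.9.7-src.zip),
# so glob for it instead of hard-coding the version.
py4j_zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
sys.path.insert(0, py4j_zips[0])

from pyspark import SparkContext

sc = SparkContext()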
The reason I think this works is that when I installed pyspark using conda, it also pulled in a py4j version that may not be compatible with my specific version of Spark. Spark ships its own py4j under spark/python/lib, so pointing PYTHONPATH at that copy avoids the mismatch.
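A quick way to check for such a mismatch (again assuming SPARK_HOME is set) is to print the py4j that conda/pip installed into the environment next to the py4j zip bundled with the Spark distribution; this is just a diagnostic sketch:

import glob
import os
from importlib.metadata import PackageNotFoundError, version

# py4j installed into the conda/pip environment, if any.
try:
    print("environment py4j:", version("py4j"))
except PackageNotFoundError:
    print("environment py4j: not installed")

# py4j source zip bundled with the Spark distribution itself.
bundled = glob.glob(os.path.join(os.environ["SPARK_HOME"], "python", "lib", "py4j-*-src.zip"))
print("bundled py4j:", bundled)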