PySpark "does not exist in the JVM" error when initializing SparkContext

一向 2021-01-07 22:32

I am using Spark on EMR and writing a PySpark script. I get a "does not exist in the JVM" error when trying to run:

from pyspark import SparkContext
sc = SparkContext()


        
10 Answers
  • 2021-01-07 22:44

    Try adding this at the top of the file:

    import findspark
    findspark.init()
    

    See https://github.com/minrk/findspark
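
    If findspark is not installed yet, it can be added with pip (assuming pip manages the Python environment your script runs in):

    # findspark locates the Spark installation and adds it to sys.path
    pip install findspark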

  • 2021-01-07 22:45

    Try installing Spark 2.4.5 and setting your Spark home path to that version. I faced the same issue, and after changing the version it was resolved for me.
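
    For example, the setup might look like this (a minimal sketch; the install path is an assumption, adjust it to where Spark 2.4.5 lives on your system):

    # Assumed install location for the Spark 2.4.5 distribution
    export SPARK_HOME=/opt/spark-2.4.5-bin-hadoop2.7
    export PATH=$SPARK_HOME/bin:$PATH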

  • 2021-01-07 22:46

    PySpark recently released 2.4.0, but there is no stable Spark release coinciding with this new version. Try downgrading to PySpark 2.3.2; this fixed it for me.

    Edit: to be clearer, your PySpark version needs to be the same as the Apache Spark version you downloaded, or you may run into compatibility issues.

    Check your pyspark version with:

    pip freeze
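
    To pin PySpark to the release that matches your Spark installation (a sketch, assuming pip manages the environment):

    # Replace the mismatched package with the version matching Spark
    pip uninstall -y pyspark
    pip install pyspark==2.3.2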

  • 2021-01-07 22:47

    The following steps solved my issue:

    - Downgrading PySpark to 2.3.2
    - Adding PYTHONPATH as a system environment variable with the value %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH%

    Note: use the proper py4j version in the value given above; don't copy it exactly.
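
    On Windows this can be done from a command prompt, for example (a sketch; the install path and py4j version are assumptions, check your own Spark folder):

    rem Set for the current session; persist via the System Properties dialog if needed
    set SPARK_HOME=C:\spark-2.3.2-bin-hadoop2.7
    set PYTHONPATH=%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.7-src.zip;%PYTHONPATH%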

  • 2021-01-07 22:48

    Instead of editing the environment variables, you might just ensure that the Python environment (the one with pyspark) has the same py4j version as the zip file present in the \python\lib\ directory inside your Spark folder, e.g. d:\Programs\Spark\python\lib\py4j-0.10.7-src.zip on my system for Spark 2.3.2. That is the py4j version shipped as part of the Spark archive.
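
    For example, if your Spark folder ships py4j-0.10.7-src.zip, the matching package can be installed with pip (a sketch; read the exact version off the zip file name in your own \python\lib\ folder):

    # Match the py4j package to the version bundled with your Spark
    pip install py4j==0.10.7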

  • 2021-01-07 22:49

    You need to set the following environment variables to set the Spark path and the Py4j path.
    For example in ~/.bashrc:

    export SPARK_HOME=/home/hadoop/spark-2.1.0-bin-hadoop2.7
    export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
    export PATH=$SPARK_HOME/bin:$SPARK_HOME/python:$PATH
    

    Then use findspark at the top of your file:

    import findspark
    findspark.init()
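
    After reloading ~/.bashrc you can run a quick smoke test from the shell (assuming the paths above match your installation):

    # Reload the profile and confirm SparkContext starts without the JVM error
    source ~/.bashrc
    python -c "from pyspark import SparkContext; sc = SparkContext(); print(sc.version); sc.stop()"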
    