Amazon EMR Pyspark Module not found

故里飘歌 2021-02-20 04:32

I created an Amazon EMR cluster with Spark preinstalled. When I SSH into the cluster and run pyspark from the terminal, it starts the PySpark shell.

I uploaded a

3 answers
  •  遥遥无期
    2021-02-20 05:17

    For EMR 4.3, I add the following lines to ~/.bashrc:

    export SPARK_HOME=/usr/lib/spark
    export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.XXX-src.zip:$PYTHONPATH
    export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
    

    Here py4j-0.XXX-src.zip is the py4j archive in your Spark Python library folder. Look in /usr/lib/spark/python/lib/ for the exact file name and replace XXX with that version number.

    Then run source ~/.bashrc to apply the changes and you should be good.
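
The version lookup described above can also be done at runtime from Python instead of hard-coding it in ~/.bashrc. The following is only a minimal sketch, assuming the same layout as in the answer (Spark under /usr/lib/spark, py4j shipped as python/lib/py4j-*-src.zip). The helper name add_pyspark_to_path is hypothetical, and the sketch does not add the python/build directory that the exports above include.

```python
import glob
import os
import sys


def add_pyspark_to_path(spark_home="/usr/lib/spark"):
    """Put Spark's Python sources and its bundled py4j zip on sys.path.

    Hypothetical helper; returns the list of paths that were newly added.
    """
    python_dir = os.path.join(spark_home, "python")
    # The py4j version differs between Spark releases, so match it with a glob
    # instead of hard-coding the file name.
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    py4j_zips = sorted(glob.glob(pattern))
    if not py4j_zips:
        raise IOError("no file matching " + pattern)

    added = []
    for path in (python_dir, py4j_zips[-1]):
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added
```

Calling add_pyspark_to_path() at the top of a script, before import pyspark, should have roughly the same effect as the PYTHONPATH exports, but only for that Python process.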
