PySpark EMR Conda issue

梦想与她 submitted on 2021-02-11 14:46:28

Question


I am trying to run a Spark script on EMR with a custom conda environment. I created a bootstrap script for the conda setup and supplied it to the EMR cluster. I don't see any issues with the bootstrap, but when I run spark-submit I always get the same error, and I am not sure what I am missing:

Traceback (most recent call last):
  File "/mnt/tmp/spark-b334133c-d22d-42d4-beba-b85fffbbc9c7/iris_cube_analysis.py", line 3, in <module>
    import iris
ImportError: No module named iris
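
One way to narrow this down is to check which interpreter actually runs the script. The unquoted `No module named iris` wording is what Python 2 prints (Python 3 quotes the module name), which hints that the driver may not be running the conda interpreter at all. Below is a minimal sketch that uses only the standard library; the function name `interpreter_report` is made up for illustration and is not part of Spark or EMR.

```python
import importlib
import sys


def interpreter_report(module_name="iris"):
    """Describe the running interpreter and whether module_name imports."""
    try:
        importlib.import_module(module_name)
        importable = True
    except ImportError:
        # Covers both Python 2 ImportError and Python 3 ModuleNotFoundError.
        importable = False
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "module": module_name,
        "importable": importable,
    }


if __name__ == "__main__":
    # Hypothetical usage: run this at the top of the submitted script,
    # before the failing `import iris` line.
    print(interpreter_report())
```

If the reported executable is not `/mnt1/anaconda3/bin/python3`, the problem is likely interpreter selection rather than the conda install itself.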

spark-submit -

spark-submit --deploy-mode client --master yarn --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/mnt1/anaconda3/bin/python3 --conf spark.executorEnv.PYSPARK_PYTHON=/mnt1/anaconda3/bin/python3 s3://<name>/python/analysis.py s3://<input> s3://<output>
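
In client mode the driver runs on the node where spark-submit is invoked, and `spark.yarn.appMasterEnv.*` only affects the YARN application master, so the command above may leave the driver on the system Python. Spark also provides `spark.pyspark.python`, which sets the interpreter for both the driver and the executors. The sketch below assembles such a command as an argument list. The helper name `build_submit_args` is hypothetical, the `s3://<...>` placeholders are kept exactly as in the question, and whether this resolves the error on a given cluster is an assumption.

```python
def build_submit_args(python_path, script, *script_args):
    """Return a spark-submit argument list that pins one Python interpreter."""
    return [
        "spark-submit",
        "--deploy-mode", "client",
        "--master", "yarn",
        # spark.pyspark.python applies to both the driver and the executors.
        "--conf", "spark.pyspark.python=" + python_path,
        script,
    ] + list(script_args)


if __name__ == "__main__":
    args = build_submit_args(
        "/mnt1/anaconda3/bin/python3",
        "s3://<name>/python/analysis.py",  # placeholder from the question
        "s3://<input>",
        "s3://<output>",
    )
    print(" ".join(args))
```

The resulting list could be passed to `subprocess.check_call` on the master node, or the printed command pasted into a shell once the placeholders are filled in.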

bootstrap -

#!/bin/bash
# Download and silently install Anaconda under /mnt1/anaconda3.
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b -p /mnt1/anaconda3
export PATH=/mnt1/anaconda3/bin:$PATH
# Single quotes keep $PATH unexpanded until the profile is sourced.
echo 'export PATH=/mnt1/anaconda3/bin:$PATH' >> ~/.bash_profile
# Point PySpark at the conda interpreter.
sudo sed -i -e '$a\export PYSPARK_PYTHON=/mnt1/anaconda3/bin/python' /etc/spark/conf/spark-env.sh
echo 'export PYSPARK_PYTHON=/mnt1/anaconda3/bin/python3' >> ~/.bash_profile
# Install the iris package from conda-forge.
conda install -y -c conda-forge iris

Source: https://stackoverflow.com/questions/61840277/pyspark-emr-conda-issue
