Question
I am trying to run a Spark script on EMR with a custom conda environment.
I created a bootstrap script for the conda setup and supplied it to EMR. The bootstrap action itself runs without any issues,
but every time I run spark-submit I get the same error, and I'm not sure what I am missing:
Traceback (most recent call last):
  File "/mnt/tmp/spark-b334133c-d22d-42d4-beba-b85fffbbc9c7/iris_cube_analysis.py", line 3, in <module>
    import iris
ImportError: No module named iris
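A message of the form "No module named iris", with no quotes around the module name, is what Python 2 prints, which may indicate that the driver is not running under the Anaconda Python 3 interpreter. In client mode the driver runs on the machine where spark-submit is invoked, so one way to check is to submit a small diagnostic script in place of the real job. The following is only a minimal sketch: the function names `can_import` and `interpreter_report` are hypothetical helpers, not part of Spark or EMR, and it inspects the driver's interpreter only, not the executors.

```python
import importlib
import sys


def can_import(module_name):
    """Return True if this interpreter can import module_name."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def interpreter_report(module_name="iris"):
    """Collect the interpreter path, its version, and whether module_name imports."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "module": module_name,
        "importable": can_import(module_name),
    }


if __name__ == "__main__":
    # In client mode this prints to the terminal that ran spark-submit.
    print(interpreter_report())
```

Submitting this file with the same spark-submit options as the failing job should show which Python executable the driver actually uses. If the reported path is not /mnt1/anaconda3/bin/python3, the interpreter settings are not reaching the driver.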
spark-submit -
spark-submit --deploy-mode client --master yarn --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/mnt1/anaconda3/bin/python3 --conf spark.executorEnv.PYSPARK_PYTHON=/mnt1/anaconda3/bin/python3 s3://<name>/python/analysis.py s3://<input> s3://<output>
bootstrap -
#!/bin/bash
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b -p /mnt1/anaconda3
export PATH=/mnt1/anaconda3/bin:$PATH
echo "export PATH="/mnt1/anaconda3/bin:$PATH"" >> ~/.bash_profile
sudo sed -i -e '$a\export PYSPARK_PYTHON=/mnt1/anaconda3/bin/python' /etc/spark/conf/spark-env.sh
echo "export PYSPARK_PYTHON="/mnt1/anaconda3/bin/python3"" >> ~/.bash_profile
conda install -y -c conda-forge iris
Source: https://stackoverflow.com/questions/61840277/pyspark-emr-conda-issue