I have Python code that has the following third-party dependencies:
import boto3
from warcio.archiveiterator import ArchiveIterator
from warcio.recordloade
Before running spark-submit, open a Python shell and try importing the modules. Also check which Python interpreter opens by default (check the Python path).
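The check above can be sketched as a small script rather than typed by hand. This is only an illustration: the helper name check_modules is hypothetical, and the module list uses the top-level package names boto3 and warcio taken from the imports in the question. Run it with the same interpreter you expect spark-submit to use.

```python
import importlib.util
import sys


def check_modules(names):
    """Map each top-level module name to True if this interpreter can find it."""
    # find_spec returns None when a top-level module is not installed
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Show which interpreter is actually running, to compare with the one Spark uses
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    for name, found in check_modules(["boto3", "warcio"]).items():
        print(name, "found" if found else "MISSING")
```

If a module is reported as MISSING here but imports fine in another shell, the two shells are probably using different Python installations.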
If you can import these modules successfully in the Python shell (using the same Python version you are trying to use with spark-submit), please check the following:
In which mode are you submitting the application? Try standalone mode, or if you are on YARN, try client mode.
Also try setting export PYSPARK_PYTHON=(your Python path) before running spark-submit.
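As an alternative to the shell export, the same variable can be set from inside the driver script, as long as this happens before the SparkContext or SparkSession is created. The sketch below is a rough illustration under that assumption: pin_pyspark_python is a hypothetical helper, and it defaults to the interpreter running the script. On a cluster, the chosen path also has to exist on the worker nodes.

```python
import os
import sys


def pin_pyspark_python(python_path=None):
    """Set PYSPARK_PYTHON; defaults to the interpreter running this script."""
    path = python_path or sys.executable
    os.environ["PYSPARK_PYTHON"] = path
    return path


if __name__ == "__main__":
    chosen = pin_pyspark_python()
    print("PYSPARK_PYTHON =", chosen)
    # Create the Spark session only after the variable is set, for example:
    # from pyspark.sql import SparkSession
    # spark = SparkSession.builder.getOrCreate()
```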
All the checks mentioned above passed, but setting PYSPARK_PYTHON is what solved the issue for me.