I am using Amazon EC2, with my master and development server on a single instance, and another instance for a single worker. I am new to this, but I have managed to make
Since you already use Anaconda, you can simply create an environment with the desired Python version:
conda create --name foo python=3.4
source activate foo
python --version
## Python 3.4.5 :: Continuum Analytics, Inc.
and use it as PYSPARK_DRIVER_PYTHON:
export PYSPARK_DRIVER_PYTHON=/path/to/anaconda/envs/foo/bin/python
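If the executors should run the same interpreter, PYSPARK_PYTHON can point at the env as well. A minimal sketch, assuming the placeholder path from above and that the same env exists on every worker; the app name is made up:

import os

# Path from the answer above; the same env must exist at this path on every worker
env_python = "/path/to/anaconda/envs/foo/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = env_python  # interpreter for the driver
os.environ["PYSPARK_PYTHON"] = env_python         # interpreter for the executors

# Both variables must be set before the context is created
from pyspark import SparkContext
sc = SparkContext(appName="version-check")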
I've had the same issue. There are several possible reasons:
1. One of your workers has Python 3.4 but you didn't notice it. Go to each worker and check PYSPARK_PYTHON. That is, if you set PYSPARK_PYTHON=python3, log in to each worker, run python3, and check the version (see the first sketch after this list).
2. You are connected to the wrong workers. Check your SparkContext configuration and make sure which workers it actually uses (see the second sketch below).
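Rather than logging in to every machine, you can also ask the executors themselves which interpreter they run. A rough sketch, assuming a running SparkContext named sc; the partition count is arbitrary, just enough to reach all workers:

import sys

# Run one task per partition so every executor reports its interpreter
versions = (sc.parallelize(range(100), 100)
              .map(lambda _: sys.version)
              .distinct()
              .collect())
print(versions)  # more than one entry means the workers disagree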
I spent more than ten hours tracking down the same issue, and in my case the root cause was the wrong connection.
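To rule out the wrong-connection case, print where the context actually points; a sketch assuming an existing sc:

# The master URL this application actually connected to
print(sc.master)
# The resolved configuration, including spark.master
print(sc.getConf().getAll())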