How to specify the version of Python for spark-submit to use?

Asked 2021-02-06 23:45

I have two versions of Python installed. When I launch an application with spark-submit, it uses the default version, but I want it to use the other one. How can I specify which version of Python spark-submit should use?

4 Answers
  • 2021-02-06 23:54

    You can set the PYSPARK_PYTHON variable in conf/spark-env.sh (in Spark's installation directory) to the absolute path of the desired Python executable.

    The Spark distribution ships spark-env.sh.template (spark-env.cmd.template on Windows) by default; rename it to spark-env.sh (spark-env.cmd) before editing.

    For example, if the desired Python executable is installed at /opt/anaconda3/bin/python3:

    PYSPARK_PYTHON='/opt/anaconda3/bin/python3'
    

    Check out the configuration documentation for more information.
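
    For a complete setup, here's a minimal sketch (assuming SPARK_HOME points at your Spark installation directory; the Anaconda path is just the example above):

    cd "$SPARK_HOME/conf"                    # Spark's configuration directory
    cp spark-env.sh.template spark-env.sh    # create spark-env.sh from the template
    echo "PYSPARK_PYTHON='/opt/anaconda3/bin/python3'" >> spark-env.sh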

  • 2021-02-06 23:55

    You can either specify the version of Python by listing the path to your install in a shebang line in your script:

    myfile.py:

    #!/full/path/to/specific/python2.7
    

    or by calling it on the command line without a shebang line in your script:

    /full/path/to/specific/python2.7 myfile.py
    

    However, I'd recommend looking into Python's excellent virtual environments, which let you create a separate environment for each version of Python. Virtual environments work, more or less, by handling the path resolution for you once activated, allowing you to just type python myfile.py without worrying about conflicting dependencies or knowing the full path to a specific version of Python.

    The venv module in the Python 3 standard library documentation is a good place to get started with virtual environments; a quick sketch follows.
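
    As a minimal illustration (the environment path and script name here are hypothetical):

    python3 -m venv ~/envs/spark-py        # create a virtual environment
    source ~/envs/spark-py/bin/activate    # activate it; `python` now resolves to this env
    python myfile.py                       # runs with the environment's interpreter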

    If you do not have access to the nodes and you're running this using PySpark, you can specify the Python version in your spark-env.sh:

    Spark_Install_Dir/conf/spark-env.sh:

    PYSPARK_PYTHON=/full/path/to/specific/python2.7
    
  • 2021-02-07 00:02

    If you want to set the Python major version (the PYSPARK_MAJOR_PYTHON_VERSION option) from the spark-submit command line when running on Kubernetes, check this page:

    http://spark.apache.org/docs/latest/running-on-kubernetes.html

    Search for spark.kubernetes.pyspark.pythonVersion on that page and you'll find the following entry:

    spark.kubernetes.pyspark.pythonVersion (default: "2")
    This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3.
    

    Your command should then look like:

    spark-submit --conf spark.kubernetes.pyspark.pythonVersion=3 ...
    

    It should work.
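
    For context, a fuller sketch of such a submission (the master URL and container image are placeholders to replace with your own):

    spark-submit \
      --master k8s://https://<k8s-apiserver-host>:<port> \
      --deploy-mode cluster \
      --conf spark.kubernetes.container.image=<your-pyspark-image> \
      --conf spark.kubernetes.pyspark.pythonVersion=3 \
      myfile.py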

  • 2021-02-07 00:15

    In my environment I simply used:

    export PYSPARK_PYTHON=python2.7
    

    It worked for me.
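
    Note that export only affects processes launched from that shell session. If you prefer not to export it, you can also set the variable for a single run (a sketch; myfile.py is a placeholder):

    PYSPARK_PYTHON=python2.7 spark-submit myfile.py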
