KeyError: 'SPARK_HOME' in pyspark on Jupyter on Google-Cloud-DataProc

南方客 · 2021-01-21 13:39

When trying to show a Spark DataFrame (Test), I get a KeyError, as shown below. Probably something went wrong in the function I used before Test.show(3).

The KeyError complains about the missing 'SPARK_HOME' environment variable.

1 Answer
  •  温柔的废话
     2021-01-21 14:12

    You can simply put the following in an initialization action:

    #!/bin/bash
    
    # Append the SPARK_HOME export to system-wide shell startup files
    # so every login shell (and the Jupyter service) picks it up.
    cat << EOF | tee -a /etc/profile.d/custom_env.sh /etc/*bashrc >/dev/null
    export SPARK_HOME=/usr/lib/spark/
    EOF
    

    You'll want to run that init action before your Jupyter installation action, so the variable is already set when the Jupyter process starts up.
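    A minimal sketch of why the lookup fails, and a defensive check you can run in a notebook cell before creating the Spark session. The path below is the Dataproc default written by the init action above, not something from the question; adjust it if your install differs:

```python
import os

# pyspark's launcher reads the variable with a plain dict access,
# so os.environ["SPARK_HOME"] raises KeyError when it is unset.
spark_home = os.environ.get("SPARK_HOME")  # returns None instead of raising
if spark_home is None:
    # Assumption: the default Spark install location on Dataproc,
    # matching the path exported by the init action.
    os.environ["SPARK_HOME"] = "/usr/lib/spark/"
print(os.environ["SPARK_HOME"])
```

    Setting the variable from inside the notebook is only a stopgap for an already-running kernel; the init action is the durable fix because it covers every process on the cluster.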

    Edit: To specify the two init actions, you can list them in a comma-separated list without spaces, like this:

    gcloud dataproc clusters create \
        --initialization-actions gs://mybucket/spark_home.sh,gs://mybucket/jupyter.sh ...
    
