When I try to show a Spark DataFrame (Test), I get a KeyError, as shown below. Something probably goes wrong in the function I call before Test.show(3).
The KeyErr
You can set SPARK_HOME by putting the following in an initialization action:
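#!/bin/bash
# Append the SPARK_HOME export to the login profile and to every /etc/*bashrc file,
# so both login and interactive shells pick it up.
cat << EOF | tee -a /etc/profile.d/custom_env.sh /etc/*bashrc >/dev/null
export SPARK_HOME=/usr/lib/spark/
EOF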
List this init action before your Jupyter installation action so that SPARK_HOME is already set when the Jupyter process starts up.
Edit: To specify both init actions, pass them as a single comma-separated list with no spaces, like this:
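To see what the `cat << EOF | tee -a ...` pattern does without touching `/etc`, here is a minimal local sketch. It assumes you only want to test the mechanism: the temporary files stand in for `/etc/profile.d/custom_env.sh` and the `/etc/*bashrc` files, and a subshell stands in for a fresh login shell that sources them.

```shell
#!/bin/bash
# Local sketch only: temporary files replace /etc/profile.d/custom_env.sh and /etc/*bashrc.
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
profile="$tmpdir/custom_env.sh"   # stand-in for /etc/profile.d/custom_env.sh
bashrc="$tmpdir/bashrc"           # stand-in for one of the /etc/*bashrc files

# tee -a appends the same heredoc text to every file it is given;
# its own stdout is discarded so nothing is echoed to the terminal.
cat << EOF | tee -a "$profile" "$bashrc" >/dev/null
export SPARK_HOME=/usr/lib/spark/
EOF

# Emulate a new shell that sources the profile file and sees the variable.
( unset SPARK_HOME; . "$profile"; echo "SPARK_HOME=$SPARK_HOME" )
```

On a real cluster, the equivalent check would be opening a new login shell on the master node and running `echo $SPARK_HOME`.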
gcloud dataproc clusters create CLUSTER_NAME \
    --initialization-actions gs://mybucket/spark_home.sh,gs://mybucket/jupyter.sh ...
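If you script cluster creation, one way to avoid stray spaces and keep the order explicit is to build the comma-separated value from a bash array. This is a sketch: the `gs://mybucket/...` paths are the ones from the answer, `CLUSTER_NAME` is a placeholder, and the command is only printed (a dry run) rather than executed.

```shell
#!/bin/bash
# Sketch: build the --initialization-actions value from an ordered array.
set -euo pipefail

# Order matters: the SPARK_HOME action is listed before the Jupyter action.
actions=(
  gs://mybucket/spark_home.sh
  gs://mybucket/jupyter.sh
)

# "${actions[*]}" joins elements with the first character of IFS (a comma here);
# IFS is changed only inside the command substitution's subshell.
joined=$(IFS=,; echo "${actions[*]}")

# Dry run: print the command instead of running it. CLUSTER_NAME is a placeholder.
echo gcloud dataproc clusters create CLUSTER_NAME --initialization-actions "$joined"
```

Dropping the leading `echo` would run the command for real; any other cluster flags you need still have to be added.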