How to keep Google Dataproc master running?

刺人心 2021-01-26 18:55

I created a cluster on Dataproc and it works great. However, after the cluster has been idle for a while (~90 min), the master node automatically stops. This happens to every clu

1 answer
  •  抹茶落季
    2021-01-26 19:24

    As summarized in the comment thread, this is indeed caused by Datalab's auto-shutdown feature. There are a couple of ways to change this behavior:

    1. When you first create the Datalab-enabled Dataproc cluster, log in to Datalab and click the "Idle timeout in about ..." text to disable the timer (see https://cloud.google.com/datalab/docs/concepts/auto-shutdown#disabling_the_auto_shutdown_timer). The text then changes to "Idle timeout is disabled".
    2. Edit the initialization action so that it starts the Datalab container with the environment variable DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true, as suggested by yelsayed:

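      # DATALAB_DIR, VOLUME_FLAGS and err() are defined elsewhere in the stock
      # Datalab initialization action; the -e flag is the change suggested here.
      function run_datalab(){
        # -e passes DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true into the container,
        # which disables Datalab's idle-timeout (auto-shutdown) process.
        # ${VOLUME_FLAGS} is left unquoted so it can expand to several -v flags.
        if docker run -d --restart always --net=host -e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true" \
            -v "${DATALAB_DIR}:/content/datalab" ${VOLUME_FLAGS} datalab-pyspark; then
          echo 'Cloud Datalab Jupyter server successfully deployed.'
        else
          err 'Failed to run Cloud Datalab'
        fi
      }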

    Then use your custom initialization action instead of the stock one from gs://dataproc-initialization-actions. It may also be worth filing a tracking issue in the GitHub repo for Dataproc initialization actions, suggesting that the timeout be disabled by default or made controllable through a simple metadata-based option. The auto-shutdown behavior is probably not what most users expect on a Dataproc cluster by default, since the master node also performs roles other than running the Datalab service (for example, hosting the YARN ResourceManager and the HDFS NameNode).
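
    The custom-initialization-action route can be sketched in shell as below. This is a minimal sketch under stated assumptions, not the official procedure: the function name patch_datalab_init_action is hypothetical, and it assumes the stock script contains a "docker run ... --net=host" line like the one in run_datalab() above. The commented usage lines assume the stock script lives at gs://dataproc-initialization-actions/datalab/datalab.sh, and the bucket, cluster and region names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: patch a local copy of the stock Datalab initialization action so
# that its docker run line also passes DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true.
set -euo pipefail

# Hypothetical helper; assumes the script has a "docker run ... --net=host" line.
patch_datalab_init_action() {
  local script="$1"
  # Already patched: do nothing, so running this twice is harmless.
  if grep -q 'DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS' "${script}"; then
    return 0
  fi
  # On lines containing "docker run", append the -e flag after --net=host.
  # Writing to a temp file and moving it avoids the GNU-only "sed -i".
  sed '/docker run/s/--net=host/--net=host -e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true"/' \
    "${script}" > "${script}.tmp"
  mv "${script}.tmp" "${script}"
}

# Usage (assumed stock path; bucket, cluster and region are placeholders):
#   gsutil cp gs://dataproc-initialization-actions/datalab/datalab.sh .
#   patch_datalab_init_action datalab.sh
#   gsutil cp datalab.sh gs://my-bucket/datalab.sh
#   gcloud dataproc clusters create my-cluster --region=us-central1 \
#       --initialization-actions=gs://my-bucket/datalab.sh
```

    Patching a copy with sed keeps the rest of the stock script unchanged, so picking up later upstream fixes only means re-copying and re-running the patch.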
