问题
I created a cluster on Dataproc and it works great. However, after the cluster is idle for a while (~90 min), the master node will automatically stops. This happens to every cluster I created. I see there is a similar question here: Keep running Dataproc Master node
It looks like it's the initialization action problem. However the post does not give me enough info to fix the issue. Below are the commands I used to create the cluster:
gcloud dataproc clusters create $CLUSTER_NAME \
--project $PROJECT \
--bucket $BUCKET \
--region $REGION \
--zone $ZONE \
--master-machine-type $MASTER_MACHINE_TYPE \
--master-boot-disk-size $MASTER_DISK_SIZE \
--worker-boot-disk-size $WORKER_DISK_SIZE \
--num-workers=$NUM_WORKERS \
--initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://dataproc-initialization-actions/datalab/datalab.sh \
--metadata gcs-connector-version=$GCS_CONNECTOR_VERSION \
--metadata bigquery-connector-version=$BQ_CONNECTOR_VERSION \
--scopes cloud-platform \
--metadata JUPYTER_CONDA_PACKAGES=numpy:scipy:pandas:scikit-learn \
--optional-components=ANACONDA,JUPYTER \
--image-version=1.3
I need the BigQuery connector, GCS connector, Jupyter and DataLab for my cluster.
How can I keep my master node running? Thank you.
回答1:
As summarized in the comment thread, this is indeed caused by Datalab's auto-shutdown feature. There are a couple ways to change this behavior:
- Upon first creating the Datalab-enabled Dataproc cluster, log in to Datalab and click on the "Idle timeout in about ..." text to disable it: https://cloud.google.com/datalab/docs/concepts/auto-shutdown#disabling_the_auto_shutdown_timer - The text will change to "Idle timeout is disabled"
Edit the initialization action to set the environment variable as suggested by yelsayed:
function run_datalab(){ if docker run -d --restart always --net=host -e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true" \ -v "${DATALAB_DIR}:/content/datalab" ${VOLUME_FLAGS} datalab-pyspark; then echo 'Cloud Datalab Jupyter server successfully deployed.' else err 'Failed to run Cloud Datalab' fi }
And use your custom initialization action instead of the stock gs://dataproc-initialization-actions one. It could be worth filing a tracking issue in the github repo for dataproc initialization actions too, suggesting to disable the timeout by default or provide an easy metadata-based option. It's probably true that the auto-shutdown behavior isn't as expected in default usage on a Dataproc cluster since the master is also performing roles other than running the Datalab service.
来源:https://stackoverflow.com/questions/55583747/how-to-keep-google-dataproc-master-running