Google Cloud Dataproc failing to create new cluster with initialization scripts

Question

I am using the command below to create a Dataproc cluster:

gcloud dataproc clusters create informetis-dev \
  --initialization-actions "gs://dataproc-initialization-actions/jupyter/jupyter.sh,gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh,gs://dataproc-initialization-actions/hue/hue.sh,gs://dataproc-initialization-actions/ipython-notebook/ipython.sh,gs://dataproc-initialization-actions/tez/tez.sh,gs://dataproc-initialization-actions/oozie/oozie.sh,gs://dataproc-initialization-actions/zeppelin/zeppelin.sh,gs://dataproc-initialization-actions/user-environment/user-environment.sh,gs://dataproc-initialization-actions/list-consistency-cache/shared-list-consistency-cache.sh,gs://dataproc-initialization-actions/kafka/kafka.sh,gs://dataproc-initialization-actions/ganglia/ganglia.sh,gs://dataproc-initialization-actions/flink/flink.sh" \
  --image-version 1.1 \
  --master-boot-disk-size 100GB \
  --master-machine-type n1-standard-1 \
  --metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance" \
  --num-preemptible-workers 2 \
  --num-workers 2 \
  --preemptible-worker-boot-disk-size 1TB \
  --properties hive:hive.metastore.warehouse.dir=gs://informetis-dev/hive-warehouse \
  --worker-machine-type n1-standard-2 \
  --zone asia-east1-b \
  --bucket info-dev

However, Dataproc failed to create the cluster, with the following errors in the failure file:

cat
+ mysql -u hive -phive-password -e ''
ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (111)
+ mysql -e 'CREATE USER '\''hive'\'' IDENTIFIED BY '\''hive-password'\'';'
ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (111)

Does anyone know what is causing this failure?

Answer 1:

It looks like you're missing the --scopes sql-admin flag as described in the initialization action's documentation, which will prevent the CloudSQL proxy from being able to authorize its tunnel into your CloudSQL instance.
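As a rough sketch of what that change might look like, the snippet below prints a trimmed version of the create command with the scope added. Only the Cloud SQL proxy action is kept for brevity; the remaining flags from the question are assumed to carry over unchanged. The helper name create_cluster_cmd is made up for illustration, and the command is echoed rather than executed so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: prints a trimmed create command that includes the
# sql-admin scope. The other flags from the question are omitted for brevity.
create_cluster_cmd() {
  echo gcloud dataproc clusters create informetis-dev \
    --initialization-actions "gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh" \
    --scopes sql-admin \
    --metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance" \
    --zone asia-east1-b
}

# Print the command for review; remove "echo" inside the helper to run it.
create_cluster_cmd
```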

Beyond the scopes, you also need to make sure the default Compute Engine service account has the right project-level permissions in whichever project holds your CloudSQL instance. Normally the default service account is a project editor in the GCE project, which, combined with the sql-admin scope, is sufficient to access a CloudSQL instance in the same project. If you're accessing a CloudSQL instance in a separate project, however, you'll also have to add that service account as a project editor in the project that owns the CloudSQL instance.

You can find the email address of your default compute service account on the IAM page of the project in which you deploy Dataproc clusters, listed under the name "Compute Engine default service account"; it should look something like <number>@project.gserviceaccount.com.
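For the cross-project case, a minimal sketch of granting the editor role from the command line might look like the following. The project ID and service account email are placeholders, not values from the question, and grant_editor_cmd is a hypothetical helper that only prints the command for review.

```shell
#!/bin/sh
# Placeholder values (assumptions); substitute your own.
SQL_PROJECT="my-sql-project"             # project that owns the Cloud SQL instance
SA_EMAIL="service-account@example.com"   # email copied from the IAM page

# The email can also be looked up with:
#   gcloud iam service-accounts list --project <dataproc-project>

# Hypothetical helper: prints the IAM binding command for project $1 and
# service account $2. Remove "echo" to actually apply the binding.
grant_editor_cmd() {
  echo gcloud projects add-iam-policy-binding "$1" \
    --member "serviceAccount:$2" \
    --role roles/editor
}

grant_editor_cmd "$SQL_PROJECT" "$SA_EMAIL"
```

The editor role is used here only because it matches the "project editor" wording above; a narrower role may be preferable if it covers what the proxy needs.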

Answer 2:

I am assuming that you already created the Cloud SQL instance with something like this, correct?

gcloud sql instances create g-test-1022 \
  --tier db-n1-standard-1 \
  --activation-policy=ALWAYS

If so, then it looks like the error is in how the argument for the metadata is formatted. You have this:

--metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance"

Unfortunately, the zone looks to be incomplete (asia-east1 instead of asia-east1-b).

Additionally, when running that many initialization actions, you'll want to provide a fairly generous initialization action timeout so the cluster does not assume something has failed while your actions take a while to install. You can do that by specifying: