How can I create a Dataproc cluster, run a job, and delete the cluster from a Cloud Function?

日久生厌 · 2021-01-14 16:17

I would like to start a Dataproc job in response to log files arriving in a GCS bucket. I also do not want to keep a persistent cluster running, as new log files arrive only se…

2 Answers
•  轻奢々 (OP) · 2021-01-14 17:03

    You can run the gcloud commands below from a shell script or a Docker RUN command to:

    1. Provision a Dataproc cluster
    2. Execute a Spark job
    3. Delete the Dataproc cluster (note the --quiet or -q option on delete)

      Provision the Dataproc cluster (takes 5+ minutes):

      gcloud dataproc clusters create devops-poc-dataproc-cluster --subnet default --zone us-central1-a --master-machine-type n1-standard-1 --master-boot-disk-size 200 --num-workers 2 --worker-machine-type n1-standard-2 --worker-boot-disk-size 200 --image-version 1.3-deb9 --project gcp-project-212501 --service-account=service-id1@gcp-project-212501.iam.gserviceaccount.com

      Submit the Spark job (arguments after the bare -- are passed through to wordCountSpark.py):

      sleep 60 && gcloud dataproc jobs submit pyspark /dev_app/spark_poc/wordCountSpark.py --cluster=devops-poc-dataproc-cluster -- gs://gcp-project-212501-docker_bucket/input/ gs://gcp-project-212501-docker_bucket/output/

      Delete the Dataproc cluster:

      gcloud dataproc clusters delete -q devops-poc-dataproc-cluster
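
    Since the question asks how to drive this from a Cloud Function rather than a shell, here is a minimal sketch of the same create/submit/delete sequence using the google-cloud-dataproc Python client inside a GCS-triggered Cloud Function (1st gen). The handler name and the GCS path for wordCountSpark.py are assumptions for illustration; the original answer ran the script from a local path inside a container.

      # requirements.txt: google-cloud-dataproc
      from google.cloud import dataproc_v1

      PROJECT = "gcp-project-212501"
      REGION = "us-central1"
      CLUSTER = "devops-poc-dataproc-cluster"
      ENDPOINT = {"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}

      def handle_log_file(event, context):
          """Entry point for a google.storage.object.finalize (GCS) trigger."""
          print(f"Triggered by gs://{event['bucket']}/{event['name']}")

          clusters = dataproc_v1.ClusterControllerClient(client_options=ENDPOINT)
          jobs = dataproc_v1.JobControllerClient(client_options=ENDPOINT)

          # 1. Provision the cluster (mirrors the gcloud flags above); blocks 5+ minutes.
          clusters.create_cluster(request={
              "project_id": PROJECT,
              "region": REGION,
              "cluster": {
                  "project_id": PROJECT,
                  "cluster_name": CLUSTER,
                  "config": {
                      "gce_cluster_config": {
                          "zone_uri": "us-central1-a",
                          "service_account": "service-id1@gcp-project-212501.iam.gserviceaccount.com",
                      },
                      "master_config": {"num_instances": 1,
                                        "machine_type_uri": "n1-standard-1",
                                        "disk_config": {"boot_disk_size_gb": 200}},
                      "worker_config": {"num_instances": 2,
                                        "machine_type_uri": "n1-standard-2",
                                        "disk_config": {"boot_disk_size_gb": 200}},
                      "software_config": {"image_version": "1.3-deb9"},
                  },
              },
          }).result()

          # 2. Submit the PySpark job and wait for it to finish. The script must
          #    live in GCS when submitted via the API (this path is assumed).
          jobs.submit_job_as_operation(request={
              "project_id": PROJECT,
              "region": REGION,
              "job": {
                  "placement": {"cluster_name": CLUSTER},
                  "pyspark_job": {
                      "main_python_file_uri": "gs://gcp-project-212501-docker_bucket/wordCountSpark.py",
                      "args": ["gs://gcp-project-212501-docker_bucket/input/",
                               "gs://gcp-project-212501-docker_bucket/output/"],
                  },
              },
          }).result()

          # 3. Tear the cluster down so nothing persistent keeps running.
          clusters.delete_cluster(request={
              "project_id": PROJECT, "region": REGION, "cluster_name": CLUSTER,
          }).result()

    Keep in mind that 1st-gen Cloud Functions cap execution at 9 minutes, so a cluster build (5+ minutes) plus a long job may not fit in one invocation; for longer jobs, consider having the function only kick off the work asynchronously instead of waiting on each .result().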
