How can I create a Dataproc cluster, run a job, and delete the cluster from a Cloud Function?

日久生厌 · 2021-01-14 16:17

I would like to start a Dataproc job in response to log files arriving in a GCS bucket. I also do not want to keep a persistent cluster running, as new log files arrive only se…

2 Answers
•  轻奢々 (OP) · 2021-01-14 17:03

    You can run the gcloud commands below from a shell script or a Docker RUN command to:

    1. Provision a Dataproc cluster
    2. Execute a Spark job
    3. Delete the Dataproc cluster (note the --quiet or -q option on delete)

      Provision the Dataproc cluster (takes 5+ minutes):

      gcloud dataproc clusters create devops-poc-dataproc-cluster --subnet default --zone us-central1-a --master-machine-type n1-standard-1 --master-boot-disk-size 200 --num-workers 2 --worker-machine-type n1-standard-2 --worker-boot-disk-size 200 --image-version 1.3-deb9 --project gcp-project-212501 --service-account=service-id1@gcp-project-212501.iam.gserviceaccount.com

      Submit the Spark job (arguments after the bare -- are passed through to wordCountSpark.py):

      sleep 60 && gcloud dataproc jobs submit pyspark /dev_app/spark_poc/wordCountSpark.py --cluster=devops-poc-dataproc-cluster -- gs://gcp-project-212501-docker_bucket/input/ gs://gcp-project-212501-docker_bucket/output/

      Delete the Dataproc cluster:

      gcloud dataproc clusters delete -q devops-poc-dataproc-cluster
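
    Since the question asks how to drive this from a Cloud Function rather than a shell, here is a minimal sketch of the same create/submit/delete sequence using the google-cloud-dataproc Python client inside a GCS-triggered Cloud Function (1st gen). The handler name and the GCS path for wordCountSpark.py are assumptions for illustration; the original answer ran the script from a local path inside a container.

      # requirements.txt: google-cloud-dataproc
      from google.cloud import dataproc_v1

      PROJECT = "gcp-project-212501"
      REGION = "us-central1"
      CLUSTER = "devops-poc-dataproc-cluster"
      ENDPOINT = {"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}

      def handle_log_file(event, context):
          """Entry point for a google.storage.object.finalize (GCS) trigger."""
          print(f"Triggered by gs://{event['bucket']}/{event['name']}")

          clusters = dataproc_v1.ClusterControllerClient(client_options=ENDPOINT)
          jobs = dataproc_v1.JobControllerClient(client_options=ENDPOINT)

          # 1. Provision the cluster (mirrors the gcloud flags above); blocks 5+ minutes.
          clusters.create_cluster(request={
              "project_id": PROJECT,
              "region": REGION,
              "cluster": {
                  "project_id": PROJECT,
                  "cluster_name": CLUSTER,
                  "config": {
                      "gce_cluster_config": {
                          "zone_uri": "us-central1-a",
                          "service_account": "service-id1@gcp-project-212501.iam.gserviceaccount.com",
                      },
                      "master_config": {"num_instances": 1,
                                        "machine_type_uri": "n1-standard-1",
                                        "disk_config": {"boot_disk_size_gb": 200}},
                      "worker_config": {"num_instances": 2,
                                        "machine_type_uri": "n1-standard-2",
                                        "disk_config": {"boot_disk_size_gb": 200}},
                      "software_config": {"image_version": "1.3-deb9"},
                  },
              },
          }).result()

          # 2. Submit the PySpark job and wait for it to finish. The script must
          #    live in GCS when submitted via the API (this path is assumed).
          jobs.submit_job_as_operation(request={
              "project_id": PROJECT,
              "region": REGION,
              "job": {
                  "placement": {"cluster_name": CLUSTER},
                  "pyspark_job": {
                      "main_python_file_uri": "gs://gcp-project-212501-docker_bucket/wordCountSpark.py",
                      "args": ["gs://gcp-project-212501-docker_bucket/input/",
                               "gs://gcp-project-212501-docker_bucket/output/"],
                  },
              },
          }).result()

          # 3. Tear the cluster down so nothing persistent keeps running.
          clusters.delete_cluster(request={
              "project_id": PROJECT, "region": REGION, "cluster_name": CLUSTER,
          }).result()

    Keep in mind that 1st-gen Cloud Functions cap execution at 9 minutes, so a cluster build (5+ minutes) plus a long job may not fit in one invocation; for longer jobs, consider having the function only kick off the work asynchronously instead of waiting on each .result().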
