AWS Batch analog in GCP?

╄→尐↘猪︶ㄣ submitted on 2021-02-07 07:24:33

Question


I was using AWS and am new to GCP. One feature I used heavily was AWS Batch, which automatically creates a VM when the job is submitted and deletes the VM when the job is done. Is there a GCP counterpart? Based on my research, the closest is GCP Dataflow. The GCP Dataflow documentation led me to Apache Beam. But when I walk through the examples here (link), it feels totally different from AWS Batch.

Any suggestions on submitting jobs for batch processing in GCP? My requirement is simply to retrieve data from Google Cloud Storage, analyze the data using a Python script, and then put the result back into Google Cloud Storage. The process can run overnight, and I don't want the VM sitting idle after the job finishes while I'm asleep.


Answer 1:


I recommend checking out dsub. It's an open-source tool initially developed by the Google Genomics teams for doing batch processing on Google Cloud.
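As a rough illustration of what a dsub submission for the asker's job could look like, here is a small Python sketch that only assembles the command line. The provider name `google-cls-v2`, the variable names `INPUT_FILE` and `OUTPUT_FILE`, and every project, bucket, and file name are assumptions for illustration; check the dsub README for the providers and flags supported by your installed version.

```python
import shlex


def build_dsub_command(project, region, logging_uri, input_uri, output_uri,
                       image, script_path, provider="google-cls-v2"):
    """Assemble a dsub invocation as an argv list (nothing is executed here).

    dsub copies the --input object into the job's container before the
    script runs and copies the --output file back to GCS afterwards. The
    script sees both as the environment variables INPUT_FILE / OUTPUT_FILE
    (names chosen here for illustration).
    """
    return [
        "dsub",
        "--provider", provider,        # assumption: depends on dsub version
        "--project", project,
        "--regions", region,
        "--logging", logging_uri,      # where dsub writes the job logs
        "--image", image,              # Docker image the script runs in
        "--input", f"INPUT_FILE={input_uri}",
        "--output", f"OUTPUT_FILE={output_uri}",
        "--script", script_path,       # local script uploaded with the job
    ]


# Hypothetical values, for illustration only.
cmd = build_dsub_command(
    project="my-project",
    region="us-central1",
    logging_uri="gs://my-bucket/logs/",
    input_uri="gs://my-bucket/input/data.csv",
    output_uri="gs://my-bucket/output/result.csv",
    image="python:3.9-slim",
    script_path="analyze.sh",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

To actually submit the job, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where dsub is installed and authenticated.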




Answer 2:


You can do this using AI Platform Jobs, which can now run arbitrary Docker images:

gcloud ai-platform jobs submit training $JOB_NAME \
       --scale-tier BASIC \
       --region $REGION \
       --master-image-uri gcr.io/$PROJECT_ID/some-image

You can define the master instance type and even additional worker instances if desired. Google should consider creating a sibling product without the AI buzzword so people can find this functionality more easily.
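For the workload in the question, the image's entry point could be a script along these lines. This is a minimal sketch, assuming the `google-cloud-storage` package is installed in the image and that the job's service account can read and write the buckets; `analyze` is a hypothetical placeholder for the real analysis, and the two `gs://` URIs are expected as command-line arguments.

```python
"""Entry-point sketch: read an object from GCS, analyze it, write the result back."""
import sys


def parse_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, name = uri[len(prefix):].partition("/")
    if not bucket or not name:
        raise ValueError(f"URI needs a bucket and an object name: {uri!r}")
    return bucket, name


def analyze(data):
    """Hypothetical stand-in for the real analysis: count input lines."""
    return f"lines={len(data.splitlines())}\n".encode("utf-8")


def run(input_uri, output_uri):
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import storage

    client = storage.Client()  # picks up the job's default credentials
    in_bucket, in_name = parse_gs_uri(input_uri)
    out_bucket, out_name = parse_gs_uri(output_uri)

    data = client.bucket(in_bucket).blob(in_name).download_as_bytes()
    result = analyze(data)
    client.bucket(out_bucket).blob(out_name).upload_from_string(result)


# Only runs when both URIs are given, e.g.
#   python main.py gs://my-bucket/in.csv gs://my-bucket/out.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    run(sys.argv[1], sys.argv[2])
```

Because the script exits when `run` returns, the container stops and the job ends, so nothing is left idling after the analysis finishes.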




Answer 3:


Officially, according to the "Map AWS services to Google Cloud Platform products" page, there is no direct equivalent, but you can put a few things together that might get you close.

I wasn't sure whether you are running, or have the option to run, your Python code in Docker. If so, the Kubernetes controls might do the trick. From the GCP docs:

Note: Beginning with Kubernetes version 1.7, you can specify a minimum size of zero for your node pool. This allows your node pool to scale down completely if the instances within aren't required to run your workloads. However, while a node pool can scale to a zero size, the overall cluster size does not scale down to zero nodes (as at least one node is always required to run system Pods).

So, if you are running other managed instances anyway, you can scale the node pool up from and back down to 0, but at least one Kubernetes node stays active to run the system Pods.

I'm guessing you are already using something like "Creating API Requests and Handling Responses" to get an ID with which you can verify that the process has started, the instance has been created, and the payload is being processed. You can use the same mechanism to report that the process has completed. That takes care of creating the instance and launching the Python script.

You could use Cloud Pub/Sub to keep track of that state: can you modify your Python script to announce when the task completes? Just as you report when you create the task and launch the instance, you can report that the Python job is complete and then kick off an instance teardown process.
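A sketch of the completion notice described above, assuming the `google-cloud-pubsub` package and an existing topic. The message fields (`job_id`, `status`, `output_uri`, `finished_at`) are an illustrative format, not a required schema, and whatever tears down the instance would be a separate subscriber to that topic.

```python
import json
import time


def build_completion_message(job_id, status, output_uri, finished_at=None):
    """Encode a job-finished notice as JSON bytes (Pub/Sub payloads are bytes)."""
    payload = {
        "job_id": job_id,
        "status": status,
        "output_uri": output_uri,
        "finished_at": int(time.time()) if finished_at is None else finished_at,
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def publish_completion(project_id, topic_id, message):
    """Publish the notice and return the message ID assigned by Pub/Sub."""
    # Imported lazily so build_completion_message works without the library.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, data=message)
    return future.result()  # blocks until the publish is confirmed


# At the very end of the analysis script (hypothetical names):
#   msg = build_completion_message("job-42", "SUCCEEDED", "gs://my-bucket/out.txt")
#   publish_completion("my-project", "batch-job-events", msg)
```

Publishing as the script's last step means the teardown is triggered by the job itself rather than by a timer, which is what keeps the VM from idling overnight.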

Another way to cut costs is to use Preemptible VM instances, which run at a steep discount and last at most 24 hours anyway.

Hope that helps.




Answer 4:


The product that best suits your use case in GCP is Cloud Tasks. We use it for a similar use case: we retrieve files from another HTTP server and, after some processing, store them in Google Cloud Storage.

This GCP documentation describes in full detail how to create and use tasks.

You schedule your tasks programmatically in Cloud Tasks, and you have to create task handlers (worker services) in App Engine. Some limitations for worker services running in App Engine:

  • the standard environment:

    • Automatic scaling: task processing must finish in 10 minutes.
    • Manual and basic scaling: requests can run up to 24 hours.
  • the flex environment: all types have a 60-minute timeout.
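The limits above decide which kind of worker service can host a given job. The sketch below encodes only the numbers from this list (they may have changed since this answer was written) and returns the configurations whose timeout covers an expected run time; the `("flex", "any")` key is just this sketch's shorthand for "all types".

```python
# Request timeouts in minutes, taken from the list above.
HANDLER_TIMEOUT_MINUTES = {
    ("standard", "automatic"): 10,
    ("standard", "manual"): 24 * 60,
    ("standard", "basic"): 24 * 60,
    ("flex", "any"): 60,
}


def handlers_that_fit(expected_minutes):
    """Return the (environment, scaling) pairs whose timeout covers the job."""
    return sorted(
        config
        for config, limit in HANDLER_TIMEOUT_MINUTES.items()
        if expected_minutes <= limit
    )


# An overnight job of roughly eight hours only fits the standard environment
# with manual or basic scaling.
print(handlers_that_fit(8 * 60))
```

For the overnight job in the question, that rules out automatic scaling and the flex environment unless the work is split into shorter tasks.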




Answer 5:


I think a cron job can help you here; you can implement it with App Engine, Pub/Sub, and Compute Engine. From "Reliable Task Scheduling on Google Compute Engine":

In distributed systems, such as a network of Google Compute Engine instances, it is challenging to reliably schedule tasks because any individual instance may become unavailable due to autoscaling or network partitioning.

Google App Engine provides a Cron service. Using this service for scheduling and Google Cloud Pub/Sub for distributed messaging, you can build an application to reliably schedule tasks across a fleet of Compute Engine instances.

For details, see: https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine
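On the Compute Engine side of that pattern, each instance runs a small listener that turns a scheduled message into a command. The following is a sketch under stated assumptions, not the code from the linked solution: it assumes the `google-cloud-pubsub` package, an existing subscription, and a hypothetical `COMMANDS` table mapping message text to the command to run.

```python
import shlex
import subprocess

# Hypothetical mapping from the message body sent by the cron handler
# to the command this instance should run.
COMMANDS = {"nightly-analysis": "python3 analyze.py"}


def command_for(message_data):
    """Translate a Pub/Sub message body (bytes) into an argv list, or None."""
    key = message_data.decode("utf-8").strip()
    command = COMMANDS.get(key)
    return shlex.split(command) if command else None


def listen(project_id, subscription_id):
    """Block forever, running the mapped command for each message received."""
    # Imported lazily so command_for works without the library installed.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    path = subscriber.subscription_path(project_id, subscription_id)

    def callback(message):
        argv = command_for(message.data)
        # Ack before the long-running job so the message is not redelivered
        # while it runs; the trade-off is that a crashed job is not retried.
        message.ack()
        if argv is not None:
            subprocess.run(argv, check=False)

    streaming_pull = subscriber.subscribe(path, callback=callback)
    streaming_pull.result()  # blocks until the subscription fails or is cancelled
```

Unknown messages are acknowledged and ignored, so one topic can carry several schedules while each instance only reacts to the ones it knows.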



Source: https://stackoverflow.com/questions/51215904/aws-batch-analog-in-gcp
