How to restart Spark Streaming job from checkpoint on Dataproc?

花落未央 2021-01-20 22:30

This is a follow-up to "Spark streaming on dataproc throws FileNotFoundException".

Over the past few weeks (not sure since exactly when), restart of a spark streaming j

1 answer

时光说笑 (original poster)

2021-01-20 23:10

We've recently added auto-restart capabilities to Dataproc jobs (available in the gcloud beta track and in the v1 API).

To take advantage of auto-restart, a job must be able to recover and clean up after a failed attempt, so most jobs will not work without modification. However, Spark Streaming jobs that use checkpoint files work out of the box.
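For readers unsure what "Spark Streaming with checkpoint files" looks like in practice, here is a minimal sketch of the usual recovery pattern in PySpark's DStream API. It is not taken from the original job: the app name, socket source, host, port, batch interval, and the `create_context` and `run` helper names are all assumptions for illustration. The idea is that the stream is defined inside a setup function, and `StreamingContext.getOrCreate` rebuilds the context from the checkpoint directory when one exists, so a restarted attempt resumes instead of starting from scratch.

```python
# Sketch only: the source, host/port, batch interval and names are illustrative assumptions.

def create_context(checkpoint_dir):
    """Build a fresh StreamingContext. Called only when no checkpoint exists yet."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="checkpointed-stream")  # hypothetical app name
    ssc = StreamingContext(sc, 10)                    # 10-second batches (assumed)
    ssc.checkpoint(checkpoint_dir)                    # enable checkpointing

    # The whole stream graph must be defined here so it can be restored later.
    lines = ssc.socketTextStream("localhost", 9999)   # hypothetical source
    lines.count().pprint()
    return ssc


def run(checkpoint_dir):
    """Recover from the checkpoint if present, otherwise start a new context."""
    from pyspark.streaming import StreamingContext

    ssc = StreamingContext.getOrCreate(
        checkpoint_dir, lambda: create_context(checkpoint_dir)
    )
    ssc.start()
    ssc.awaitTermination()
```

The driver script would end with a call such as `run(checkpoint_dir)`, where the checkpoint directory is a location that survives a VM restart (for example a bucket path rather than local disk).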

The restart-dataproc-agent trick should no longer be necessary. Auto-restart is resilient against job crashes, Dataproc agent failures, and VM restart-on-migration events.

Example: gcloud beta dataproc jobs submit spark ... --max-failures-per-hour 1

See: https://cloud.google.com/dataproc/docs/concepts/restartable-jobs

If you want to test recovery, you can simulate a VM migration by resetting the master VM [1]. Afterwards you should be able to describe the job [2] and see an ATTEMPT_FAILURE entry in statusHistory.

[1] gcloud compute instances reset -m

[2] gcloud dataproc jobs describe
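If you would rather check for the ATTEMPT_FAILURE entry programmatically than read the describe output by eye, something like the following could work. It is a rough sketch that assumes the describe command was run with gcloud's `--format=json` flag, and it relies on the `statusHistory`, `state`, `stateStartTime` and `details` fields of the Dataproc job resource. The `attempt_failures` helper and the sample job below are made up for illustration and are not real output.

```python
import json


def attempt_failures(describe_json):
    """Return (start time, details) for each ATTEMPT_FAILURE in statusHistory."""
    job = json.loads(describe_json)
    return [
        (entry.get("stateStartTime"), entry.get("details", ""))
        for entry in job.get("statusHistory", [])
        if entry.get("state") == "ATTEMPT_FAILURE"
    ]


# Hypothetical, hand-written sample shaped like a job description (not real output).
sample = json.dumps({
    "status": {"state": "RUNNING", "stateStartTime": "2021-01-20T15:06:00Z"},
    "statusHistory": [
        {"state": "PENDING", "stateStartTime": "2021-01-20T15:00:00Z"},
        {"state": "RUNNING", "stateStartTime": "2021-01-20T15:01:00Z"},
        {"state": "ATTEMPT_FAILURE",
         "stateStartTime": "2021-01-20T15:05:00Z",
         "details": "hypothetical failure details"},
    ],
})

failures = attempt_failures(sample)
print(len(failures), "failed attempt(s) found")
```

An empty list after resetting the master VM would suggest the job was not restarted as a new attempt, which is worth checking before relying on auto-restart.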
