Spark Streaming checkpoint recovery is very, very slow

隐瞒了意图╮ 2021-02-12 14:26
  • Goal: Read from Kinesis and store the data into S3 in Parquet format via Spark Streaming.
  • Situation: The application runs fine initially, running batches of 1 hour, and th…
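For context, here is a minimal sketch of the kind of pipeline the question describes, assuming the Spark 2.3+ spark-streaming-kinesis-asl connector; the stream name, region, S3 paths, batch interval, and the UTF-8 record decoding are placeholders, not details from the question.

```scala
import java.nio.charset.StandardCharsets

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.{KinesisInitialPositions, KinesisInputDStream}

object KinesisToParquet {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kinesis-to-parquet")
    val ssc  = new StreamingContext(conf, Seconds(3600))   // 1-hour batches, as in the question
    ssc.checkpoint("s3a://my-bucket/checkpoints/")          // hypothetical checkpoint location

    // Receiver-based Kinesis stream (builder API shape of Spark 2.3+).
    val stream = KinesisInputDStream.builder
      .streamingContext(ssc)
      .streamName("my-kinesis-stream")                      // hypothetical stream name
      .endpointUrl("https://kinesis.us-east-1.amazonaws.com")
      .regionName("us-east-1")
      .initialPosition(new KinesisInitialPositions.Latest())
      .checkpointAppName("kinesis-to-parquet")
      .checkpointInterval(Seconds(60))
      .storageLevel(StorageLevel.MEMORY_AND_DISK_2)
      .build()

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
        import spark.implicits._
        // Treat each Kinesis record as a UTF-8 string; real payload parsing depends on the data.
        val df = rdd.map(bytes => new String(bytes, StandardCharsets.UTF_8)).toDF("payload")
        df.write.mode("append").parquet("s3a://my-bucket/parquet/")  // hypothetical output path
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```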
3 answers
  • 一向 2021-02-12 14:43

    When a failed driver is restarted, the following occurs:

    1. Recover computation – The checkpointed information is used to restart the driver, reconstruct the contexts and restart all the receivers.
    2. Recover block metadata – The metadata of all the blocks that are needed to continue processing is recovered.
    3. Re-generate incomplete jobs – For the batches with processing that has not completed due to the failure, the RDDs and corresponding jobs are regenerated using the recovered block metadata.
    4. Read the blocks saved in the logs – When those jobs are executed, the block data is read directly from the write-ahead logs. This recovers all the necessary data that was reliably saved to the logs.
    5. Resend unacknowledged data – The buffered data that was not saved to the log at the time of failure is sent again by the source, as it had not been acknowledged by the receiver.

    Since all these steps are performed at the driver, your batch of 0 events takes so much time. This should happen only with the first batch; after that, things will be back to normal.

    Reference here.
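For completeness, here is a minimal sketch of the driver-recovery wiring these steps assume, using StreamingContext.getOrCreate plus the receiver write-ahead log; the checkpoint path, app name, and the stand-in socket input are placeholders and not part of the original answer.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RecoverableDriver {
  val checkpointDir = "s3a://my-bucket/checkpoints/"   // hypothetical checkpoint location

  // Called only when no checkpoint exists. On a restart, the DStream graph,
  // block metadata and incomplete batches are rebuilt from the checkpoint
  // instead, which is the recovery work described in the steps above.
  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("kinesis-to-parquet")
      // Persist received blocks to a reliable write-ahead log so they can be
      // re-read from the log during recovery (step 4 above).
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(3600))
    ssc.checkpoint(checkpointDir)

    // All DStream definitions and output operations must live inside this
    // function; a stand-in input is used here in place of the real
    // Kinesis-to-Parquet pipeline.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc
  }

  def main(args: Array[String]): Unit = {
    // Reuses the checkpoint if one exists (triggering the recovery steps),
    // otherwise builds a fresh context via createContext.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```

On the first restart after a failure, getOrCreate finds the checkpoint and replays the pending batches, which is why only that first batch is slow.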
