Spark Streaming checkpoint recovery is very, very slow

隐瞒了意图╮ 2021-02-12 14:26
  • Goal: Read from Kinesis and store the data into S3 in Parquet format via Spark Streaming.
  • Situation: The application runs fine initially, running batches of 1 hour, and th…
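For context, here is a minimal sketch of the kind of pipeline the question describes, assuming the Spark 2.3+ spark-streaming-kinesis-asl connector; the stream name, region, S3 paths, batch interval, and the UTF-8 record decoding are placeholders, not details from the question.

```scala
import java.nio.charset.StandardCharsets

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.{KinesisInitialPositions, KinesisInputDStream}

object KinesisToParquet {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kinesis-to-parquet")
    val ssc  = new StreamingContext(conf, Seconds(3600))   // 1-hour batches, as in the question
    ssc.checkpoint("s3a://my-bucket/checkpoints/")          // hypothetical checkpoint location

    // Receiver-based Kinesis stream (builder API shape of Spark 2.3+).
    val stream = KinesisInputDStream.builder
      .streamingContext(ssc)
      .streamName("my-kinesis-stream")                      // hypothetical stream name
      .endpointUrl("https://kinesis.us-east-1.amazonaws.com")
      .regionName("us-east-1")
      .initialPosition(new KinesisInitialPositions.Latest())
      .checkpointAppName("kinesis-to-parquet")
      .checkpointInterval(Seconds(60))
      .storageLevel(StorageLevel.MEMORY_AND_DISK_2)
      .build()

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
        import spark.implicits._
        // Treat each Kinesis record as a UTF-8 string; real payload parsing depends on the data.
        val df = rdd.map(bytes => new String(bytes, StandardCharsets.UTF_8)).toDF("payload")
        df.write.mode("append").parquet("s3a://my-bucket/parquet/")  // hypothetical output path
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```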
3 answers
  • 一向 2021-02-12 14:43

    When a failed driver is restarted, the following occurs:

    1. Recover computation – The checkpointed information is used to restart the driver, reconstruct the contexts and restart all the receivers.
    2. Recover block metadata – The metadata of all the blocks that are needed to continue processing is recovered.
    3. Re-generate incomplete jobs – For the batches with processing that has not completed due to the failure, the RDDs and corresponding jobs are regenerated using the recovered block metadata.
    4. Read the blocks saved in the logs – When those jobs are executed, the block data is read directly from the write-ahead logs. This recovers all the necessary data that was reliably saved to the logs.
    5. Resend unacknowledged data – The buffered data that was not saved to the log at the time of failure is sent again by the source, as it had not been acknowledged by the receiver.

    Since all these steps are performed at the driver, your batch of 0 events takes so much time. This should happen only with the first batch; after that, things will be back to normal.

    Reference here.
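For completeness, here is a minimal sketch of the driver-recovery wiring these steps assume, using StreamingContext.getOrCreate plus the receiver write-ahead log; the checkpoint path, app name, and the stand-in socket input are placeholders and not part of the original answer.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RecoverableDriver {
  val checkpointDir = "s3a://my-bucket/checkpoints/"   // hypothetical checkpoint location

  // Called only when no checkpoint exists. On a restart, the DStream graph,
  // block metadata and incomplete batches are rebuilt from the checkpoint
  // instead, which is the recovery work described in the steps above.
  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("kinesis-to-parquet")
      // Persist received blocks to a reliable write-ahead log so they can be
      // re-read from the log during recovery (step 4 above).
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(3600))
    ssc.checkpoint(checkpointDir)

    // All DStream definitions and output operations must live inside this
    // function; a stand-in input is used here in place of the real
    // Kinesis-to-Parquet pipeline.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc
  }

  def main(args: Array[String]): Unit = {
    // Reuses the checkpoint if one exists (triggering the recovery steps),
    // otherwise builds a fresh context via createContext.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```

On the first restart after a failure, getOrCreate finds the checkpoint and replays the pending batches, which is why only that first batch is slow.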
