Spark Streaming checkpoint recovery is very slow

Backend · Unresolved

隐瞒了意图╮ 2021-02-12 14:26

Goal: Read from Kinesis and store the data in S3 in Parquet format via Spark Streaming.
Situation: The application runs fine initially, running batches of 1 hour, and th…

3 answers

深忆病人 (OP)

2021-02-12 14:29

I raised a Jira issue: https://issues.apache.org/jira/browse/SPARK-19304

The cause is that each iteration reads more data than it needs and then discards the excess. This can be avoided by adding a limit to the AWS GetRecords call.

Fix: https://github.com/apache/spark/pull/16842
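To illustrate why an unbounded fetch slows recovery, here is a minimal Python simulation. It is not Spark's code: `FakeShard`, `recover_range`, and the in-memory list standing in for a shard are all hypothetical. The one real-world fact it relies on is that the Kinesis GetRecords API accepts a Limit parameter of at most 10,000 records per call. Recovering a block means re-reading a known, bounded range of records. Without a limit, each call can return a full page, and everything past the end of the range is thrown away.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Real Kinesis fact: GetRecords returns at most 10,000 records per call.
MAX_GET_RECORDS_LIMIT = 10_000


@dataclass
class FakeShard:
    """Hypothetical in-memory stand-in for a Kinesis shard (illustration only)."""
    records: List[int]  # sequence numbers in ascending order

    def get_records(self, start: int, limit: int) -> List[int]:
        # Return up to `limit` records beginning at position `start`.
        return self.records[start:start + limit]


def recover_range(shard: FakeShard, from_idx: int, to_idx: int,
                  use_limit: bool) -> Tuple[List[int], int]:
    """Re-read the inclusive range [from_idx, to_idx], as checkpoint recovery must.

    Returns the recovered records and how many records were fetched in total.
    """
    wanted = to_idx - from_idx + 1
    recovered: List[int] = []
    fetched = 0
    pos = from_idx
    while len(recovered) < wanted:
        remaining = wanted - len(recovered)
        # With a limit we request only what is still missing; without one,
        # every call pulls the maximum page size.
        limit = min(remaining, MAX_GET_RECORDS_LIMIT) if use_limit else MAX_GET_RECORDS_LIMIT
        batch = shard.get_records(pos, limit)
        if not batch:
            break  # reached the end of the shard
        fetched += len(batch)
        pos += len(batch)
        recovered.extend(batch[:remaining])  # records past the range are discarded
    return recovered, fetched


if __name__ == "__main__":
    shard = FakeShard(records=list(range(50_000)))
    for use_limit in (False, True):
        _, fetched = recover_range(shard, 1_000, 1_099, use_limit)
        print(f"use_limit={use_limit}: fetched {fetched} records for a 100-record block")
```

In this simulation, recovering a 100-record block fetches 10,000 records without a limit but only 100 with one. Capping each request at the number of records still missing keeps the recovery work proportional to the block size, which is the idea the answer describes for the real GetRecords call.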
