Spark Streaming checkpoint recovery is very slow

隐瞒了意图╮ 2021-02-12 14:26
  • Goal: Read from Kinesis and store the data in S3 in Parquet format via Spark Streaming.
  • Situation: The application runs fine initially, running batches of 1 hour and th
3 answers
  •  别跟我提以往
    2021-02-12 14:39

    I had a similar issue before: my application kept getting slower and slower.

    Try releasing memory once you are done with an RDD by calling rdd.unpersist(): https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#unpersist(boolean)

    Or set spark.streaming.backpressure.enabled to true.

    http://spark.apache.org/docs/latest/streaming-programming-guide.html#setting-the-right-batch-interval

    http://spark.apache.org/docs/latest/streaming-programming-guide.html#requirements

    Also, check your locality settings; too much data may be getting moved around.
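
For what it's worth, here is a rough sketch of how the settings mentioned above could be passed on the spark-submit command line. The helper name to_submit_flags, the script name your_app.py, and the values are all made up for illustration; spark.streaming.unpersist and spark.locality.wait are my guess at the settings most relevant to the unpersist and locality suggestions, so treat them as assumptions to check against your Spark version.

```python
# Hypothetical helper (not part of Spark): turns a dict of Spark
# settings into "--conf key=value" flags for spark-submit.
def to_submit_flags(conf):
    flags = []
    for key, value in sorted(conf.items()):
        # Render Python booleans the way Spark expects ("true"/"false").
        if isinstance(value, bool):
            value = "true" if value else "false"
        flags += ["--conf", f"{key}={value}"]
    return flags


# Placeholder values to tune for your workload, not recommendations.
streaming_conf = {
    "spark.streaming.backpressure.enabled": True,  # let Spark throttle ingestion
    "spark.streaming.unpersist": True,             # auto-unpersist streaming RDDs
    "spark.locality.wait": "3s",                   # wait time for a data-local slot
}

if __name__ == "__main__":
    # "your_app.py" is a stand-in for the real application script.
    command = ["spark-submit"] + to_submit_flags(streaming_conf) + ["your_app.py"]
    print(" ".join(command))
```

The same keys can also go into spark-defaults.conf or be set on the SparkConf in code; the command line is just the quickest way to try different values between runs.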
