Spark Streaming: Application health

Submitted by 北城以北 on 2019-12-12 11:01:14

Question


I have a Kafka-based Spark Streaming application that runs every 5 minutes. Looking at the statistics after 5 days of running, I have a few observations:

  1. The processing time gradually increases from 30 secs to 50 secs. The snapshot below shows the processing-time chart:

  2. A large number of garbage-collection log entries are appearing, as shown below:

Questions:

  1. Is there a good explanation for why the processing time has increased substantially, even though the number of events is more or less the same (during the last trough)?
  2. I am getting almost 70 GC log entries at the end of each processing cycle. Is this normal?
  3. Is there a better strategy to ensure the processing time remains within acceptable delays?

Answer 1:


It really depends on the application. The way I'd approach debugging this issue is the following:

  1. Under the Storage tab, check whether the stored sizes are growing. Growth there can indicate some kind of cached-resource leak. Check the value of spark.cleaner.ttl, but it is better to make sure you uncache all resources when they are no longer needed.
  2. Inspect the DAG visualization of running jobs and see whether the lineage is growing. If it is, make sure to perform checkpointing to cut the lineage.
  3. Reduce the number of batches retained in the UI (the spark.streaming.ui.retainedBatches parameter).
  4. Even if the number of events is the same, check whether the amount of data processed by tasks grows over time (Stages tab -> Input column). That could point to an application-level issue.
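Points 1 and 3 above are plain configuration settings. A minimal sketch of how they might be passed at submit time follows; the jar name and the concrete values are illustrative placeholders, not taken from the question:

```shell
# Hypothetical spark-submit invocation; the jar name, TTL, and retained-batch
# count below are placeholder values for illustration only.
# spark.cleaner.ttl (point 1): age in seconds after which old metadata is cleaned up.
# spark.streaming.ui.retainedBatches (point 3): how many completed batches the UI keeps.
spark-submit \
  --conf spark.cleaner.ttl=3600 \
  --conf spark.streaming.ui.retainedBatches=100 \
  your-streaming-app.jar
```

Point 2 (checkpointing) is configured in application code rather than at submit time, by calling checkpoint() on the StreamingContext with a fault-tolerant directory (e.g. on HDFS).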

I've had relatively complex Spark Streaming applications (Spark v1.6, v2.1.1, v2.2.0) running for days without any degradation in performance, so there must be a solvable issue here.



Source: https://stackoverflow.com/questions/35693211/spark-streaming-application-health
