Question
I have a Kafka-based Spark Streaming application that runs every 5 mins. Looking at the statistics after 5 days of running, there are a few observations:
- The Processing time gradually increases from 30 secs to 50 secs. The snapshot below highlights the processing-time chart.
- A good number of Garbage collection logs are appearing, as shown below.
Questions:
- Is there a good explanation for why the Processing Time has increased substantially, even when the number of events is more or less the same (during the last trough)?
- I am getting almost 70 GC logs at the end of each processing cycle. Is this normal?
- Is there a better strategy to ensure the processing time remains within acceptable delays?
Answer 1:
It really depends on the application. The way I'd approach debugging this issue is the following:
- Under the Storage tab, check whether the stored sizes are growing. Growth there can indicate a leak of cached resources. Check the value of spark.cleaner.ttl, but it is better to make sure you uncache all resources as soon as they are no longer needed (see the sketch after this list).
- Inspect the DAG visualization of running jobs and check whether the lineage keeps growing. If it does, make sure to perform checkpointing to cut the lineage.
- Reduce the number of retained batches in the UI (the spark.streaming.ui.retainedBatches parameter).
- Even if the number of events is the same, check whether the amount of data processed by tasks grows over time (Stages tab -> Input column). This could point to an application-level issue.
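To make the first three points concrete, here is a minimal Scala sketch of a streaming driver that unpersists per-batch cached RDDs, enables checkpointing to keep the lineage short, and caps spark.streaming.ui.retainedBatches. The socket source, the checkpoint path, and the value 100 are illustrative stand-ins (the application in the question reads from Kafka); this is not the original poster's code.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingHealthSketch {
  def main(args: Array[String]): Unit = {
    // Cap how many finished batches the streaming UI keeps in memory
    // (default is 1000); a long retained history costs driver memory over long runs.
    val conf = new SparkConf()
      .setAppName("streaming-health-sketch")
      .set("spark.streaming.ui.retainedBatches", "100")

    // 5-minute batch interval, matching the application described in the question.
    val ssc = new StreamingContext(conf, Seconds(300))

    // Checkpointing cuts the growing lineage of stateful/windowed DStreams.
    // The directory is a placeholder; use a reliable store (e.g. HDFS/S3) in practice.
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")

    // Hypothetical input stream standing in for the Kafka source in the question.
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      // Cache only for the duration of this batch, then release explicitly
      // so cached blocks do not accumulate under the Storage tab.
      val cached = rdd.persist(StorageLevel.MEMORY_ONLY)
      val count = cached.count()
      println(s"events in batch: $count")
      cached.unpersist()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In the real application the socketTextStream would be replaced by the Kafka direct stream, but the persist/unpersist and checkpointing pattern stays the same.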
I've had relatively complex Spark Streaming applications (Spark v1.6, v2.1.1, v2.2.0) running for days without any degradation in performance, so there must be some solvable issue.
Source: https://stackoverflow.com/questions/35693211/spark-streaming-application-health