Question
I have a Kafka-based Spark Streaming application that runs every 5 minutes. Looking at the statistics after 5 days of running, there are a few observations:
- The processing time gradually increases from 30 secs to 50 secs. The snapshot below highlights the processing time chart.
- A good number of garbage collection (GC) logs are appearing, as shown below.
Questions:
- Is there a good explanation for why the processing time has increased substantially, even though the number of events is more or less the same (during the last trough)?
- I am getting almost 70 GC logs at the end of each processing cycle. Is this normal?
- Is there a better strategy to keep the processing time within acceptable delays?
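For reference, here is a minimal sketch of the kind of job described above, assuming Spark 2.x with the spark-streaming-kafka-0-10 direct stream integration; the broker address, topic name, and group id are placeholders, and the per-batch count stands in for the real application logic.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object StreamingJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-streaming-health")
    // 5-minute batch interval, matching the cadence described above
    val ssc = new StreamingContext(conf, Minutes(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",             // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-health",                  // placeholder
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("events"), kafkaParams)  // placeholder topic
    )

    // Per-batch work; the real application's transformations go here
    stream.map(_.value()).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```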
Answer 1:
It really depends on the application. The way I'd approach debugging this issue is the following:
- Under the Storage tab, check whether the stored sizes are growing. If there is growth, it can indicate a cached-resource leak of some kind. Check the value of spark.cleaner.ttl, but better yet, make sure you uncache all resources once they are no longer needed.
- Inspect the DAG visualization of running jobs and see whether the lineage is growing. If it is, make sure to perform checkpointing to cut the lineage (see the configuration sketch after this list).
- Reduce the number of retained batches in the UI (the spark.streaming.ui.retainedBatches parameter).
- Even if the number of events is the same, check whether the amount of data processed by tasks grows with time (Stages tab -> Input column). This could point to an application-level issue.
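The checkpointing, uncaching, and retention settings mentioned in the list above could look roughly like the sketch below. This is an illustration, not the asker's actual code; the checkpoint directory and lookup file path are placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

object RetentionTuning {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("streaming-health-tuning")
      // Keep fewer completed batches in the driver UI so its memory footprint stays flat
      .set("spark.streaming.ui.retainedBatches", "100")
      // On Spark 1.x, spark.cleaner.ttl can force periodic cleanup of old metadata;
      // it was removed in newer releases, where explicit unpersist/checkpointing is preferred
      // .set("spark.cleaner.ttl", "3600")

    val ssc = new StreamingContext(conf, Minutes(5))

    // Checkpointing truncates the RDD lineage so the DAG does not keep growing;
    // the directory is a placeholder
    ssc.checkpoint("hdfs:///checkpoints/streaming-health")

    // Explicitly release cached data once it is no longer needed instead of
    // letting it accumulate under the Storage tab; the path is a placeholder
    val lookup = ssc.sparkContext.textFile("hdfs:///ref/lookup.txt").cache()
    // ... wire the streaming pipeline here, using `lookup` where required ...
    lookup.unpersist()

    // ssc.start(); ssc.awaitTermination()  // start once at least one output operation is registered
  }
}
```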
I've had relatively complex Spark Streaming applications (Spark v1.6, v2.1.1, v2.2.0) running for days without any degradation in performance, so there must be some solvable issue.
Source: https://stackoverflow.com/questions/35693211/spark-streaming-application-health