How are jobs assigned to executors in Spark Streaming?


Actually, in the current implementation of Spark Streaming and under the default configuration, only one job is active (i.e. under execution) at any point of time. So if one batch's processing takes longer than 10 seconds, then the next batch's jobs will stay queued.

This can be changed with an experimental Spark property "spark.streaming.concurrentJobs", which is set to 1 by default. It's not currently documented (maybe I should add it).
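As a rough illustration, here is a minimal sketch of how that property could be set when creating a StreamingContext. The app name, the value 2, and the 10-second batch interval are illustrative choices, not anything prescribed by Spark:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: allow two streaming jobs to run concurrently instead of the default one.
// "spark.streaming.concurrentJobs" is the experimental property mentioned above.
val conf = new SparkConf()
  .setAppName("ConcurrentJobsExample")          // illustrative app name
  .set("spark.streaming.concurrentJobs", "2")   // default is 1

// 10-second batch interval, matching the example above.
val ssc = new StreamingContext(conf, Seconds(10))

// Define your DStream operations here, then start the context:
// ssc.start()
// ssc.awaitTermination()
```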

The reason it is set to 1 is that concurrent jobs can potentially lead to weird sharing of resources, which can make it hard to debug whether there are sufficient resources in the system to process the ingested data fast enough. With only one job running at a time, it is easy to see that if the batch processing time is less than the batch interval, then the system will be stable. Granted, this may not be the most efficient use of resources under certain conditions. We definitely hope to improve this in the future.
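To make the stability condition concrete with some illustrative numbers: with a 10-second batch interval, batches that consistently finish in 8 seconds keep the system caught up; a single batch that takes 15 seconds simply delays the next batch by about 5 seconds, but if batches routinely take longer than 10 seconds, the queue of pending batches grows without bound.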

There is a little bit of material regarding the internals of Spark Streaming in these meetup slides (sorry about the shameless self-advertising :) ). That may be useful to you.
