What is the difference between mini-batch vs real time streaming in practice (not theory)?

情歌与酒 2021-01-31 10:29

What is the difference between mini-batch vs real time streaming in practice (not theory)? In theory, I understand mini batch is something that batches in the given time frame w

3 answers

陌清茗 (OP)

2021-01-31 11:13
Disclaimer: I'm a committer and PMC member of Apache Flink. I'm familiar with the overall design of Spark Streaming but do not know its internals in detail.

The mini-batch stream processing model as implemented by Spark Streaming works as follows:
- Records of a stream are collected in a buffer (mini-batch).
- Periodically, the collected records are processed using a regular Spark job. This means that for each mini-batch, a complete distributed batch processing job is scheduled and executed.
- While the job runs, the records for the next batch are collected.
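The steps above can be sketched in a few lines of plain Python. This is only an illustration of the buffering model, not Spark Streaming's actual implementation: `run_mini_batches`, the `(arrival_time, payload)` record shape, and the `process_batch` callback are all hypothetical names, the loop is sequential (it does not model collecting the next batch while a job runs), and it skips empty intervals for brevity.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical record shape: (arrival time in seconds, payload).
Record = Tuple[float, str]


def run_mini_batches(
    records: Iterable[Record],
    batch_interval: float,
    process_batch: Callable[[List[str]], None],
) -> int:
    """Group records into fixed-interval buffers and run one job per buffer.

    Records are assumed to be sorted by arrival time.
    Returns the number of jobs that were run.
    """
    jobs = 0
    buffer: List[str] = []
    window_end = batch_interval
    for arrival, payload in records:
        # Close every interval that ended before this record arrived.
        while arrival >= window_end:
            if buffer:
                # Stands in for scheduling a complete distributed batch job.
                process_batch(buffer)
                jobs += 1
                buffer = []
            window_end += batch_interval
        buffer.append(payload)
    if buffer:
        # Flush the last, partially filled buffer.
        process_batch(buffer)
        jobs += 1
    return jobs
```

For example, calling it with a 1-second interval and `process_batch=print` runs one "job" per non-empty second of input. The smaller the interval, the more jobs are run for the same data.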
So, why is it not effective to run a mini-batch every 1ms? Simply because this would mean scheduling a distributed batch job every millisecond. Even though Spark is very fast at scheduling jobs, this would be too much overhead. It would also significantly reduce the achievable throughput. Batching techniques used in operating systems or TCP likewise do not work well if their batches become too small.
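A rough back-of-the-envelope model makes the throughput argument concrete. It assumes each job pays a fixed scheduling overhead plus a constant cost per record, and that a job must finish within one batch interval to keep up. The function name and all numbers below are made up for illustration; they are not measurements of Spark.

```python
def max_throughput(batch_interval_s: float, overhead_s: float, per_record_s: float) -> float:
    """Highest sustainable input rate (records/second) under the assumed model.

    To keep up, a job must finish within one interval:
        overhead_s + n * per_record_s <= batch_interval_s
    """
    usable = batch_interval_s - overhead_s
    if usable <= 0:
        # The fixed overhead alone already exceeds the interval.
        return 0.0
    records_per_batch = usable / per_record_s
    return records_per_batch / batch_interval_s


# Assumed costs: 5 ms of scheduling overhead per job, 1 microsecond per record.
for interval in (1.0, 0.01, 0.001):
    print(interval, max_throughput(interval, overhead_s=0.005, per_record_s=1e-6))
```

Under these assumed costs, a 1 s interval sustains roughly 995,000 records per second, a 10 ms interval roughly 500,000, and a 1 ms interval cannot keep up at all, because the fixed overhead consumes the whole interval.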