What is the difference between mini-batch vs real time streaming in practice (not theory)? In theory, I understand mini batch is something that batches in the given time frame w
I know that one answer was accepted, but I think one more point must be made to answer this question fully. I think an answer like "Flink's real time is faster/better for streaming" is wrong, because it heavily depends on what you want to do.
Spark's mini-batch model has, as the previous answer pointed out, the disadvantage that a new job must be created for each mini-batch.
However, Spark Structured Streaming's default processing-time trigger is set to 0, which means that new data is read as fast as possible: the next mini-batch starts as soon as the previous one has finished. It means that:
Latency is very small in such cases.
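To make the effect of the trigger interval concrete, here is a toy simulation in plain Python. It is only a sketch of the idea, not Spark's actual scheduler: the function name `simulate_micro_batches`, the fixed per-batch overhead and the per-event cost are all assumptions made up for illustration. It compares a trigger interval of 0 (start the next batch as soon as the engine is free and data exists) with a fixed 1-second interval.

```python
def simulate_micro_batches(arrivals_ms, trigger_interval_ms,
                           batch_overhead_ms, per_event_cost_ms):
    """Toy model of a micro-batch engine (hypothetical, not Spark internals).

    Returns the latency of every event in milliseconds, i.e. the time the
    batch containing it finished minus the time the event arrived.
    """
    arrivals = sorted(arrivals_ms)
    latencies = []
    i = 0
    engine_free_at = 0  # time at which the previous batch finished
    while i < len(arrivals):
        # A batch cannot start before the engine is free and data exists.
        start = max(engine_free_at, arrivals[i])
        if trigger_interval_ms > 0:
            # Fixed-interval trigger: round up to the next tick boundary.
            start = -(-start // trigger_interval_ms) * trigger_interval_ms
        # The batch takes every event that has arrived by its start time.
        j = i
        while j < len(arrivals) and arrivals[j] <= start:
            j += 1
        end = start + batch_overhead_ms + (j - i) * per_event_cost_ms
        latencies.extend(end - a for a in arrivals[i:j])
        i = j
        engine_free_at = end
    return latencies


# One event every 100 ms for one second; the cost numbers are invented.
events = list(range(0, 1000, 100))
fast = simulate_micro_batches(events, 0, batch_overhead_ms=20, per_event_cost_ms=1)
slow = simulate_micro_batches(events, 1000, batch_overhead_ms=20, per_event_cost_ms=1)

print("trigger 0 ms    -> worst latency:", max(fast), "ms")
print("trigger 1000 ms -> worst latency:", max(slow), "ms")
```

In this model, with the trigger at 0, each event only pays for the processing time of its own batch, while with a 1-second trigger an event may also wait almost a full interval before its batch even starts. The per-batch overhead is the price of creating a new job for every mini-batch, mentioned above.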
One big advantage over Flink is that Spark has unified APIs for batch and streaming processing, thanks to this mini-batch model. You can easily translate a batch job into a streaming job, or join streaming data with old data from batch. Doing it with Flink is not possible. Flink also doesn't allow you to run interactive queries on the data you've received.
As said before, the use cases for micro-batches and real-time streaming are different:
For more details about Structured Streaming please look at this blog post