What is the difference between mini-batch vs real time streaming in practice (not theory)? In theory, I understand mini batch is something that batches in the given time frame w
This is something I think about a lot, because the answer is always hard to formulate, for technical and non-technical people alike.
I will try to answer this part:
Why is it not effective to run mini-batch with 1 millisecond latency?
I believe the problem is not in the model itself but in how Spark implements it. There is empirical evidence that performance degrades when the mini-batch window is reduced too far. In fact, the suggested window was at least 0.5 seconds to prevent this kind of degradation, and on big volumes even that was too small. I never had the chance to test it in production, since I never had a strong real-time requirement.
I know Flink better than Spark, so I don't know Spark's internals that well, but I believe the overheads introduced by the design of the batch process are negligible if each batch takes at least a few seconds to process, and become heavy when they amount to a fixed latency that you can't go below. To understand the nature of these overheads, I think you will have to dig into the Spark documentation, code and open issues.
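To make the fixed-overhead argument concrete, here is a toy back-of-the-envelope model in Python. It is only a sketch: the 50 ms per-batch overhead, the 0.01 ms per-event cost and the 10,000 events/s input rate are numbers I made up for illustration, not measurements of Spark, and `batch_stats` is a hypothetical helper, not part of any library.

```python
def batch_stats(interval_ms, fixed_overhead_ms, per_event_ms, events_per_sec):
    """Toy model: each batch costs a fixed overhead plus work proportional to its size."""
    # Number of events that arrive during one batch interval.
    events_per_batch = events_per_sec * interval_ms / 1000.0
    # Time needed to process one batch under this model.
    processing_ms = fixed_overhead_ms + per_event_ms * events_per_batch
    # Fraction of the processing time spent on overhead instead of real work.
    overhead_share = fixed_overhead_ms / processing_ms
    # The job only keeps up if a batch finishes before the next one is due.
    keeps_up = processing_ms <= interval_ms
    return processing_ms, overhead_share, keeps_up


# Assumed (made-up) numbers, for illustration only.
FIXED_OVERHEAD_MS = 50.0   # hypothetical scheduling/coordination cost per batch
PER_EVENT_MS = 0.01        # hypothetical processing cost per event
EVENTS_PER_SEC = 10_000    # hypothetical input rate

for interval_ms in (5000, 500, 50, 1):
    processing_ms, share, ok = batch_stats(
        interval_ms, FIXED_OVERHEAD_MS, PER_EVENT_MS, EVENTS_PER_SEC
    )
    print(
        f"interval={interval_ms:>5} ms  processing={processing_ms:7.1f} ms  "
        f"overhead share={share:6.1%}  keeps up={ok}"
    )
```

With these assumed numbers the overhead is a small fraction of a 5-second batch, about half of a 500 ms batch, and at 50 ms or 1 ms the batch takes longer to process than the interval itself, so the job falls further and further behind. The real numbers for your cluster will differ, but the shape of the problem is the same whenever there is a per-batch cost that does not shrink with the batch.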
The industry has by now acknowledged that there is a need for a different model, and that's why many "streaming-first" engines are growing right now, with Flink as the front runner. I don't think it's just buzzwords and hype, because the use cases for this kind of technology, at least for now, are extremely limited. Basically, if you need to make an automated decision in real time on big, complex data, you need a fast real-time data engine. In any other case, including near-real-time, real-time streaming is overkill and mini-batch is fine.