How to achieve distributed processing and high availability simultaneously in Kafka?

情到浓时终转凉″ submitted on 2019-12-23 16:54:52

Question


I have a topic consisting of n partitions. To distribute the processing, I run two processes on different machines. Both subscribe to the topic with the same group id, and each allocates n/2 threads, with every thread processing a single stream (so n/2 partitions per process).

This gives me load distribution, but if process 1 crashes, then process 2 cannot consume messages from the partitions allocated to process 1, because it only listened on n/2 streams from the start.

Alternatively, if I configure for HA and start n threads/streams on both processes, then when one node fails, all partitions will be processed by the other node. But this compromises distribution, because all partitions are processed by a single node at any given time.

Is there a way to achieve both simultaneously, and if so, how?


Answer 1:


Yes, use an existing stream processing engine. Storm is a good choice, as are Spark and Samza; which one fits best depends on your use case.

Now you could roll your own, but as you've already discovered, managing failing processes and high availability is tricky. Generally speaking, distributed processing is filled with lots of subtle problems that someone else has already solved. In your shoes I'd use existing software to deal with that problem.
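To give a feel for the bookkeeping that rolling your own involves, here is a minimal sketch of one small piece of it: recomputing which worker owns which partition once a worker disappears. It is plain Python and does not use any Kafka API; the function name `assign_partitions`, the worker names `process-1` and `process-2`, and the round-robin policy are all assumptions chosen for illustration, not how any particular engine does it.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], workers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions round-robin over the workers that are currently alive.

    Hypothetical helper for illustration only; not part of any Kafka client.
    """
    if not workers:
        raise ValueError("at least one live worker is required")
    ordered = sorted(workers)  # deterministic order so every node computes the same result
    assignment: Dict[str, List[int]] = {w: [] for w in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


partitions = list(range(8))  # n = 8 partitions, an arbitrary example value

# Both processes alive: the load is split between them.
before = assign_partitions(partitions, ["process-1", "process-2"])

# process-1 has crashed: the survivor takes over every partition.
after = assign_partitions(partitions, ["process-2"])

print(before)
print(after)
```

The assignment itself is the easy part. A real implementation also has to detect the failure, get every node to agree on who is alive, and resume each partition from the right position, which is the kind of subtle problem the existing engines already deal with.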



Source: https://stackoverflow.com/questions/30060261/how-to-achieve-distributed-processing-and-high-availability-simultaneously-in-ka
