Kafka: Single consumer group in multiple instances

Submitted by 醉酒当歌 on 2020-11-30 12:44:20

Question


I am working on implementing a Kafka-based solution for our application. As I understand from the Kafka documentation, each consumer in a consumer group (which is a thread) is internally mapped to one partition of the subscribed topic.

Let's say I have a topic with 40 partitions and a high-level consumer running in 4 instances. I do not want one instance to consume the same messages consumed by another instance. But if one instance goes down, the other three instances should be able to process all the messages.

  • Should I go for the same consumer group with 10 threads per instance? Stack Overflow says that sharing one consumer group between the instances acts as a traditional synchronous queue mechanism:

In Apache Kafka why can't there be more consumer instances than partitions?

  • Or should I go for a different consumer group per instance?

Using the simple (low-level) consumer gives control over the partitions, but then if one instance goes down, the other three instances would not process the messages from the partitions that the first instance was consuming.


Answer 1:


First, to explain the concept of consumers and consumer groups:

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group.

The records will be effectively load balanced over the consumer instances in a consumer group. If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.

The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time. If new instances join the group they will take over some partitions from other members of the group; if an instance dies, its partitions will be distributed to the remaining instances.
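The "fair share" idea and the takeover after a failure can be sketched in a few lines of Python. This is only a simplified model and not Kafka's actual assignor code: the function name assign_partitions and the member names instance-1 to instance-4 are made up for illustration, and the split into contiguous ranges is an assumption chosen for simplicity.

```python
def assign_partitions(num_partitions, members):
    """Split partition ids 0..num_partitions-1 into contiguous ranges,
    one range per group member (simplified model, not Kafka's code)."""
    members = sorted(members)
    base, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members take one additional partition.
        count = base + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


members = ["instance-1", "instance-2", "instance-3", "instance-4"]

# All four members of the group are alive.
before = assign_partitions(40, members)

# instance-2 dies; the group rebalances over the three survivors.
after = assign_partitions(40, [m for m in members if m != "instance-2"])

print({m: len(p) for m, p in before.items()})
# {'instance-1': 10, 'instance-2': 10, 'instance-3': 10, 'instance-4': 10}
print({m: len(p) for m, p in after.items()})
# {'instance-1': 14, 'instance-3': 13, 'instance-4': 13}
```

In both cases every partition has exactly one owner, so no record is consumed twice within the group and none is left unread.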

Now, to answer your questions:

1. I do not want one instance to consume the same messages consumed by another instance. But if one instance goes down, the other three instances should be able to process all the messages.

This is Kafka's default behavior. You just have to give all 4 instances the same consumer group name.

2. Should I go for the same consumer group with 10 threads per instance?

Yes. With 40 partitions and 4 × 10 = 40 threads, each thread is assigned exactly one Kafka partition to consume from, which is optimal. With fewer threads, the partitions are still balanced across the consumer instances, but each thread has to handle more than one partition, which MAY overload some of the consumer instances.

3. In Apache Kafka why can't there be more consumer instances than partitions?

In Kafka, a partition can be assigned to only one consumer instance within a consumer group. Thus, creating more consumer instances than partitions leaves the extra consumers idle: they will not consume any records from Kafka.

4. Should I go for a different consumer group per instance?

No. This will lead to duplication of the records: because the instances are in different consumer groups, every record will be sent to all of them.
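To put numbers on points 2 and 3, here is a small Python sketch for the 40-partition, 4-instance setup from the question. It only counts how many partitions each consumer thread would own if the partitions are spread as evenly as possible; the helper name partitions_per_consumer is hypothetical and the even spread is an assumption, not a statement about a specific Kafka assignor.

```python
def partitions_per_consumer(num_partitions, num_consumers):
    """Return the number of partitions owned by each consumer thread,
    largest share first, assuming an even spread."""
    base, extra = divmod(num_partitions, num_consumers)
    return [base + 1] * extra + [base] * (num_consumers - extra)


PARTITIONS = 40
INSTANCES = 4

for threads_per_instance in (5, 10, 15):
    counts = partitions_per_consumer(PARTITIONS, INSTANCES * threads_per_instance)
    print(
        f"{threads_per_instance} threads/instance: "
        f"max {max(counts)} partition(s) per thread, "
        f"{counts.count(0)} idle thread(s)"
    )
# 5 threads/instance: max 2 partition(s) per thread, 0 idle thread(s)
# 10 threads/instance: max 1 partition(s) per thread, 0 idle thread(s)
# 15 threads/instance: max 1 partition(s) per thread, 20 idle thread(s)
```

Under these assumptions, 10 threads per instance gives one partition per thread, fewer threads means more work per thread, and any thread beyond the 40th has nothing to read.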

Hope this clarifies your doubts.




Answer 2:


There are a few things to note when designing your Kafka ecosystem:

  1. A consumer is essentially a thread, and you do not want multiple threads trying to change the same offset mark. That's why the consumer system should be designed as one consumer per thread.

  2. Offset commits: there is a delicate balance in how frequently you perform offset commits. If the frequency is too high, it will hurt the performance of your system (ZooKeeper will be the bottleneck). If the frequency is too low, you risk duplicate messages, because after a restart the consumer resumes from the last committed offset.
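The commit-frequency trade-off in point 2 can be illustrated with a toy simulation in Python. It does not talk to Kafka at all: the function simulate and its parameters commit_every and crash_after are invented for this sketch, and it assumes the consumer commits after every commit_every processed messages and resumes from the last committed offset after a crash.

```python
def simulate(commit_every, crash_after):
    """Process messages 0..crash_after-1, committing after every
    `commit_every` messages, then crash.

    Returns (number_of_commits, messages_reprocessed_after_restart).
    """
    committed = 0  # next offset to read after a restart
    commits = 0
    for offset in range(crash_after):
        # ... the message at `offset` is processed here ...
        if (offset + 1) % commit_every == 0:
            committed = offset + 1
            commits += 1
    # Everything processed but not yet committed is delivered again.
    duplicates = crash_after - committed
    return commits, duplicates


for commit_every in (1, 10, 50):
    commits, duplicates = simulate(commit_every, crash_after=95)
    print(f"commit every {commit_every}: {commits} commits, {duplicates} duplicates")
# commit every 1: 95 commits, 0 duplicates
# commit every 10: 9 commits, 5 duplicates
# commit every 50: 1 commits, 45 duplicates
```

Committing after every message costs the most commit traffic but redelivers nothing in this model, while committing rarely is cheap but replays up to commit_every - 1 messages after a failure.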




Answer 3:


Kafka supports both the competing-consumers and the publish-subscribe patterns:

  • competing consumers: put the consumers inside the same consumer group, so that each partition is read by only one consumer (of course, a consumer can read more than one partition). It means that having more consumers than partitions in a consumer group is pointless, because the extra consumers will sit idle without being assigned any partition. Of course, if one consumer in the consumer group goes down, one of the idle consumers will take over its partition.
  • publish-subscribe: if you have different consumer groups, every group receives the same messages. Inside each consumer group, the pattern above applies.
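The difference between the two patterns can be shown with a tiny in-memory model in Python. Nothing here uses a Kafka client: the function deliver, the group names and the consumer names are hypothetical, and the rule that a partition goes to consumer number partition % len(consumers) is just a stand-in for whatever assignment the group actually negotiates.

```python
from collections import defaultdict


def deliver(records, groups):
    """records: list of (partition, value) pairs.
    groups: {group_name: [consumer names]}.
    Returns {(group, consumer): [values received]}."""
    received = defaultdict(list)
    for group, consumers in groups.items():
        # Every group sees every record ...
        for partition, value in records:
            # ... but inside a group a partition has a single owner.
            owner = consumers[partition % len(consumers)]
            received[(group, owner)].append(value)
    return dict(received)


records = [(p, f"msg-{p}") for p in range(4)]

# Competing consumers: c1 and c2 share one group and split the records.
same_group = deliver(records, {"app": ["c1", "c2"]})

# Publish-subscribe: each consumer has its own group and gets everything.
separate_groups = deliver(records, {"app-1": ["c1"], "app-2": ["c2"]})

print(same_group)
print(separate_groups)
```

In the first case each message reaches exactly one of the two consumers; in the second case both consumers receive all four messages, which is the duplication the question wants to avoid.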


Source: https://stackoverflow.com/questions/44587416/kafka-single-consumer-group-in-multiple-instances
