Kafka consumer - what's the relation of consumer processes and threads with topic partitions

后端 未结 3 1162
被撕碎了的回忆
被撕碎了的回忆 2021-02-02 03:48

I have been working with Kafka lately and have bit of confusion regarding the consumers under a consumer group. The center of the confusion is whether to implement consumers as

相关标签:
3条回答
  • 2021-02-02 04:31

    A consumer group can have multiple consumer instances running (multiple process with the same group-id). While consuming each partition is consumed by exactly one consumer instance in the group.

    E.g. if your topic contains 2 partitions and you start a consumer group group-A with 2 consumer instances then each one of them will be consuming messages from a particular partition of the topic.

    If you start the same 2 consumer with different group id group-A & group-B then the message from both partitions of the topic will be broadcast to each one of them. So in that case the consumer instance running under group-A will have messages from both the partitions of the topic, and same is true for group-B as well.

    Read more on this on their documentation

    EDIT : Based on your comment which says,

    I was wondering what is the effective difference between having 2 consumer threads under the same process as opposed to 2 consumer processes (group being the same in both cases)

    The consumer group-id is same/global across the cluster. Suppose you have started process-one with 2 threads and then spawn another process (may be in a different machine) with the same groupId having 2 more threads then kafka will add these 2 new threads to consume messages from the topic. So eventually there will be 4 threads responsible for consuming from the same topic. Kafka will then trigger a re-balance to re-assign partitions to threads, so it could happen that for a particular partition which was being consumed by thread T1 of process P1may be allocated to be consumed by thread T2 of process P2. The below few lines are taken from the wiki page

    When a new process is started with the same Consumer Group name, Kafka will add that processes' threads to the set of threads available to consume the Topic and trigger a 're-balance'. During this re-balance Kafka will assign available partitions to available threads, possibly moving a partition to another process. If you have a mixture of old and new business logic, it is possible that some messages go to the old logic.

    0 讨论(0)
  • 2021-02-02 04:31

    Thanks for the detailed answer from @user2720864, but I think the re-allocation case @user2720864 mentioned in the answer is not correct => one partition cannot be consumed by two consumers.

    When there are more consumers (compared to the partitions), each partition will be exclusively allocated to one consumer only while the leftover consumers will stay lazy only until some working consumers being dead or being removed from the group.

    Based on the Kafka Consumers document:

    The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time. This process of maintaining membership in the group is handled by the Kafka protocol dynamically. If new instances join the group they will take over some partitions from other members of the group; if an instance dies, its partitions will be distributed to the remaining instances.

    And also its API specification at "Consumer Groups and Topic Subscriptions" section:

    This is achieved by balancing the partitions between all members in the consumer group so that each partition is assigned to exactly one consumer in the group.

    0 讨论(0)
  • 2021-02-02 04:33

    The main design decision for opting for multiple consumer group instances with the same id vs a single consumer group instance is resiliency. For example if you have a single consumer with two threads then if this machine goes down you loose all consumers. If you have two separate consumer groups with the same id, each on different hosts then they can survive failure. Ideally each consumer group should have two threads in the above, therefore if one host goes down the other consumer group uses its dormant thread to take up the other partition. Indeed it is always desirable to have more threads than partitions to cover this factor.

    1. You can run each consumer group on different hosts. With a single consumer group for a given name/id it will only ever run on a single host as it manages all its threads in a single runtime environment.
    2. Kafka has an algorithm to determine which threads/consumer groups reads the various topic partitions. Kafka tries to evenly distribute these in a resilient fashion. When a consumer group fails, it enables other threads in other groups to read the given partition.
    3. Refers to a single thread in the consumer group. If there are more threads than partitions then some of them will just remain dormant until other threads fail to offer resiliancy.
    4. The preference relates to resilience. So with multiple consumer groups setup with the same id I can run on multiple hosts making my application tolerant to failure.
    0 讨论(0)
提交回复
热议问题