Kafka-streams: Why do all partitions get assigned to the same consumer in the consumergroup?

Posted by 蓝咒 on 2021-01-25 03:40:36

Question


Background

Several machines generate events. These events are sent to our Kafka cluster, where each machine has its own topic (app.machine-events.machine-name). Because ordering matters on a per-machine basis, and partition size is not an issue for now, every topic consists of a single partition. N topics therefore currently means N partitions.

The consuming/processing app uses kafka-streams. We set StreamsConfig.APPLICATION_ID_CONFIG ("application.id") to 'machine-event-processor', which is the same for every instance, so all instances join the same Kafka consumer group. The consumer subscribes to the pattern app.machine-events.*, since the processor does not care which machine's events it handles. Group membership is verified by ./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group machine-event-processor --members --verbose, which lists members matching the number and IPs of all running processing services.

Expected

Given 20 machines and 5 instances of the processor, we'd expect each processor to handle ~4 partitions (and therefore ~4 topics).

Actual

One processor handles all 20 partitions (and therefore all 20 topics), while the other 4 processors sit idle. When the 'lucky' processor is killed, all 20 partitions are rebalanced to a single other processor, which then handles all 20 partitions/topics while the remaining 3 stay idle.

What I've tried so far

  • Looked at partition.grouper. I don't think I fully understand it, but as far as I can tell the only available option is DefaultPartitionGrouper anyway, and writing a custom one should not be necessary since, per the documentation, this setup should work. The documentation does mention that partitions are joined into a task based on their partition number (all 0 for us, as there is only one partition per topic), but I was not able to fully understand this part.
  • Used RoundRobinAssignor for the consumer: settings.put(StreamsConfig.consumerPrefix(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG), new RoundRobinAssignor().getClass.getName) (Tried several values, but nothing seemed to change.)
  • Checked the other configuration properties to see whether I had missed something: nothing, I think.

The code, simplified

val streamConfig = new Properties
// {producer.metadata.max.age.ms=5000, consumer.metadata.max.age.ms=5000, default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde, consumer.partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor, bootstrap.servers=kafka:9092, application.id=machine-event-processor, default.value.serde=org.apache.kafka.common.serialization.Serdes$ByteArraySerde}
val builder: StreamsBuilder = new StreamsBuilder
val topicStream: KStream[String, Array[Byte]] = builder.stream(Pattern.compile("app.machine-events.*"))
topicStream.process(new MessageProcessorSupplier(context)) // The event is delegated to a processor, doing the actual processing logic
val eventStreams = new KafkaStreams(builder.build(), streamConfig)
eventStreams.start()

Notes

  • Kafka-streams 2.0.0 is being used:

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-streams</artifactId>
      <version>2.0.0</version>
    </dependency>

  • Kafka runs inside a container, using the wurstmeister/kafka:2.11-2.0.0 image. The docker-stack.yml service:

kafka:
  image: wurstmeister/kafka:2.11-2.0.0
  ports:
    - target: 9094
      published: 9094
      protocol: tcp
      mode: host
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  healthcheck:
    test: ["CMD-SHELL", "$$(netstat -ltn | grep -q 9092)"]
    interval: 15s
    timeout: 10s
    retries: 5
  environment:
    HOSTNAME_COMMAND: "docker info | grep ^Name: | cut -d' ' -f 2"
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS: 36000
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
    KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094
    KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://:9094
    KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
    KAFKA_DEFAULT_REPLICATION_FACTOR: 2
  deploy:
    replicas: 2
    restart_policy:
      condition: on-failure
      delay: 5s
      max_attempts: 3
      window: 120s

  • Kafka is set up as a two-node cluster. Through the docker environment variable we've set the replication factor to 2, so each partition should have a replica on each node.

Relevant topics/questions/discussions I've found and checked

  • KIP-49

  • https://faust.readthedocs.io/en/latest/developerguide/partition_assignor.html

  • Checked out the Kafka mail archives but did not find anything there

  • Checked out stream example apps

  • Searched all round for others who ran into this issue, but that did not give me the answers I need. I also found KAFKA-7144, but this should not be an issue for us as we're running 2.0.0

If anyone has run into similar issues, or is able to point out my probably very stupid mistake, please enlighten me!


Answer 1:


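For future readers running into this same issue: the solution was to stop using N topics with 1 partition each and instead use 1 topic with N partitions. Kafka Streams builds its tasks by grouping the partitions that share a partition number across the subscribed topics, so N single-partition topics (all partition 0) collapse into a single task, and a task runs on exactly one instance. With one topic of, say, 120 partitions and 400+ machines/event sources, events from several machines end up in the same partition, but this does not affect the order of each machine's events.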
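
To make the difference between the two layouts concrete, here is a rough model of the task grouping, assuming (as I understand the default grouper) that input partitions with the same partition number end up in the same task. InputPartition, TaskGrouping and the machine-1 … machine-20 topic names are made up for this sketch; this is not the Kafka Streams API itself.

```scala
// Simplified model, not Kafka Streams code: one task per distinct partition number.
final case class InputPartition(topic: String, partition: Int)

object TaskGrouping {
  // Group input partitions by partition number; each group stands for one task.
  def tasks(inputs: Seq[InputPartition]): Map[Int, Seq[InputPartition]] =
    inputs.groupBy(_.partition)

  // Original layout: 20 topics with one partition each (hypothetical names, all partition 0).
  val perMachineTopics: Seq[InputPartition] =
    (1 to 20).map(i => InputPartition(s"app.machine-events.machine-$i", 0))

  // Layout from this answer: one topic with 120 partitions.
  val singleTopic: Seq[InputPartition] =
    (0 until 120).map(p => InputPartition("app.machine-events", p))

  def main(args: Array[String]): Unit = {
    println(s"20 single-partition topics -> ${tasks(perMachineTopics).size} task(s)") // 1
    println(s"one 120-partition topic    -> ${tasks(singleTopic).size} task(s)")      // 120
  }
}
```

With only one task there is nothing to spread over 5 instances, which matches the one busy processor and 4 idle ones described in the question.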

The implementation was to set the record key to the machine name and let the default partitioning logic decide which record goes to which partition; records with the same key always land in the same partition. Since we now have a consumer group with X consumers subscribed to this topic, the partitions are divided evenly over the consumers, each taking roughly 120/X partitions.
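
A minimal sketch of the keying idea, under stated assumptions: partitionFor uses String.hashCode as a stand-in (as far as I know the real default partitioner applies a murmur2 hash to the serialized key bytes), and partitionsPerConsumer only counts an even spread rather than reproducing the actual assignor. KeyedPartitioning and both helper names are hypothetical.

```scala
// Illustration only: shows that a fixed key maps to a fixed partition,
// and how 120 partitions spread over X consumers.
object KeyedPartitioning {
  val numPartitions = 120

  // Stand-in for the producer's key hashing (assumption: not the real murmur2 logic).
  def partitionFor(machineName: String): Int =
    java.lang.Math.floorMod(machineName.hashCode, numPartitions)

  // Number of partitions each of `consumers` members gets when spread evenly.
  def partitionsPerConsumer(consumers: Int): Seq[Int] =
    (0 until consumers).map(c => (0 until numPartitions).count(_ % consumers == c))

  def main(args: Array[String]): Unit = {
    // The same machine name always yields the same partition, so per-machine order is kept.
    println(partitionFor("machine-a") == partitionFor("machine-a")) // true
    println(partitionsPerConsumer(5).mkString(", "))                // 24, 24, 24, 24, 24
  }
}
```

Several machine names can map to the same partition number here, which is the "events from several machines share a partition" situation; ordering per machine is unaffected because one machine never spans two partitions.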

This was as Matthias suggested, which was further confirmed by other helpful people from Confluent at Devoxx Belgium 2018. Thank you!

Tip

When using the wurstmeister/kafka docker image, consider using the environment property:

KAFKA_CREATE_TOPICS: "app.machine-events:120:2"

meaning

topic-name:number-of-partitions:replication-factor



Source: https://stackoverflow.com/questions/52782555/kafka-streams-why-do-all-partitions-get-assigned-to-the-same-consumer-in-the-co
