Kafka Streams EXACTLY_ONCE in a stream application results in "Failed to rebalance" if one broker is down

Posted by 喜欢而已 on 2020-03-01 07:50:11

Question


I have a Kafka Streams application using kafka-streams and kafka-clients, both at version 2.4.0, with the following configuration:

properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
properties.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
  • brokers = ip1:port1, ip2:port2, ip3:port3
  • topic partitions: 3
  • topic replication factor: 3
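
For reference, the configuration above can be sketched as a small self-contained helper. This is a minimal sketch, not the asker's actual code: the class and method names (`StreamsProps`, `build`, `normalizeBrokers`) are hypothetical, and the application id passed in is an assumption (the client ids in the logs suggest `my-app1`). Plain string keys are used so the snippet runs without Kafka on the classpath; they are the string values behind `StreamsConfig.APPLICATION_ID_CONFIG`, `StreamsConfig.BOOTSTRAP_SERVERS_CONFIG` and `StreamsConfig.PROCESSING_GUARANTEE_CONFIG`, and `"exactly_once"` is the value of `StreamsConfig.EXACTLY_ONCE` in 2.4.0.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.stream.Collectors;

final class StreamsProps {

    // Trim whitespace around each host:port entry and drop empty entries
    // (for example one produced by a stray comma).
    static String normalizeBrokers(String brokers) {
        return Arrays.stream(brokers.split(","))
                .map(String::trim)
                .filter(entry -> !entry.isEmpty())
                .collect(Collectors.joining(","));
    }

    // Build the three settings shown in the question using plain string keys.
    static Properties build(String applicationId, String brokers) {
        Properties properties = new Properties();
        properties.put("application.id", applicationId);
        properties.put("bootstrap.servers", normalizeBrokers(brokers));
        properties.put("processing.guarantee", "exactly_once");
        return properties;
    }
}
```

A call such as `StreamsProps.build("my-app1", "ip1:port1, ip2:port2, ip3:port3")` would yield properties that could be handed to a `KafkaStreams` instance in a real project.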

Scenario 1: I start only 2 brokers (the stream app's broker setting still contains the addresses of all three brokers). When I start my stream app, the following error occurs:

2020-02-13 13:28:19.711  WARN 18756 --- [-1-0_0-producer] org.apache.kafka.clients.NetworkClient   : [Producer clientId=my-app1-a4c8867f-b914-49bb-bc58-203349700828-StreamThread-1-0_0-producer, transactionalId=my-app1-0_0] Connection to node -2 (/ip2:port2) could not be established. Broker may not be available.

and then, about 1 minute later:

org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app1-a4c8867f-b914-49bb-bc58-203349700828-StreamThread-1] Failed to rebalance.
    at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:743)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app1-a4c8867f-b914-49bb-bc58-203349700828-StreamThread-1] task [0_0] Failed to initialize task 0_0 due to timeout.
    at org.apache.kafka.streams.processor.internals.StreamTask.initializeTransactions(StreamTask.java:966)
    at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:254)
    at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:176)
    at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:355)
    at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:313)
    at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.createTasks(StreamThread.java:298)
    at org.apache.kafka.streams.processor.internals.TaskManager.addNewActiveTasks(TaskManager.java:160)
    at org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:120)
    at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:77)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:272)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:400)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:421)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:340)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:471)
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1267)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1231)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211)
    at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:843)
    ... 3 common frames omitted
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

I was testing high-availability scenarios. I think Kafka should still work, since the replicas are present on the two running brokers (I have checked using a Kafka GUI tool).

Scenario 2: Today I noticed that when I start only 2 brokers and give the app the addresses of only these two brokers (i.e. the stream app only has the IPs of the two working brokers), it fails as follows:

2020-02-16 16:18:24.818  INFO 5741 --- [-StreamThread-1] o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1-consumer, groupId=my-app] Group coordinator ip2:port2 (id: 2147483644 rack: null) is unavailable or invalid, will attempt rediscovery
2020-02-16 16:18:24.818 ERROR 5741 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1] Encountered the following unexpected Kafka exception during processing, this usually indicate Streams internal errors:

org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1] Failed to rebalance.
    at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:743)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1] task [0_0] Failed to initialize task 0_0 due to timeout.
    at org.apache.kafka.streams.processor.internals.StreamTask.initializeTransactions(StreamTask.java:966)
    at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:254)
    at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:176)
    at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:355)
    at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:313)
    at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.createTasks(StreamThread.java:298)
    at org.apache.kafka.streams.processor.internals.TaskManager.addNewActiveTasks(TaskManager.java:160)
    at org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:120)
    at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:77)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:272)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:400)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:421)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:340)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:471)
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1267)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1231)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211)
    at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:843)
    ... 3 common frames omitted
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

Note: This does not happen if I don't set EXACTLY_ONCE in the properties; then it works as intended. I tried increasing the retries and the max backoff ms settings, but that didn't help. Can anyone explain what I am missing?
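
The 60000 ms in the stack traces appears to match the default of the producer setting `max.block.ms`, which is separate from the retry and backoff settings. A hedged sketch of passing it through Kafka Streams with the `producer.` prefix (the prefix that `StreamsConfig.producerPrefix(...)` produces) follows. The class and method names are hypothetical, and any value passed in is an arbitrary illustration. A longer wait on its own would not be expected to help if whatever serves `InitProducerId` on the broker side stays unavailable.

```java
import java.util.Properties;

final class ProducerTimeoutOverride {

    // Prefix Kafka Streams uses to route a setting to its internal producers.
    static final String PRODUCER_PREFIX = "producer.";

    // Returns a copy of the base properties with the producer's max.block.ms set;
    // the base properties are left unchanged.
    static Properties withMaxBlockMs(Properties base, long maxBlockMs) {
        Properties copy = new Properties();
        copy.putAll(base);
        copy.put(PRODUCER_PREFIX + "max.block.ms", Long.toString(maxBlockMs));
        return copy;
    }
}
```

For example, `ProducerTimeoutOverride.withMaxBlockMs(properties, 120000L)` would only lengthen the wait before the same `TimeoutException` is raised; it is a way to confirm which timeout is being hit rather than a fix.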

Logs of broker 2 when broker 1 is down:

[2020-02-17 02:29:00,302] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Retrying leaderEpoch request for partition __consumer_offsets-36 as the leader reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)

The Kafka logs are filled with the above line.

Now one major observation: when I turn off broker 2 (i.e. broker 1 and broker 3 are running), my stream application runs fine. My app shuts down only when broker 1 is down. I'm guessing that some critical information that should be distributed across all brokers is stored only on broker 1.
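
The guess above can be reasoned about with a small model. Exactly-once processing relies on Kafka's internal `__transaction_state` topic (the `InitProducerId` request in the stack traces is served by the transaction coordinator that owns it), so one thing worth checking is where that topic's replicas live, for example with `kafka-topics.sh --describe`. The sketch below is only an illustration under assumptions: the replica assignments are made-up inputs, not values read from this cluster, and the class and method names are hypothetical. It reports which partitions would have no live replica when a given set of brokers is down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ReplicaAvailability {

    // Returns, in ascending order, the partitions whose replicas all sit on down brokers.
    static List<Integer> partitionsWithoutLiveReplica(
            Map<Integer, List<Integer>> replicasByPartition, Set<Integer> downBrokers) {
        List<Integer> unavailable = new ArrayList<>();
        // TreeMap gives a stable, sorted iteration order over partition ids.
        for (Map.Entry<Integer, List<Integer>> e : new TreeMap<>(replicasByPartition).entrySet()) {
            if (downBrokers.containsAll(e.getValue())) {
                unavailable.add(e.getKey());
            }
        }
        return unavailable;
    }
}
```

In this model, if every replica of an internal topic sat on broker 1, stopping broker 1 would leave none of its partitions available while stopping broker 2 would change nothing, which is the pattern described above. Whether that is actually the case here would need to be confirmed against the real topic description.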

Source: https://stackoverflow.com/questions/60203105/kafkastream-exactly-once-in-stream-application-results-in-failed-to-re-balance-i
