Question
I have a Kafka Streams application which reads data from a few topics, joins the data and writes it to another topic.
This is the configuration of my Kafka cluster:
5 Kafka brokers
Kafka topics: 15 partitions, replication factor 3
My Kafka Streams applications run on the same machines as my Kafka brokers.
A few million records are consumed and produced per hour. Whenever I take a broker down, the application goes into the rebalancing state, and after rebalancing many times it starts consuming very old messages.
Note: while the Kafka Streams application was running normally, its consumer lag was almost 0. After the rebalancing, its lag went from 0 to 10 million.
Could this be caused by offsets.retention.minutes?
These are the log and offset retention settings of my Kafka brokers:
log retention: 3 days
offsets.retention.minutes: 1 day
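To see why the gap between these two settings can matter, here is a small Java sketch that expresses both values in minutes and checks whether a committed offset of a given age would already have been expired by the broker while the records it points into are still retained. It assumes the expiry rule used by brokers before Kafka 2.1 (a committed offset is dropped once it is older than offsets.retention.minutes, even for an active group; KIP-211 later changed this). The class and method names are hypothetical.

```java
import java.time.Duration;

// Simplified model of committed-offset expiry, ASSUMING the pre-Kafka-2.1 rule:
// a committed offset is removed once it is older than offsets.retention.minutes,
// whether or not the consumer group is still active. Names are hypothetical.
final class RetentionWindows {
    // Values taken from the question's broker configuration.
    static final Duration LOG_RETENTION = Duration.ofDays(3);
    static final Duration OFFSETS_RETENTION = Duration.ofMinutes(1440); // 1 day

    // True if a committed offset last written `commitAge` ago has already expired.
    static boolean offsetExpired(Duration commitAge) {
        return commitAge.compareTo(OFFSETS_RETENTION) > 0;
    }

    // True if records written `recordAge` ago are still inside log retention.
    static boolean recordsRetained(Duration recordAge) {
        return recordAge.compareTo(LOG_RETENTION) <= 0;
    }

    public static void main(String[] args) {
        System.out.println("log retention minutes:     " + LOG_RETENTION.toMinutes());     // 4320
        System.out.println("offsets retention minutes: " + OFFSETS_RETENTION.toMinutes()); // 1440
        Duration age = Duration.ofHours(30);
        // A 30-hour-old commit is already gone, but 30-hour-old records are still on disk,
        // so a consumer that must look up its position again and falls back to the
        // earliest offset would replay them.
        System.out.println(offsetExpired(age) + " " + recordsRetained(age)); // true true
    }
}
```

In other words, any partition whose last commit is older than one day but younger than three days sits in a window where the data can be reprocessed because its offset has already been forgotten.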
I read in the link below that this could be the cause:
Offset Retention Minutes reference
Any help would be appreciated.
Answer 1:
Offset retention can have an impact. See this FAQ: https://docs.confluent.io/current/streams/faq.html#why-is-my-application-re-processing-data-from-the-beginning
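The mechanism behind this: when a consumer (re)joins after a rebalance and finds no valid committed offset for a partition, it falls back to auto.offset.reset, and Kafka Streams defaults that setting to "earliest". The Java sketch below models only that decision and the resulting lag; it is not Kafka client code, and the class name, method names and offsets are hypothetical.

```java
import java.util.Optional;

// Simplified model (NOT Kafka client code) of how a consumer picks its start
// position for one partition after a rebalance. Names and numbers are hypothetical.
// Out-of-range committed offsets, which also trigger a reset, are ignored here.
final class StartPosition {
    enum ResetPolicy { EARLIEST, LATEST }

    // Use the committed offset if it still exists; otherwise apply the reset policy.
    static long resolve(Optional<Long> committed, long logStartOffset,
                        long logEndOffset, ResetPolicy policy) {
        if (committed.isPresent()) {
            return committed.get();
        }
        return policy == ResetPolicy.EARLIEST ? logStartOffset : logEndOffset;
    }

    // Consumer lag = number of records between the start position and the end of the log.
    static long lag(long position, long logEndOffset) {
        return logEndOffset - position;
    }

    public static void main(String[] args) {
        long logStart = 0L;
        long logEnd = 1_000_000L;
        // Committed offset still present: the app resumes near the head of the log.
        long resumed = resolve(Optional.of(999_995L), logStart, logEnd, ResetPolicy.EARLIEST);
        // Committed offset expired: the "earliest" policy rewinds to the start of the log.
        long rewound = resolve(Optional.empty(), logStart, logEnd, ResetPolicy.EARLIEST);
        System.out.println(lag(resumed, logEnd) + " " + lag(rewound, logEnd)); // 5 1000000
    }
}
```

This is why the lag can jump from almost 0 to millions in one step: nothing is lost on the broker, the application simply restarts from the oldest retained record of every partition whose offset expired.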
Also see the question "How to commit manually with Kafka Stream?" for how commits work.
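On the commit side, Kafka Streams commits consumed offsets automatically every commit.interval.ms; there is no manual consumer commit to call. A minimal configuration sketch, built with plain java.util.Properties so it compiles without the Kafka jars, shows where the commit interval and the reset policy live. The application id, broker address and chosen values are hypothetical; in a real application these properties would be passed to the KafkaStreams constructor.

```java
import java.util.Properties;

// Illustrative Kafka Streams settings using plain java.util.Properties.
// In a real app: new KafkaStreams(topology, StreamsCommitConfig.build()).
// The application id, broker address and chosen values are hypothetical.
final class StreamsCommitConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "my-join-app");   // also used as the consumer group id
        props.put("bootstrap.servers", "broker1:9092");
        // How often Streams commits consumed offsets; 30000 ms is the default
        // when exactly-once processing is not enabled.
        props.put("commit.interval.ms", "30000");
        // Where to start when no valid committed offset exists; Streams defaults to "earliest".
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Setting auto.offset.reset to "latest" would avoid the replay after an offset expires, but at the cost of silently skipping records that were never processed, so raising offsets.retention.minutes on the brokers is usually the safer lever.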
Source: https://stackoverflow.com/questions/46189625/kafka-stream-reprocessing-old-messages-on-rebalancing