Kafka Stream reprocessing old messages on rebalancing

Submitted by 懵懂的女人 on 2020-01-02 23:14:16

Question


I have a Kafka Streams application which reads data from a few topics, joins the data and writes it to another topic.

This is the configuration of my Kafka cluster:

5 Kafka brokers
Kafka topics - 15 partitions and replication factor 3. 

My Kafka Streams applications are running on the same machines as my Kafka broker.

A few million records are consumed/produced per hour. Whenever I take a broker down, the application goes into a rebalancing state, and after rebalancing many times it starts consuming very old messages.

Note: When the Kafka Streams application was running fine, its consumer lag was almost 0. But after rebalancing, its lag went from 0 to 10 million.

Could this be because of offset.retention.minutes?

This is the log and offset retention policy configuration of my Kafka broker:

log retention policy : 3 days
offset.retention.minutes : 1 day
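
Expressed as broker properties, that retention policy would look roughly like the sketch below. Note that the actual broker key is offsets.retention.minutes (with an "s"), and 3 days of log retention corresponds to 72 hours; the exact values here are just a restatement of the policy above, not copied from the asker's config file:

```properties
# server.properties (sketch)
log.retention.hours=72           # log retention policy: 3 days
offsets.retention.minutes=1440   # committed consumer offsets kept for 1 day
```

With this policy, committed offsets expire two days before the log segments that they point into, which is the kind of gap the question is asking about.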

In the below link I read that this could be the cause:

Offset Retention Minutes reference (link)

Any help in this would be appreciated.


Answer 1:


Offset retention can have an impact. Cf. this FAQ: https://docs.confluent.io/current/streams/faq.html#why-is-my-application-re-processing-data-from-the-beginning

Also cf. "How to commit manually with Kafka Stream?" about how commits work.
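
The linked FAQ's usual remedy is to control where the application resumes when its committed offsets have expired. A minimal sketch of the relevant client-side Kafka Streams settings (the property keys auto.offset.reset and commit.interval.ms come from the Kafka documentation; the application id and broker address are placeholders):

```java
import java.util.Properties;

public class StreamsOffsetConfig {

    // Builds the client-side settings that control where a Kafka Streams
    // application resumes after its committed offsets have expired.
    public static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "my-join-app");     // placeholder app id
        props.put("bootstrap.servers", "broker1:9092"); // placeholder broker
        // Where to start when NO committed offset exists (e.g. after the
        // broker deleted the offsets per its offset retention): "latest"
        // skips old records instead of reprocessing from the beginning.
        props.put("auto.offset.reset", "latest");
        // How often Streams commits its position; 30000 ms is the default.
        props.put("commit.interval.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(StreamsOffsetConfig.build());
    }
}
```

Whether "latest" or "earliest" is the right reset policy depends on whether skipping records is acceptable for the join; the fix for the lag spike itself is raising the broker's offset retention above the log retention.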



Source: https://stackoverflow.com/questions/46189625/kafka-stream-reprocessing-old-messages-on-rebalancing
