Kafka having duplicate messages

Submitted by 旧城冷巷雨未停 on 2020-01-23 09:51:45

Question


I don't see any failures while producing or consuming the data; however, there are a bunch of duplicate messages in production. For a small topic that receives around 100k messages, there are ~4k duplicates, even though, as I said, there are no failures, and on top of that no retry logic is implemented and no retry config value is set.

I also checked the offset values for those duplicate messages, and each has a distinct value, which tells me that the issue is on the producer side.

Any help would be highly appreciated.


Answer 1:


Read more about message delivery in Kafka:

https://kafka.apache.org/08/design.html#semantics

So effectively Kafka guarantees at-least-once delivery by default and allows the user to implement at-most-once delivery by disabling retries on the producer and committing its offset prior to processing a batch of messages. Exactly-once delivery requires co-operation with the destination storage system, but Kafka provides the offset, which makes implementing this straightforward.
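
For context, here is a minimal sketch of what disabling producer retries looks like with the Java client; the broker address, topic name, and serializer choices are placeholder assumptions, not details from the question:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AtMostOnceProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Disable retries: a failed send is never re-attempted, so the broker
        // never receives the same message twice (at-most-once on the produce side,
        // at the cost of possibly losing messages).
        props.put(ProducerConfig.RETRIES_CONFIG, 0);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"));
        }
    }
}
```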

You are probably looking for "exactly-once delivery", as in JMS:

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka?

There are two approaches to getting exactly-once semantics during data production:

1. Use a single writer per partition, and every time you get a network error, check the last message in that partition to see if your last write succeeded.
2. Include a primary key (UUID or something) in the message and deduplicate on the consumer.

We implemented the second approach in our systems.
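
A minimal sketch of that consumer-side deduplication, assuming the producer puts a UUID in the record key; the broker address, group id, and topic are placeholders, and the in-memory seen-set would be a persistent store (e.g. a database keyed on the UUID) in a real system so it survives restarts:

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DedupingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "dedup-demo");              // placeholder
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        // UUIDs already processed; in-memory only for this sketch.
        Set<String> seen = new HashSet<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // The record key carries the producer-assigned UUID.
                    if (seen.add(record.key())) {
                        process(record.value()); // first time this UUID is seen
                    } // else: duplicate, skip it
                }
            }
        }
    }

    private static void process(String value) {
        System.out.println("Processing: " + value);
    }
}
```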



Source: https://stackoverflow.com/questions/34035870/kafka-having-duplicate-messages
