Question
I don't see any failures while producing or consuming the data, yet there are a bunch of duplicate messages in production. For a small topic that receives around 100k messages, there are ~4k duplicates, even though, as I said, there are no failures. On top of that, no retry logic is implemented and no retry config value is set.
I also checked the offsets of the duplicate messages, and each one has a distinct value, which tells me the duplicates are being created on the producer side.
Any help would be highly appreciated.
Answer 1:
Read more about message delivery in Kafka:
https://kafka.apache.org/08/design.html#semantics
So effectively Kafka guarantees at-least-once delivery by default and allows the user to implement at most once delivery by disabling retries on the producer and committing its offset prior to processing a batch of messages. Exactly-once delivery requires co-operation with the destination storage system but Kafka provides the offset which makes implementing this straight-forward.
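The at-most-once pattern described above (disable producer retries, commit offsets before processing) can be sketched as plain Python. The parameter names below follow kafka-python conventions and are assumptions for illustration, not the poster's actual configuration:

```python
# Hypothetical sketch of at-most-once settings (kafka-python-style names
# assumed). Disabling retries means a message may be lost on a transient
# error, but it will never be re-sent and duplicated.
producer_config = {
    "retries": 0,  # never re-send after a failure
    "acks": 1,     # wait for the leader's acknowledgement only
}

# Consumer side: turn off auto-commit and commit *before* processing, so
# a crash mid-batch skips messages instead of re-reading them.
consumer_config = {
    "enable_auto_commit": False,
}

def consume_at_most_once(consumer, process):
    """Commit first, then process: at-most-once semantics.

    `consumer` is assumed to expose poll() -> {partition: [records]}
    and commit(), in the style of kafka-python's KafkaConsumer.
    """
    records = consumer.poll()
    consumer.commit()  # offsets are saved before any record is processed
    for partition_records in records.values():
        for record in partition_records:
            process(record)
```

The trade-off is explicit: with retries disabled you swap duplicates for potential message loss, which is why Kafka defaults to at-least-once instead.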
You are probably looking for "exactly-once delivery", as in JMS:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka?
There are two approaches to getting exactly-once semantics during data production:
1. Use a single writer per partition, and every time you get a network error, check the last message in that partition to see if your last write succeeded.
2. Include a primary key (UUID or something) in the message and deduplicate on the consumer.
We implemented the second approach in our systems.
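That second approach can be sketched in a few lines. This is an illustrative example, not the answerer's actual code: the producer stamps each message with a UUID primary key, and the consumer keeps a set of keys it has already processed, so duplicates are dropped even though their offsets differ:

```python
import uuid

def make_message(payload):
    """Producer side: attach a unique key so duplicates are detectable.
    A retry re-sends the same message object, hence the same key."""
    return {"key": str(uuid.uuid4()), "payload": payload}

class Deduplicator:
    """Consumer side: process each key at most once."""
    def __init__(self):
        self._seen = set()

    def accept(self, message):
        key = message["key"]
        if key in self._seen:
            return False  # duplicate: this key was already processed
        self._seen.add(key)
        return True

# Usage: two copies of the same message (e.g. from a producer retry)
# share one key, so the second copy is filtered out.
dedup = Deduplicator()
msg = make_message("hello")
results = [dedup.accept(m) for m in (msg, msg)]  # -> [True, False]
```

In a real system the seen-key set must be bounded (e.g. a TTL cache) or persisted alongside the consumer's output, otherwise it grows without limit and is lost on restart.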
Source: https://stackoverflow.com/questions/34035870/kafka-having-duplicate-messages