Kafka compaction for de-duplication

一向 2021-01-15 05:13

I'm trying to understand how Kafka compaction works and have the following question: does Kafka guarantee uniqueness of keys for messages stored in a topic with compaction enabled?

2 answers
  •  借酒劲吻你
    2021-01-15 05:21

    Short answer is no.

    Kafka doesn't guarantee uniqueness for keys stored in a topic with compaction enabled.

    In Kafka you have two types of cleanup.policy:

    • delete - Messages older than the configured retention are removed. Several properties control this: log.retention.hours, log.retention.minutes, log.retention.ms. By default log.retention.hours is set to 168, meaning messages older than 7 days are deleted.
    • compact - At least one message is kept for each key. In some situations that is exactly one, but in most cases it is more. The compaction process runs periodically in the background: it copies log segments, removing duplicates and keeping only the last value for each key. The active segment is never cleaned, so recent duplicates can always be present.
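    The compact policy's end state can be illustrated with a small simulation. This is a toy model of the semantics (keep the last record per key, preserve log order), not Kafka's actual cleaner, and the function name `compact` is just illustrative:

    ```python
    # Toy model of log compaction semantics on an in-memory "log"
    # of (key, value) records. Not Kafka's real cleaner.

    def compact(log):
        """Keep only the last record for each key, preserving log order."""
        last_offset = {}  # key -> offset of its most recent record
        for offset, (key, _value) in enumerate(log):
            last_offset[key] = offset
        return [rec for offset, rec in enumerate(log)
                if last_offset[rec[0]] == offset]

    log = [("user1", "a"), ("user2", "b"), ("user1", "c"), ("user2", "d")]
    print(compact(log))  # [('user1', 'c'), ('user2', 'd')]
    ```

    Before compaction the log holds two records for each key; afterwards only the latest survives, which is exactly the "at least one message per key" guarantee described above.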

    If you want to read only the latest value for each key, you can use the KTable abstraction from Kafka Streams, which materializes a table of the latest value per key as records arrive.
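    The read-side behavior of a KTable can be sketched as a dict built from a stream of records. This is a plain-Python illustration of the semantics, not the Kafka Streams API; one real detail it does model is that a record with a null value acts as a tombstone that deletes the key:

    ```python
    # Sketch of KTable materialization semantics: latest value per key,
    # with None acting as a tombstone (delete marker), as in Kafka.

    def materialize(records):
        """Build a table view of a stream: latest value per key."""
        table = {}
        for key, value in records:
            if value is None:
                table.pop(key, None)  # tombstone: remove the key
            else:
                table[key] = value    # later values overwrite earlier ones
        return table

    stream = [("k1", 1), ("k2", 2), ("k1", 3), ("k2", None)]
    print(materialize(stream))  # {'k1': 3}
    ```

    This is why KTable, unlike reading the raw compacted topic, always gives you exactly one current value per live key: duplicates in the underlying log are collapsed as the state store is updated.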

    Related question regarding latest value for key and compaction: Kafka only subscribe to latest message?
