Kafka not deleting key with tombstone

一生所求  2020-12-30 08:28

I created a Kafka topic with the properties below:

min.cleanable.dirty.ratio=0.01,delete.retention.ms=100,segment.ms=100,cleanup.policy=compact

Le
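
For reference, a topic with these settings can also be created programmatically. Below is a minimal sketch, assuming a local broker on localhost:9092; the topic name my-compacted-topic and the AdminClient approach are illustrative, not taken from the question.

    import java.util.{Collections, Properties}
    import scala.jdk.CollectionConverters._
    import org.apache.kafka.clients.admin.{AdminClient, NewTopic}

    object CreateCompactedTopic {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")   // assumed local broker
        val admin = AdminClient.create(props)

        // Topic-level configs matching the settings above.
        val configs = Map(
          "cleanup.policy"            -> "compact",
          "min.cleanable.dirty.ratio" -> "0.01",
          "delete.retention.ms"       -> "100",
          "segment.ms"                -> "100"
        ).asJava

        val topic = new NewTopic("my-compacted-topic", 1, 1.toShort).configs(configs)
        admin.createTopics(Collections.singleton(topic)).all().get()
        admin.close()
      }
    }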

3 Answers
  •  孤城傲影
    2020-12-30 08:50

    A compacted topic's log has two portions:

    1) Cleaned portion: the portion of the Kafka log that has been cleaned by the Kafka cleaner at least once.

    2) Dirty portion: the portion of the Kafka log that has not been cleaned by the Kafka cleaner even once. Kafka maintains a dirty offset; all messages with offset >= dirty offset belong to the dirty portion.

    Note: the Kafka cleaner cleans all segments (irrespective of whether a segment is in the cleaned or dirty portion) and re-copies them every time the dirty ratio reaches min.cleanable.dirty.ratio.
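
    As a rough illustration of that trigger (a simplification, not Kafka's actual code): the log becomes eligible for cleaning once dirty bytes divided by total bytes reaches min.cleanable.dirty.ratio.

      // Simplified sketch of the cleanable check; the names here are illustrative.
      def isCleanable(cleanBytes: Long, dirtyBytes: Long, minCleanableDirtyRatio: Double): Boolean = {
        val totalBytes = cleanBytes + dirtyBytes
        totalBytes > 0 && dirtyBytes.toDouble / totalBytes >= minCleanableDirtyRatio
      }

      // With min.cleanable.dirty.ratio=0.01, even a small amount of new data makes the log cleanable:
      isCleanable(cleanBytes = 1000L, dirtyBytes = 20L, minCleanableDirtyRatio = 0.01)  // true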

    Tombstones are deleted segment-wise. The tombstones in a segment are deleted only if the segment satisfies both conditions below:

    1. The segment must be in the cleaned portion of the log.

    2. The last-modified time of the segment must be <= (last-modified time of the segment containing the message at offset = (dirty offset - 1)) - delete.retention.ms.

    The second point is hard to elaborate, but in simple terms it means the segment has to fill up to log.segment.bytes / segment.bytes (1 GB by default). For a segment in the cleaned portion to reach 1 GB, you need to produce a large number of messages with distinct keys. But you produced only 4 messages, 3 of which share the same key. That is why the tombstone is not deleted in the segment containing the 1111:null message (that segment does not satisfy the second point above).
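
    For context, a tombstone is just a record with a non-null key and a null value. A minimal producer sketch (the topic name and broker address are assumptions):

      import java.util.Properties
      import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
      import org.apache.kafka.common.serialization.StringSerializer

      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")               // assumed local broker
      props.put("key.serializer", classOf[StringSerializer].getName)
      props.put("value.serializer", classOf[StringSerializer].getName)

      val producer = new KafkaProducer[String, String](props)
      // A null value for key "1111" is the tombstone that compaction should eventually drop.
      producer.send(new ProducerRecord[String, String]("my-compacted-topic", "1111", null))
      producer.flush()
      producer.close()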

    With only 4 messages, you have two options to get the tombstone deleted (see the sketch after this list):

    1. set delete.retention.ms=0, or
    2. set log.segment.bytes / segment.bytes=50.
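
    Either setting can be changed on the existing topic, for example via the AdminClient (a sketch, assuming the same topic name and broker as above; kafka-configs.sh works equally well):

      import java.util.{Collections, Properties}
      import org.apache.kafka.clients.admin.{AdminClient, AlterConfigOp, ConfigEntry}
      import org.apache.kafka.common.config.ConfigResource

      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")   // assumed local broker
      val admin = AdminClient.create(props)

      val topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-compacted-topic")
      // Option 1: stop retaining tombstones after the segment is cleaned.
      val ops: java.util.Collection[AlterConfigOp] = Collections.singletonList(
        new AlterConfigOp(new ConfigEntry("delete.retention.ms", "0"), AlterConfigOp.OpType.SET))
      admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get()
      admin.close()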

    Source Code (Extra Reading): https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogCleaner.scala

    // Excerpt from LogCleaner.cleanSegments: tombstones ("deletes") are retained
    // while a segment's last-modified time is still newer than deleteHorizonMs,
    // which is derived from delete.retention.ms.
    try {
      // clean segments into the new destination segment
      for (old <- segments) {
        val retainDeletes = old.lastModified > deleteHorizonMs
        info("Cleaning segment %s in log %s (largest timestamp %s) into %s, %s deletes."
            .format(old.baseOffset, log.name, new Date(old.largestTimestamp), cleaned.baseOffset, if(retainDeletes) "retaining" else "discarding"))
        cleanInto(log.topicPartition, old, cleaned, map, retainDeletes, log.config.maxMessageSize, stats)
      }
      // ... (remainder of the method elided)
