If I set 'compression.type' at both the topic level and the producer level, which takes precedence?

青春惊慌失措 2020-12-20 18:52

I'm trying to understand the 'compression.type' configuration, and my question is: if I set 'compression.type' at both the topic level and the producer level, which takes precedence?
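
For reference, a minimal sketch of the two places this setting can live (host, port, and topic name here are placeholders, and the --zookeeper form matches the older Kafka releases used in the answers below):

    # Producer level: a client-side setting, e.g. in a properties file used by the producer
    compression.type=gzip

    # Topic level: a per-topic override stored by the broker
    ./kafka-configs.sh --zookeeper localhost:2181 --alter \
        --entity-type topics --entity-name my-topic --add-config compression.type=gzip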

2 Answers
  • 2020-12-20 19:19

    I tried out some experiments to answer this:

**Note:** the broker config (server.properties) has compression.type=producer
    

    ./kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --config compression.type=producer --topic t

    ./kafka-console-producer.sh --broker-list node:6667  --topic t
    ./kafka-console-producer.sh --broker-list node:6667  --topic t --compression-codec gzip
    ./kafka-console-producer.sh --broker-list node:6667  --topic t
    
    sh kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/t-0/00000000000000000000.log
    
    Dumping /kafka-logs/t-0/00000000000000000000.log
    Starting offset: 0
    offset: 0 position: 0 compresscodec: NONE
    offset: 1 position: 69 compresscodec: GZIP
    offset: 2 position: 158 compresscodec: NONE

    With the topic set to compression.type=producer, each batch keeps whatever codec the producer sent it with.
    

    ./kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --config compression.type=gzip --topic t1

    ./kafka-console-producer.sh --broker-list node:6667  --topic t1
    ./kafka-console-producer.sh --broker-list node:6667  --topic t1 --compression-codec gzip
    ./kafka-console-producer.sh --broker-list node:6667  --topic t1 --compression-codec snappy
    
    sh kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/t1-0/00000000000000000000.log
    Dumping /kafka-logs/t1-0/00000000000000000000.log
    Starting offset: 0
    offset: 0 position: 0 compresscodec: GZIP 
    offset: 1 position: 89 compresscodec: GZIP 
    offset: 2 position: 178 compresscodec: GZIP 
    

    Clearly, when the topic specifies a concrete codec, the topic-level setting takes precedence.

    Regarding compression and decompression, from Kafka: The Definitive Guide:

    The Kafka broker must decompress all message batches, however, in order to validate the checksum of the individual messages and assign offsets. It then needs to recompress the message batch in order to store it on disk.

    As of version 0.10, there is a new message format that allows for relative offsets in a message batch. This means that newer producers will set relative offsets prior to sending the message batch, which allows the broker to skip recompression of the message batch.

    So, when the topic's compression type names a codec different from the producer's, the topic-level compression is honoured; when the topic is set to producer (or the codecs already match), the broker retains the original compression codec set by the producer.
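
    To double-check which override is in effect on a topic, kafka-configs.sh can describe it (same placeholder host and topic name as above):

    ./kafka-configs.sh --zookeeper localhost:2181 --describe --entity-type topics --entity-name t1

    This should report compression.type=gzip for t1, while DumpLogSegments (as above) shows the codec actually written to disk.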

    Reference - https://kafka.apache.org/documentation/

  • 2020-12-20 19:39

    When the broker receives a compressed batch of messages from a producer:

    • it always decompresses the data in order to validate it
    • it considers the compression codec of the destination topic
      • If the compression codec of the destination topic is producer, or if the codecs of the batch and the destination topic are the same, the broker takes the compressed batch from the client and writes it directly to the topic’s log file without recompressing the data.
      • Otherwise, the broker needs to re-compress the data to match the codec of the destination topic.

    Decompression and re-compression can also happen if producers are running a version prior to 0.10 (because offsets then need to be overwritten), or if any other message-format conversion is required.
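
    As an illustrative sketch only (plain shell, not actual broker code; the variable names are invented), the decision above boils down to:

    topic_codec="gzip"      # the topic's compression.type; "producer" means keep the batch as-is
    batch_codec="snappy"    # codec of the incoming producer batch

    if [ "$topic_codec" = "producer" ] || [ "$topic_codec" = "$batch_codec" ]; then
        echo "write the batch to the log as received"
    else
        echo "decompress and recompress with $topic_codec"
    fi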
