kafka __consumer_offsets topic logs rapidly growing in size reducing disk space

Submitted by 情到浓时终转凉″ on 2021-02-08 02:17:11

Question


I find that the __consumer_offsets topic log size is growing rapidly, and after studying it further I found the topics with the highest volume. I changed the retention policy on these topics to stop the rate of growth, but I would like to reclaim disk space and delete all the old logs for the __consumer_offsets topic.

But I'm concerned this will cause all the other topics and consumers/producers to get corrupted or lose valuable metadata. Is there a way I can accomplish this safely? I'm looking at the topic configuration parameters, which include cleanup policy and compression, but I'm not sure how to specify these specifically for the topics that caused this rapid growth.

https://docs.confluent.io/current/installation/configuration/topic-configs.html

Appreciate any assistance here.


Answer 1:


In Kafka, there are two types of log retention: size-based and time-based. The former is controlled by log.retention.bytes while the latter by log.retention.hours.

In your case, you should pay attention to size retention, which can sometimes be quite tricky to configure. Assuming that you want a delete cleanup policy, you'd need to configure the following parameters:

log.cleaner.enable=true
log.cleanup.policy=delete

Then you need to think about the configuration of log.retention.bytes, log.segment.bytes and log.retention.check.interval.ms. To do so, you have to take into consideration the following factors:

  • log.retention.bytes is a minimum guarantee for a single partition of a topic, meaning that if you set log.retention.bytes to 512MB, you will always have at least 512MB of data (per partition) on your disk.

  • Again, if you set log.retention.bytes to 512MB and log.retention.check.interval.ms to 5 minutes (the default value), then at any given time you will have at least 512MB of data plus the data produced within the 5-minute window before the retention policy is triggered.

  • A topic log on disk is made up of segments. The segment size depends on the log.segment.bytes parameter. For log.retention.bytes=1GB and log.segment.bytes=512MB, you will always have up to 3 segments on disk (2 segments which have reached the retention limit, plus a 3rd, active segment that data is currently written to).

Finally, you should do the math and compute the maximum size that might be reserved by the Kafka logs on your disk at any given time, and tune the aforementioned parameters accordingly. Of course, I would also advise setting a time retention policy as well and configuring log.retention.hours accordingly. If after 2 days you don't need your data anymore, then set log.retention.hours=48.
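For example, here is a minimal back-of-the-envelope sketch of such a sizing calculation. All values (partition count, throughput, retention settings) are made-up assumptions for illustration, not values taken from the question:

# Hypothetical broker settings (server.properties), assuming a delete cleanup policy
log.cleanup.policy=delete
log.retention.bytes=1073741824          # 1GB size retention per partition
log.segment.bytes=536870912             # 512MB per segment
log.retention.hours=48                  # additionally drop data older than 2 days
log.retention.check.interval.ms=300000  # retention is checked every 5 minutes

# Rough worst case per partition: up to 3 segments (2 closed + 1 active)
# of 512MB each, plus whatever is produced during one 5-minute check
# interval. Assuming ~1MB/s per partition, that is roughly
#   3 * 512MB + 300s * 1MB/s ≈ 1.8GB per partition,
# and with, say, 50 partitions about 90GB of disk to plan for.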


Now in order to change the retention policy just for the __consumer_offsets topic, you can simply run:

bin/kafka-configs.sh \
    --zookeeper localhost:2181 \
    --alter \
    --entity-type topics \
    --entity-name __consumer_offsets \
    --add-config retention.bytes=...
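To double-check that the override took effect, you can describe the topic's dynamic configs (a hedged example; on recent Kafka versions kafka-configs.sh takes --bootstrap-server <broker:9092> instead of --zookeeper):

bin/kafka-configs.sh \
    --zookeeper localhost:2181 \
    --describe \
    --entity-type topics \
    --entity-name __consumer_offsets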

As a side note, you must be very careful with the retention policy for the __consumer_offsets as this might mess up all your consumers.




Answer 2:


The topic "__consumer_offsets" is an internal topic which is used to manage the offsets of each Consumer Group. Producers will not be directly impacted by any change/modification in this topic.

That said, and given your experience so far, you should be very careful about changing the configuration of this topic.

I suggest tweaking the topic configurations that apply to compacted topics. The cleanup policy should be kept at "compact".

Reduce max.compaction.lag.ms (cluster-wide setting: log.cleaner.max.compaction.lag.ms), which defaults to MAX_LONG, to something like 60000.

Reduce the ratio at which compaction is triggered, min.cleanable.dirty.ratio (cluster-wide setting: log.cleaner.min.cleanable.ratio), from its default of 0.5 to something like 0.1.

That way, compactions will be conducted more often without losing any essential information.
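As an illustration, both overrides can be applied to __consumer_offsets in one kafka-configs.sh call (a sketch using the example values from above; note that the max.compaction.lag.ms topic config is only available on Kafka 2.3 or newer):

bin/kafka-configs.sh \
    --zookeeper localhost:2181 \
    --alter \
    --entity-type topics \
    --entity-name __consumer_offsets \
    --add-config max.compaction.lag.ms=60000,min.cleanable.dirty.ratio=0.1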

Deleting old records in __consumer_offsets

The topic will pile up if you use many unique Consumer Groups (e.g. by using the console consumer, which by default creates a random Consumer Group each time it is executed).

To clean "old and un-needed" entries in the topic you need to be aware how to delete a message out of a compacted topic. This is done by producing a message to the topic with a null value. That way you will eventually delete the messages for the same key. You just have to figure out the keys of the messages you want to get rid of.



Source: https://stackoverflow.com/questions/61956217/kafka-consumer-offsets-topic-logs-rapidly-growing-in-size-reducing-disk-space
