Question
I find that the __consumer_offsets topic log size is growing rapidly and, after studying it further, I found the topics with the highest volume. I changed the retention policy on those topics to stop the rate of growth, but I would also like to reclaim disk space and delete all the old logs for the __consumer_offsets topic.
But I'm worried this will cause all the other topics and consumers/producers to get corrupted or to lose valuable metadata. Is there a way I can accomplish this? I'm looking at the topic configuration parameters, which include the cleanup policy and compression, but I'm not sure how to specify these specifically for the topics that caused this rapid growth.
https://docs.confluent.io/current/installation/configuration/topic-configs.html
Appreciate any assistance here.
Answer 1:
In Kafka, there are two types of log retention: size-based and time-based. The former is controlled by log.retention.bytes, while the latter is controlled by log.retention.hours.
In your case, you should pay attention to size retention, which can sometimes be quite tricky to configure. Assuming that you want a delete cleanup policy, you'd need to configure the following parameters:
log.cleaner.enable=true
log.cleanup.policy=delete
Then you need to think about the configuration of log.retention.bytes, log.segment.bytes and log.retention.check.interval.ms. To do so, you have to take into consideration the following factors:
- log.retention.bytes is a minimum guarantee for a single partition of a topic, meaning that if you set log.retention.bytes to 512MB, you will always have at least 512MB of data (per partition) on your disk.
- Again, if you set log.retention.bytes to 512MB and log.retention.check.interval.ms to 5 minutes (which is the default value), then at any given time you will have at least 512MB of data plus the size of the data produced within the 5-minute window, before the retention policy is triggered.
- A topic log on disk is made up of segments. The segment size depends on the log.segment.bytes parameter. For log.retention.bytes=1GB and log.segment.bytes=512MB, you will always have up to 3 segments on the disk: 2 segments which have reached the retention limit, and a 3rd one which is the active segment where data is currently being written (see the configuration sketch below).
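To make the interplay of these settings concrete, here is a broker-level sketch for server.properties using purely illustrative values (assumptions, not recommendations) that match the numbers discussed above:
# delete-based cleanup
log.cleaner.enable=true
log.cleanup.policy=delete
# per-partition size retention threshold (1GB)
log.retention.bytes=1073741824
# roll a new segment every 512MB
log.segment.bytes=536870912
# check for deletable segments every 5 minutes (the default)
log.retention.check.interval.ms=300000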
Finally, you should do the math and compute the maximum size that might be reserved by Kafka logs at any given time on your disk, and tune the aforementioned parameters accordingly. Of course, I would also advise setting a time retention policy as well and configuring log.retention.hours accordingly. If after 2 days you don't need your data anymore, then set log.retention.hours=48.
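As a rough, hypothetical worked example (the numbers are carried over from above, and 50 is the default partition count for __consumer_offsets):
per-partition worst case ≈ log.retention.bytes + log.segment.bytes = 1GB + 512MB ≈ 1.5GB (2 retained segments + 1 active segment)
topic-wide worst case ≈ 1.5GB × 50 partitions ≈ 75GB, plus whatever is produced during each 5-minute check interval
Numbers like these tell you how much disk headroom you need before the retention settings can be considered safe.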
Now, in order to change the retention policy just for the __consumer_offsets topic, you can simply run:
bin/kafka-configs.sh \
--zookeeper localhost:2181 \
--alter \
--entity-type topics \
--entity-name __consumer_offsets \
--add-config retention.bytes=...
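To verify the override took effect, the same tool can list the topic's current overrides (a sketch; on more recent broker versions --bootstrap-server is used instead of --zookeeper):
bin/kafka-configs.sh \
--zookeeper localhost:2181 \
--describe \
--entity-type topics \
--entity-name __consumer_offsets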
As a side note, you must be very careful with the retention policy for the __consumer_offsets topic, as this might mess up all your consumers.
Answer 2:
The topic "__consumer_offsets" is an internal topic which is used to manage the offsets of each Consumer Group. Producers will not be directly impacted by any change/modification in this topic.
That said, and as your own experience already shows, you should be very careful about changing the configuration of this topic.
I suggest tweaking the topic configurations for compacted topics. The cleanup policy should be kept at "compact".
Reduce max.compaction.lag.ms (cluster-wide setting: log.cleaner.max.compaction.lag.ms), which defaults to MAX_LONG, to something like 60000.
Reduce the ratio at which a compaction is triggered through min.cleanable.dirty.ratio (cluster-wide setting: log.cleaner.min.cleanable.ratio), which defaults to 0.5, to something like 0.1.
That way, compactions will be conducted more often without losing any essential information.
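As a sketch, these could be applied as topic-level overrides with the same tool used in the first answer (illustrative values from above; the topic-level max.compaction.lag.ms override assumes a broker on Kafka 2.3 or later, and newer clusters would use --bootstrap-server instead of --zookeeper):
bin/kafka-configs.sh \
--zookeeper localhost:2181 \
--alter \
--entity-type topics \
--entity-name __consumer_offsets \
--add-config max.compaction.lag.ms=60000,min.cleanable.dirty.ratio=0.1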
Deleting old records in __consumer_offsets
The topic will pile up if you use many unique Consumer Groups (e.g. by using the console consumer, which by default creates a random Consumer Group each time it is executed).
To clean "old and un-needed" entries in the topic you need to be aware how to delete a message out of a compacted topic. This is done by producing a message to the topic with a null
value. That way you will eventually delete the messages for the same key. You just have to figure out the keys of the messages you want to get rid of.
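Because the keys in __consumer_offsets use an internal binary format, hand-crafting such tombstones yourself is awkward. One sketch of achieving the same effect, assuming the Consumer Group in question has no active members, is to let the broker write the tombstones for you by deleting the group with the standard admin tool (the group name below is purely hypothetical):
bin/kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--delete \
--group console-consumer-12345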
Source: https://stackoverflow.com/questions/61956217/kafka-consumer-offsets-topic-logs-rapidly-growing-in-size-reducing-disk-space