How to reduce disk space occupied by a partition?

混江龙づ霸主 submitted on 2019-12-11 16:54:05

Question


In my specific use case, we are going to ingest 1000GB of data every day. If I compress the files locally, that comes to about 100GB.

I wrote a sample application to stream a 100MB file (which shrinks to 10MB after compression) using a single producer and a single topic with a single partition.

I have used transactions and enabled compression (gzip). I ran a command to find the total size of the partition, and it came to about 85MB. Presumably Kafka adds some data in order to guarantee exactly-once semantics. I create huge batches of messages and commit them in transactions. Each message is compressed.
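Kafka's producer compresses whole record batches rather than individual records, so the ratio it achieves depends on how much redundancy each batch contains. The Python sketch below uses only the standard library and made-up JSON records (not the asker's data) to compare gzipping each record on its own with gzipping batches of 500 records. It assumes the payload is small, similar text records; if the application already compresses every message before sending it, the producer's gzip has almost nothing left to remove, which could explain a partition size close to the uncompressed size.

```python
import gzip
import json
import random


def make_records(n, seed=0):
    """Generate n hypothetical JSON records, each with a unique id (illustrative data only)."""
    rng = random.Random(seed)
    return [
        json.dumps({
            "id": i,                              # unique key per record, as in the question
            "sensor": "s-%d" % rng.randint(0, 9),
            "status": "OK",
            "value": rng.randint(0, 1000),
        }).encode("utf-8")
        for i in range(n)
    ]


def per_message_size(records):
    """Total bytes when every record is gzip-compressed on its own."""
    return sum(len(gzip.compress(r)) for r in records)


def batched_size(records, batch_len):
    """Total bytes when records are grouped into batches and each batch is gzip-compressed."""
    total = 0
    for start in range(0, len(records), batch_len):
        batch = b"\n".join(records[start:start + batch_len])
        total += len(gzip.compress(batch))
    return total


records = make_records(1000)
raw = sum(len(r) for r in records)
print("raw bytes:", raw)
print("gzip per record:", per_message_size(records))
print("gzip per 500-record batch:", batched_size(records, 500))
```

On data like this the per-record total is typically close to or above the raw size (gzip adds roughly 18 bytes of header and trailer to every record), while the batched total is several times smaller; the exact numbers depend on the data.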

I also looked at what Kafka has stored internally:

  • 0000.index
  • 0000.log (this consumed the most disk space)
  • 0000.timeindex
  • 0000.snapshot
  • leader-epoch-checkpoint
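To see which of these files dominates, one option is to total the file sizes in the partition directory grouped by file type. This is a minimal standard-library Python sketch; Kafka usually stores a partition under `<log.dirs>/<topic>-<partition>`, but the function name and any path you pass to it are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def partition_usage(partition_dir):
    """Return {file suffix or name: total bytes} for the files in one partition directory."""
    usage = defaultdict(int)
    for path in Path(partition_dir).iterdir():
        if path.is_file():
            # Segment files are grouped by extension (.log, .index, .timeindex, .snapshot);
            # files without an extension (e.g. leader-epoch-checkpoint) are grouped by name.
            key = path.suffix or path.name
            usage[key] += path.stat().st_size
    return dict(usage)
```

For example, `partition_usage("/var/lib/kafka/data/my-topic-0")` (hypothetical path) returns a dictionary mapping `.log`, `.index`, `.timeindex`, `.snapshot` and `leader-epoch-checkpoint` to their byte totals. Note that `st_size` is the apparent file size; the index files of the active segment may be preallocated, so their reported size can overstate the space they really use.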

I have 2 questions:

  1. Why does the Kafka topic use so much disk space even after compression?

  2. What can I do to reduce the disk space of my partition? FYI, log compaction will not be effective in my case, as every message is going to have a unique key.
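For the second question, a sketch of settings that commonly affect on-disk size: larger producer batches (`batch.size`, `linger.ms`) give the codec more data to work with per batch, and time-based retention bounds how much a partition keeps. The keys are standard Kafka producer and topic configuration names, but every value, the broker address and the `transactional.id` are illustrative assumptions, and `retained_bytes` is a hypothetical helper for back-of-the-envelope sizing, not part of any Kafka API.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# Illustrative producer settings (standard Kafka key names, assumed values).
PRODUCER_CONFIG = {
    "bootstrap.servers": "localhost:9092",  # hypothetical broker address
    "compression.type": "gzip",             # applied per record batch; "zstd" is accepted on Kafka 2.1+
    "batch.size": 1_048_576,                # max bytes per batch per partition (Java default: 16384)
    "linger.ms": 100,                       # wait up to 100 ms so batches can fill up
    "enable.idempotence": True,
    "transactional.id": "ingest-1",         # hypothetical id; the question uses transactions
}

# Illustrative topic-level settings (standard Kafka key names, assumed values).
TOPIC_CONFIG = {
    "cleanup.policy": "delete",             # compaction does not help when every key is unique
    "compression.type": "producer",         # keep the producer's codec instead of recompressing
    "retention.ms": 3 * MS_PER_DAY,         # hypothetical 3-day retention
}


def retained_bytes(daily_bytes, retention_ms):
    """Rough steady-state partition size under time-based retention (ignores segment slack)."""
    return daily_bytes * retention_ms / MS_PER_DAY
```

If Kafka achieved the same 10:1 ratio as local compression (about 100GB per day), `retained_bytes(100 * 10**9, TOPIC_CONFIG["retention.ms"])` gives 3 × 10^11 bytes (300GB) for a 3-day retention. Real usage runs somewhat higher, because retention deletes whole segments only.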

Source: https://stackoverflow.com/questions/54674867/how-to-reduce-disk-space-occupied-by-a-partition
