Total MongoDB storage size

前端 未结 1 364
独厮守ぢ
独厮守ぢ 2021-01-13 00:30

I have a sharded and replicated MongoDB with dozens millions of records. I know that Mongo writes data with some padding factor, to allow fast updates, and I also know that

1条回答
  •  有刺的猬
    2021-01-13 01:18

    The short answer is: you can't, not based solely on avg. document size (at least not in any accurate way).

    To explain more verbosely:

    The space needed on disk is not simply a function of the average document size. There is also the space needed for any indexes you create. Then there is the space needed if you do trigger those moves (despite padding, this does happen) - that space is placed on a list to be re-used but depending on the data you subsequently insert, it may or may not be possible to re-use that space.

    You can also add into the fact that pre-allocation will mean that occasionally a handful of documents will increase your on-disk space utilization by ~2GB as a new data file is allocated. Of course, with sufficient data, this will be essentially a rounding error but it is worth bearing in mind.

    The only way to estimate this type of data to size ratio, assuming a consistent usage pattern, is to trend it over time for your particular use case and track the disk space usage versus the data inserted (number of documents might be better than data volume depending on variability of doc size).

    Similarly, if you track the insertion rate, doc size and the space gained back from a resync/repair. FYI - you can resync a secondary from scratch to get a "fresh" copy of the data files rather than running a repair, which can be less disruptive, and use less space depending on your set up.

    0 讨论(0)
提交回复
热议问题