MongoDB: does document size affect query performance?

不思量自难忘° 2021-01-31 16:31

Assume a mobile game that is backed by a MongoDB database containing a User collection with several million documents.

Now assume several dozen properties t

4 Answers
  •  既然无缘
    2021-01-31 17:12

    First of all, you should spend a little time reading up on how MongoDB stores documents, in particular padding factors and powerOf2Sizes allocation:

    http://docs.mongodb.org/manual/core/storage/
    http://docs.mongodb.org/manual/reference/command/collStats/#collStats.paddingFactor

    Put simply, MongoDB tries to allocate some additional space when it stores your original document, to allow for growth. powerOf2Sizes allocation became the default approach in version 2.6; with it, the space allocated for each document is rounded up to a power of 2 in bytes.

    Overall, performance will be much better if all updates fit within the original size allocation. The reason is that if they don't, the entire document needs to be moved someplace else with enough space, causing more reads and writes and in effect fragmenting your storage.
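To get a feel for how often a growing document would have to move, here is a minimal sketch of power-of-2 allocation. It is a simplified model and not MongoDB's actual allocator: the function names, the `minimum` parameter and the byte sizes are assumptions made for illustration, and record headers and padding factors are ignored.

```python
from __future__ import annotations


def allocated_size(doc_bytes: int, minimum: int = 32) -> int:
    """Round a document size up to the next power of 2 (simplified model).

    `minimum` is a hypothetical smallest slot size chosen for this sketch.
    """
    size = minimum
    while size < doc_bytes:
        size *= 2
    return size


def count_moves(sizes: list[int]) -> int:
    """Count how often a growing document outgrows its current slot.

    `sizes` is the document size in bytes after each successive update.
    Every time the size exceeds the slot, the document is "moved" into a
    freshly allocated, larger slot.
    """
    slot = allocated_size(sizes[0])
    moves = 0
    for size in sizes[1:]:
        if size > slot:
            slot = allocated_size(size)
            moves += 1
    return moves


# A document that starts at 1,000 bytes and grows 20X in 1,000-byte steps.
growth = list(range(1000, 20001, 1000))
print(allocated_size(growth[0]))  # 1024
print(count_moves(growth))        # 5
```

In this model the 20X growth costs five moves, while any update that stays inside the current slot costs none, which is the point made above.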

    If your documents are really going to grow by a factor of 10X to 20X over time, that could mean multiple moves per document, which could cause issues depending on your insert, update and read frequency. If that is the case, there are a couple of approaches you can consider:

    1) Allocate enough space on initial insertion to cover most (let's say 90%) of a normal document's lifetime growth. While this will be inefficient in space usage at the beginning, efficiency will increase with time as the documents grow, without any performance reduction. In effect, you pay ahead of time for storage that you will eventually use, in exchange for good performance over time.

    2) Create "overflow" documents. Let's say a typical 80-20 rule applies and 80% of your documents will fit in a certain size. Allocate for that amount and add an overflow collection that a document can point to if it has more than, for example, 100 friends or 100 Game documents. The overflow field points to a document in this new collection, and your app only looks in the new collection if the overflow field exists. This allows normal document processing for 80% of the users and avoids wasting a lot of storage on the 80% of user documents that won't need it, at the expense of additional application complexity.
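The two approaches above can be sketched in plain Python, with dictionaries standing in for documents and collections. Everything named here is hypothetical: the `_padding` and `overflow` field names, the `INLINE_LIMIT` of 100 (borrowed from the "100 friends" example) and the helper functions are assumptions for illustration, not part of any MongoDB API.

```python
from __future__ import annotations

from typing import Any, Optional

INLINE_LIMIT = 100  # hypothetical cut-off, from the "100 friends" example


def with_padding(user: dict[str, Any], pad_bytes: int) -> dict[str, Any]:
    """Approach 1: return a copy of the document with a throwaway filler field.

    The application would insert this copy and then remove `_padding` again
    (for example with an update that unsets it), leaving spare room behind.
    """
    padded = dict(user)
    padded["_padding"] = "x" * pad_bytes
    return padded


def split_overflow(
    user: dict[str, Any],
) -> tuple[dict[str, Any], Optional[dict[str, Any]]]:
    """Approach 2: keep the first INLINE_LIMIT friends inline, move the rest."""
    friends = user.get("friends", [])
    if len(friends) <= INLINE_LIMIT:
        return dict(user), None  # common case: no overflow document at all
    overflow_id = f"{user['_id']}:overflow"
    main = dict(user, friends=friends[:INLINE_LIMIT], overflow=overflow_id)
    overflow = {"_id": overflow_id, "friends": friends[INLINE_LIMIT:]}
    return main, overflow


def all_friends(
    main: dict[str, Any], overflow_collection: dict[str, dict[str, Any]]
) -> list[Any]:
    """Read path: only touch the overflow collection if the pointer exists."""
    friends = list(main.get("friends", []))
    if "overflow" in main:
        friends += overflow_collection[main["overflow"]]["friends"]
    return friends
```

A user with 250 friends would end up as a main document holding 100 of them plus an `overflow` pointer, and a second document holding the remaining 150. A user with 10 friends is stored unchanged and never causes a second lookup.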

    In either case, I'd consider using covered queries by building the appropriate indexes:

    A covered query is a query in which:

    all the fields in the query are part of an index, and
    all the fields returned in the results are in the same index.
    

    Because the index “covers” the query, MongoDB can both match the query conditions and return the results using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query.

    Querying only the index can be much faster than querying documents outside of the index. Index keys are typically smaller than the documents they catalog, and indexes are typically available in RAM or located sequentially on disk.

    More on that approach here: http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/
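As a rough way to check the two conditions in the covered-query definition above, here is a small sketch. It is a simplified model: `is_covered` is a hypothetical helper, the query is assumed to use only top-level field names as keys (no `$or` and similar operators), and other restrictions such as array fields are ignored. The one MongoDB behaviour it does encode is that `_id` is returned unless the projection explicitly excludes it.

```python
from __future__ import annotations

from typing import Any


def is_covered(
    query: dict[str, Any], projection: dict[str, int], index_keys: list[str]
) -> bool:
    """Return True if an index on `index_keys` could cover the query.

    Simplified check: every queried field and every returned field must be
    a key of the same index.
    """
    indexed = set(index_keys)
    included = {field for field, on in projection.items() if on}
    if not included:
        # No inclusion list means whole documents come back: never covered.
        return False
    returned = set(included)
    if projection.get("_id", 1):
        returned.add("_id")  # _id is returned unless explicitly excluded
    return set(query) <= indexed and returned <= indexed


index = ["level", "score"]  # hypothetical compound index on the User collection

print(is_covered({"level": 10}, {"score": 1, "_id": 0}, index))  # True
print(is_covered({"level": 10}, {"score": 1}, index))            # False
print(is_covered({"level": 10}, {"name": 1, "_id": 0}, index))   # False
```

The second call fails only because `_id` is still being returned and is not in the index, which is an easy thing to miss when trying to get a query covered.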
