How can I decrease unwind stages in aggregation pipeline for nested documents?

滥情空心 2021-01-28 22:31

I am new to MongoDB and trying to work with nested documents. I have a query as below:

    db.EndpointData.aggregate([
        { "$group" : { "_id" : "$EndpointId",
        ...
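
For context, since the snippet above is cut off: pipelines that prompt this question typically chain several `$unwind` stages over nested arrays before a final `$group`. A purely hypothetical sketch of that shape (the `Tags` and `Tags.SensorReadings` fields are placeholders, not from the original post):

    // Each $unwind multiplies the number of documents flowing through
    // the pipeline, so several in a row can inflate memory use sharply
    // before the final $group.
    db.EndpointData.aggregate([
        { "$unwind": "$Tags" },                  // placeholder nested array
        { "$unwind": "$Tags.SensorReadings" },   // placeholder nested array
        { "$group": { "_id": "$EndpointId", "total": { "$sum": 1 } } }
    ])
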
2 Answers
    粉色の甜心 2021-01-28 22:49

    If you're dealing with data on the order of 10,000,000 documents, you're going to hit the aggregation pipeline's memory limit easily. Specifically, per the MongoDB documentation, each pipeline stage is limited to 100 MB of RAM. At 10,000,000 documents, a mere 10 bytes per document is already enough to reach that limit (10,000,000 × 10 bytes = 100 MB), and your documents will certainly be far larger than that.

    There are a few options available to you to resolve this problem:

    1) You can use the allowDiskUse option, as noted in the documentation, to let memory-hungry stages spill to temporary files on disk (see the sketch after this list).

    2) You can add $project stages between your $unwind stages to strip unneeded fields and limit document size (very unlikely to be enough on its own; see the sketch after this list).

    3) You can periodically generate summary documents over subsets of your data and then run your aggregations against those summaries (a sketch follows further down). If, for example, you summarize subsets of 1,000 documents each, you reduce the number of documents in your pipeline from 10,000,000 to just 10,000.

    4) You can look into sharding your collection and running these aggregate operations on a cluster to reduce the load on any single server.
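
    A minimal sketch of options 1 and 2 together, assuming a hypothetical nested array field Tags (the original query is truncated, so the field and output names here are placeholders): allowDiskUse lets stages that exceed the 100 MB limit spill to temporary files, while the $project between $unwind and $group keeps each intermediate document as small as possible.

        db.EndpointData.aggregate(
            [
                { "$unwind": "$Tags" },  // placeholder for your nested array
                // Keep only the fields the $group below actually needs,
                // so each intermediate document stays small.
                { "$project": { "EndpointId": 1, "Tags.Value": 1 } },
                { "$group": { "_id": "$EndpointId", "count": { "$sum": 1 } } }
            ],
            { "allowDiskUse": true }  // option 1: spill to disk past 100 MB
        )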

    Options 1 and 2 are both very short-term solutions. They're easy to implement, but won't help much in the long run. Options 3 and 4, however, are far more involved and trickier to implement, but will provide the greatest amount of scalability and are more likely to continue meeting your needs long-term.
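
    As one possible shape for option 3, here is a hedged sketch that pre-aggregates per endpoint into a summary collection. The Timestamp field and the EndpointDataSummary collection name are assumptions, and the $merge stage requires MongoDB 4.2+ (older versions can write out with $out instead):

        // Run this periodically (e.g. from a scheduled job) to rebuild
        // compact per-endpoint summaries of the raw data.
        db.EndpointData.aggregate(
            [
                // Assumed field: restrict to the window being summarized.
                { "$match": { "Timestamp": { "$gte": ISODate("2021-01-01T00:00:00Z") } } },
                { "$group": { "_id": "$EndpointId", "docCount": { "$sum": 1 } } },
                // Upsert the results into the summary collection.
                { "$merge": { "into": "EndpointDataSummary", "whenMatched": "replace" } }
            ],
            { "allowDiskUse": true }
        )

    Subsequent aggregations can then read from EndpointDataSummary, which holds orders of magnitude fewer documents than the raw collection.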

    Do be warned, however, that if you plan to pursue option 4, you need to be well prepared: a sharded collection cannot be unsharded, and getting it wrong can cause potentially irreparable data loss. Having a dedicated DBA with MongoDB cluster experience is recommended.
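
    For completeness, the core commands for option 4 are short even though the operational work is not. A hedged sketch, with a placeholder database name and an assumed hashed shard key on EndpointId (shard-key choice is effectively permanent, so analyze it carefully for your workload first):

        // "mydb" is a placeholder database name.
        sh.enableSharding("mydb")
        // A hashed key on EndpointId is one plausible choice for spreading
        // writes evenly, but this is an assumption, not a recommendation.
        sh.shardCollection("mydb.EndpointData", { "EndpointId": "hashed" })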
