Terribly degraded performance with other join conditions in $lookup (using pipeline)

庸人自扰 2021-01-07 08:28

During a code review, I decided to improve an existing query's performance by rewriting one aggregation that looked like this:

         


        
1 Answer
  • 2021-01-07 09:02

    The second version adds an aggregation pipeline execution for each document in the joined collection.

    The documentation says:

    Specifies the pipeline to run on the joined collection. The pipeline determines the resulting documents from the joined collection. To return all documents, specify an empty pipeline [].

    The pipeline is executed for each document in the collection, not for each matched document.

    Depending on how large the collection is (both the number of documents and their size), this can add up to a significant amount of time.
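
    To make the two forms concrete, here is a minimal sketch of both, written as PyMongo-style pipeline definitions. The collection and field names (`orders`, `customers`, `customerId`, `status`) are hypothetical, since the original query is not shown; only the shape of the two `$lookup` stages matters.

```python
# Hypothetical names throughout: "customers", "customerId", "status".
# None of them come from the original question.

# Form 1: simple equality join using localField / foreignField.
simple_lookup = [
    {
        "$lookup": {
            "from": "customers",
            "localField": "customerId",
            "foreignField": "_id",
            "as": "customer",
        }
    },
]

# Form 2: join expressed with let + pipeline, which allows extra
# join conditions (here an assumed "status" check) beyond equality.
pipeline_lookup = [
    {
        "$lookup": {
            "from": "customers",
            "let": {"cid": "$customerId"},
            "pipeline": [
                {
                    "$match": {
                        "$expr": {
                            "$and": [
                                {"$eq": ["$_id", "$$cid"]},
                                {"$eq": ["$status", "active"]},
                            ]
                        }
                    }
                },
            ],
            "as": "customer",
        }
    },
]
```

    Either list could be passed to `aggregate()` on the outer collection. The second form is the one that involves the sub-pipeline discussed above.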

    after removing the limit, the pipeline version jumped to over 10 seconds

    That makes sense: all of the additional documents returned once the limit is removed must also have the aggregation pipeline executed for them.

    It is possible that per-document execution of the aggregation pipeline isn't as optimized as it could be. For example, if the pipeline is set up and torn down for each document, that overhead could easily exceed the cost of evaluating the $match conditions themselves.

    Is there any case when using one or the other?

    Executing an aggregation pipeline per joined document provides additional flexibility. If you need this flexibility, it may make sense to execute the pipeline, though performance needs to be considered regardless. If you don't, it is sensible to use a more performant approach.
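
    One way to check the performance of a given form is to ask the server for execution statistics. The sketch below only builds the `explain` command document; the helper name `explain_command` and the collection and field names are hypothetical, and it assumes you would send the result to the server yourself (for example with PyMongo's `Database.command`).

```python
def explain_command(collection, pipeline):
    """Build an `explain` command document for an aggregation.

    The "explain" key must come first, since the first key of a
    command document names the command. Python dicts keep
    insertion order, so this holds.
    """
    return {
        "explain": {
            "aggregate": collection,
            "pipeline": pipeline,
            "cursor": {},
        },
        "verbosity": "executionStats",
    }


# Hypothetical pipeline; substitute either $lookup form you want to compare.
example_pipeline = [
    {
        "$lookup": {
            "from": "customers",
            "localField": "customerId",
            "foreignField": "_id",
            "as": "customer",
        }
    },
]

cmd = explain_command("orders", example_pipeline)
```

    Running the command for both forms and comparing the reported execution times and documents examined should show where the extra time goes.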
