MongoDB aggregate pipeline slow after first match step

暗喜 2020-12-20 02:33

I have a MongoDB aggregate pipeline that contains a number of steps (match on indexed fields, add fields, sort, collapse, sort again, page, project results). If I comment out …

1 Answer
  • 2020-12-20 02:50

    2019 ANSWER

    This answer is for MongoDB 4.2

    After reading the question and the discussion between you guys, I believe the original issue is resolved, but optimization is a common problem for everyone using MongoDB.

    I faced the same problem, and here are some tips for query optimization.

    Correct me if I'm wrong :)

    1. Add index on collection

    Indexes play a vital role in running queries quickly: an index is a data structure that stores part of the collection’s data set in a form that is easy to traverse, so MongoDB can execute queries efficiently without scanning every document.

    You can create different types of indexes according to your needs. Learn more about indexes in the official MongoDB documentation.
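
    For example, a minimal sketch in the mongo shell (the collection and field names below are placeholders, not from the original question):

      // Single-field index for equality filters on "status".
      db.orders.createIndex({ status: 1 })

      // Compound index supporting a filter on "uid" plus a sort on "created".
      db.orders.createIndex({ uid: 1, created: -1 })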

    2. Pipeline optimization

    • Always use $match before $project, as filtering early removes extra documents and fields before the later stages run (see the sketch after this list).
    • Always remember that indexes are used by $match and $sort, so try to add an index on the fields you are going to filter or sort on.
    • Try to keep the sequence $sort + $limit + $skip in your query. When $limit directly follows $sort, MongoDB can coalesce the two stages, keep only the top n documents in memory, and select an index-backed query plan while executing the query.
    • Always use $limit before $skip so that the skip is applied to the already-limited documents.
    • Use $project to return only the necessary fields to the next stage.
    • Always create an index on the foreignField attribute of a $lookup. Also, since $lookup produces an array, we generally unwind it in the next stage. Place that $unwind immediately after the $lookup so MongoDB can coalesce the two stages internally (this coalescence is what the unwinding field in explain output refers to), like:

      {
          $lookup: {
              from: "Collection",
              as: "resultingArrays",
              localField: "x",
              foreignField: "y"
          }
      },
      {
          $unwind: {
              path: "$resultingArrays",
              preserveNullAndEmptyArrays: false
          }
      }

    • Use allowDiskUse in aggregation. With it, aggregation stages that exceed the memory limit can write temporary data to the _tmp subdirectory of the database path (dbPath) directory, which lets large queries complete instead of failing. For example:

      db.orders.aggregate(
          [
              { $match: { status: "A" } },                      // filter first, can use an index
              { $group: { _id: "$uid", total: { $sum: 1 } } },  // count documents per user
              { $sort: { total: -1 } }                          // may spill to disk if large
          ],
          { allowDiskUse: true }
      )
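
    To illustrate the $match-before-$project advice above, here is a minimal sketch (the field names are hypothetical): filtering first lets an index narrow the documents, and $project then trims each document down to the fields the next stage actually needs.

      db.orders.aggregate([
          { $match: { status: "A" } },                  // index-backed filter runs first
          { $sort: { created: -1 } },                   // an index on "created" avoids an in-memory sort
          { $project: { _id: 0, uid: 1, created: 1 } }  // keep only the needed fields
      ])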

    3. Rebuild the indexes

    If you are creating and deleting indexes quite often, rebuild your indexes. It helps MongoDB refresh the previously stored query plans in the plan cache; a stale cached plan can keep winning over the plan your query actually needs. Believe me, that issue sucks :(
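
    A minimal sketch in the mongo shell (the collection name is a placeholder; note that reIndex() is a blocking operation, so run it during a maintenance window):

      // Rebuild every index on the collection.
      db.orders.reIndex()

      // Or just drop the cached query plans without rebuilding the indexes.
      db.orders.getPlanCache().clear()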

    4. Remove unwanted indexes

    Too many indexes make Create, Update and Delete operations slow, because every write must also maintain each index. So removing the unwanted ones helps a lot.
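
    To find candidates for removal, you can list the indexes and check how often each one is used (again, the collection and index names are placeholders):

      // List all indexes on the collection.
      db.orders.getIndexes()

      // Show per-index usage counters; an index whose accesses.ops stays at 0 is a removal candidate.
      db.orders.aggregate([ { $indexStats: {} } ])

      // Drop an unused index by name.
      db.orders.dropIndex("status_1")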

    5. Limiting Documents

    In a real-world scenario, fetching all of the data present in the database does not help. You often cannot display it, and the user cannot read it all anyway. So, instead of fetching everything, fetch data in chunks (pages), which helps both your server and the client consuming that data.
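
    A pagination sketch following the $sort + $limit + $skip order recommended above (page, pageSize, and the field names are hypothetical):

      var page = 3, pageSize = 20
      db.orders.aggregate([
          { $match: { status: "A" } },
          { $sort: { created: -1 } },
          { $limit: page * pageSize },      // top-k: keep only the first 3 pages of documents
          { $skip: (page - 1) * pageSize }  // then drop the first 2 pages
      ])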

    And lastly, checking which execution plan is selected by MongoDB helps in figuring out the main issue, and the explain() helper will show you that.
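
    For example (same placeholder pipeline as above):

      db.orders.explain("executionStats").aggregate([
          { $match: { status: "A" } },
          { $sort: { created: -1 } }
      ])

      // In the output, check winningPlan: an IXSCAN stage means an index was used,
      // while COLLSCAN means the whole collection was scanned.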

    Hope this summary will help you guys; feel free to suggest new points if I missed any, and I will add them too.
