Poor lookup aggregation performance

臣服心动 2020-12-03 00:05

I have two collections

Posts:

{
    "_Id": "1",
    "_PostTypeId": "1",
    "_AcceptedAnswerId": "192",
    "_CreationDate": "2012-02-


        
4 Answers
  • 2020-12-03 00:15

    from https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/

    foreignField Specifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes.

    This will be performed the same as any other query.

    If you don't have an index on the _AccountId field, it will do a full collection scan of users for each one of the 10,000 posts. The bulk of the time will be spent in those scans.

    db.users.createIndex({ _AccountId: 1 })

    This speeds up the process: it does 10,000 index lookups instead of 10,000 collection scans.
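
    To illustrate why the index matters, here is a rough in-memory analogy in plain JavaScript. It is not how MongoDB is implemented: an array stands in for the users collection, a Map stands in for the index, and the collection sizes are hypothetical.

```javascript
// In-memory analogy only: an array stands in for a collection, a Map for an index.
// The sizes below are hypothetical and kept small so the example runs quickly.
const users = Array.from({ length: 1000 }, (_, i) => ({ _AccountId: String(i) }));
const posts = Array.from({ length: 2000 }, (_, i) => ({ _OwnerUserId: String(i % 1000) }));

// Without an index: every post walks the whole users array to collect its matches.
function joinByScan(posts, users) {
  let comparisons = 0;
  let matched = 0;
  for (const post of posts) {
    for (const user of users) {
      comparisons++;
      if (user._AccountId === post._OwnerUserId) matched++;
    }
  }
  return { comparisons, matched };
}

// With an "index": build the Map once, then each post does one keyed lookup.
function joinByIndex(posts, users) {
  const index = new Map(users.map((u) => [u._AccountId, u]));
  let lookups = 0;
  let matched = 0;
  for (const post of posts) {
    lookups++;
    if (index.has(post._OwnerUserId)) matched++;
  }
  return { lookups, matched };
}

const scan = joinByScan(posts, users);
const indexed = joinByIndex(posts, users);
console.log(scan, indexed);
```

    Both functions find the same matches; the scan does posts × users comparisons, while the keyed version does one lookup per post.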

  • 2020-12-03 00:16

    Since you are going to group by user anyway, do the $group by _OwnerUserId first and run the $lookup only after filtering the groups by postsCount (the $match below keeps counts from 5 to 15 inclusive; adjust the bounds to your requirement). This reduces the number of lookups:

    db.posts.aggregate([{
        $group: {
          _id: "$_OwnerUserId",
          postsCount: {
            $sum: 1
          },
          posts: {
            $push: "$$ROOT"
          } //if you need to keep original posts data
        }
      },
      {
        $match: {
          postsCount: {
            $gte: 5,
            $lte: 15
          }
        }
      },
      {
        $lookup: {
          from: "users",
          localField: "_id",
          foreignField: "_AccountId",
          as: "X"
        }
      },
      {
        $unwind: "$X"
      },
      {
        $sort: {
          postsCount: -1
        }
      },
      {
        $project: {
          postsCount: 1,
          X: 1
        }
      }
    ])
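
    As a rough illustration of why this ordering reduces lookups, the sketch below mimics the $group and $match stages in plain JavaScript and counts how many documents would reach $lookup. The sample posts and owner ids are made up for the example.

```javascript
// In-memory sketch; the sample data is hypothetical.
// Owner "u1" has 2 posts, "u2" has 6, "u3" has 20.
const samplePosts = [];
for (const [owner, n] of [["u1", 2], ["u2", 6], ["u3", 20]]) {
  for (let i = 0; i < n; i++) samplePosts.push({ _OwnerUserId: owner });
}

// $group equivalent: count posts per owner.
const counts = new Map();
for (const p of samplePosts) {
  counts.set(p._OwnerUserId, (counts.get(p._OwnerUserId) || 0) + 1);
}

// $match equivalent: keep owners with 5 to 15 posts, inclusive.
const kept = [...counts].filter(([, c]) => c >= 5 && c <= 15);

// $lookup runs once per document that reaches it.
const lookupsIfJoinFirst = samplePosts.length; // one per post
const lookupsIfGroupFirst = kept.length;       // one per surviving group
console.log({ lookupsIfJoinFirst, lookupsIfGroupFirst });
```

    With this sample data, joining first would issue a lookup for each of the 28 posts, while grouping and filtering first leaves a single group to look up.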

  • 2020-12-03 00:21

    In addition to bauman.space's suggestion to put an index on the _AccountId field (which is critical), you should also run your $match stage as early as possible in the aggregation pipeline, ideally as the first stage. Even though it won't use any indexes (unless you index the relevant posts fields), it will shrink the result set before the $lookup (join) stage runs.

    The reason your query is terribly slow is that, for every post, it does a non-indexed lookup (a sequential read) across every user. That's around 60 million reads!

    Check out the Pipeline Optimization section of the MongoDB Aggregation Docs.
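
    A minimal sketch of a pipeline that filters before joining might look like the following. The condition on _PostTypeId is only a hypothetical placeholder, since the original filter is not shown here; the collection and field names are the ones used elsewhere in this thread.

```javascript
// Sketch of a pipeline with $match as the first stage.
// The _PostTypeId condition is a hypothetical filter; substitute your own.
const pipeline = [
  { $match: { _PostTypeId: "1" } },
  {
    $lookup: {
      from: "users",
      localField: "_OwnerUserId",
      foreignField: "_AccountId",
      as: "X",
    },
  },
  { $unwind: "$X" },
];

// In the mongo shell this would be run as: db.posts.aggregate(pipeline)
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
```

    Only posts that pass the first stage are handed to $lookup, so fewer join lookups are performed.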

  • 2020-12-03 00:34

    Use $match first, then $lookup. $match filters down the documents that $lookup has to examine, which makes the pipeline more efficient.
