Slow range query on a multikey index

余生颓废 提交于 2019-12-06 01:03:56

问题


I have a MongoDB collection named post with 35 million objects. The collection has two secondary indexes defined as follows.

> db.post.getIndexKeys()
[
    {
        "_id" : 1
    },
    {
        "namespace" : 1,
        "domain" : 1,
        "post_id" : 1
    },
    {
        "namespace" : 1,
        "post_time" : 1,
        "tags" : 1  // this is an array field
    }
]

I expect the following query, which simply filters by namespace and post_time, to run in a reasonable time without scanning all objects.

>db.post.find({post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")}, namespace: "my_namespace"}).count()
7408

However, it takes MongoDB at least ten minutes to retrieve the result and, curiously, it manages to scan 70 million objects to do the job according to the explain function.

> db.post.find({post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")}, namespace: "my_namespace"}).explain()
{
    "cursor" : "BtreeCursor namespace_1_post_time_1_tags_1",
    "isMultiKey" : true,
    "n" : 7408,
    "nscannedObjects" : 69999186,
    "nscanned" : 69999186,
    "nscannedObjectsAllPlans" : 69999186,
    "nscannedAllPlans" : 69999186,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 378967,
    "nChunkSkips" : 0,
    "millis" : 290048,
    "indexBounds" : {
        "namespace" : [
            [
                "my_namespace",
                "my_namespace"
            ]
        ],
        "post_time" : [
            [
                ISODate("2013-04-09T00:00:00Z"),
                ISODate("292278995-01--2147483647T07:12:56.808Z")
            ]
        ],
        "tags" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "localhost:27017"
}

The difference between the number of objects and the number of scans must be caused by the lengths of the tag arrays (which are all equal to 2). Still, I don't understand why post_time filter does not make use of the index.

Can you tell me what I might be missing?

(I am working on a descent machine with 24 cores and 96 GB RAM. I am using MongoDB 2.2.3.)


回答1:


Found my answer in this question: Order of $lt and $gt in MongoDB range query

My index is a multikey index (on tags) and I am running a range query (on post_time). Apparently, MongoDB cannot use both sides of the range as a filter in this case, so it just picks the $gte clause, which comes first. As my lower limit happens to be the lowest post_time value, MongoDB starts scanning all the objects.

Unfortunately, this is not the whole story. Trying to solve the problem, I created non-multikey indexes too but MongoDB insisted on using the bad one. That made me think that the problem was elsewhere. Finally, I had to drop the multikey index and create one without the tags field. Everything is fine now.



来源:https://stackoverflow.com/questions/16460471/slow-range-query-on-a-multikey-index

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!