Using an Index with Mongo's $first Group Operator

Posted by 十年热恋 on 2020-05-16 22:00:20

Question


According to MongoDB's latest $group documentation, there is a special optimization for $first:

Optimization to Return the First Document of Each Group

If a pipeline sorts and groups by the same field and the $group stage only uses the $first accumulator operator, consider adding an index on the grouped field which matches the sort order. In some cases, the $group stage can use the index to quickly find the first document of each group.
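To make the idea concrete, here is a minimal sketch in plain JavaScript (no MongoDB involved) of why an ordered index helps: if entries are sorted by the group field, the first entry of each group can be found by binary-searching for the next group boundary instead of reading every entry. The functions `upperBound` and `firstPerGroup` are hypothetical names for this illustration, and the array of objects only stands in for an index; this is not how MongoDB implements the optimization internally.

```javascript
// Illustration only: `entries` stands in for an index sorted by the group
// field (ascending), then by date_time (descending, newest first).

// Returns the position of the first entry whose `field` is greater than
// `value`, searching from `lo`; counts every entry it inspects in `stats`.
function upperBound(entries, field, value, lo, stats) {
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    stats.probes += 1;
    if (entries[mid][field] <= value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Collects the first entry of each group and how many entries were inspected.
function firstPerGroup(entries, field) {
  const stats = { probes: 0 };
  const firsts = [];
  let i = 0;
  while (i < entries.length) {
    firsts.push(entries[i]); // newest entry of this group
    stats.probes += 1;
    // Jump past the rest of this group without reading it entry by entry.
    i = upperBound(entries, field, entries[i][field], i + 1, stats);
  }
  return { firsts, probes: stats.probes };
}

// Usage: 1,704 assets with 50 readings each, ordered like the index
// in the question (asset_id ascending, then date_time descending).
const entries = [];
for (let asset = 1; asset <= 1704; asset++) {
  for (let t = 50; t >= 1; t--) {
    entries.push({ asset_id: asset, date_time: t, value: asset * 1000 + t });
  }
}
const { firsts, probes } = firstPerGroup(entries, "asset_id");
console.log(firsts.length);           // 1704
console.log(firsts[0].date_time);     // 50 (newest reading of asset 1)
console.log(probes < entries.length); // true
```

With 50 entries per group the sketch inspects at most about 18 entries per group instead of all 50; the gap widens as groups grow, which is the situation in the question (~800k documents spread over 1,704 groups).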

That makes sense, since only the first entry in an ordered index should be needed for each bin in the $group stage. Unfortunately, in my testing, the query returns ~800k sorted records in about 1s and passes them to $group, which then takes about 10s to produce the 1.7k output documents for some values of key (see the example below). For other values of key, it times out at 300s. There should be exactly 1704 bins in the group regardless of key, and, as far as I can tell, those bins should be covered by the first three fields of the index. Am I missing something?

db.getCollection('time_series').aggregate([
    {
        '$match': {
            'organization_id': 1,
            'key': 'waffle_count'
        }
    },
    {
        '$sort': {
            'key': 1, 'asset_id': 1, 'date_time': -1
        }
    },
    {
        '$group': {
            '_id': {
                'key': '$key', 'asset_id': '$asset_id'
            },
            'value': {
                '$first': '$value'
            }
        }
    }
]);
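For reference, here is a minimal in-memory sketch in plain JavaScript of what the pipeline above computes: for each (key, asset_id) pair, the value of the document with the newest date_time. The function name `latestValuePerAsset` and the sample documents are made up for illustration; only the field names come from the question.

```javascript
// In-memory model of: $match -> $sort {key: 1, asset_id: 1, date_time: -1}
// -> $group by {key, asset_id} with $first on value.
function latestValuePerAsset(docs, organizationId, key) {
  const matched = docs.filter(
    (d) => d.organization_id === organizationId && d.key === key
  );
  // key and asset_id ascending, date_time descending (newest first).
  matched.sort((a, b) => {
    if (a.key !== b.key) return a.key < b.key ? -1 : 1;
    if (a.asset_id !== b.asset_id) return a.asset_id < b.asset_id ? -1 : 1;
    return b.date_time - a.date_time;
  });
  // $first keeps the value of the first document seen in each group.
  // (MongoDB does not guarantee $group output order; this sketch happens
  // to return groups in sort order.)
  const groups = new Map();
  for (const d of matched) {
    const id = JSON.stringify([d.key, d.asset_id]);
    if (!groups.has(id)) {
      groups.set(id, { _id: { key: d.key, asset_id: d.asset_id }, value: d.value });
    }
  }
  return [...groups.values()];
}

const docs = [
  { organization_id: 1, key: "waffle_count", asset_id: 7, date_time: new Date("2020-04-01"), value: 3 },
  { organization_id: 1, key: "waffle_count", asset_id: 7, date_time: new Date("2020-04-02"), value: 5 },
  { organization_id: 1, key: "waffle_count", asset_id: 9, date_time: new Date("2020-04-01"), value: 2 },
  { organization_id: 2, key: "waffle_count", asset_id: 7, date_time: new Date("2020-04-03"), value: 8 },
];
// Asset 7 gets value 5 (its newest reading), asset 9 gets value 2;
// the organization 2 document is filtered out by the match.
console.log(latestValuePerAsset(docs, 1, "waffle_count"));
```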

Here is the index:

{
    "organization_id": 1,
    "key": 1,
    "asset_id": 1,
    "date_time": -1
}
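This index should be able to serve the $sort stage directly: $match pins organization_id and key to single values, which leaves asset_id ascending, then date_time descending, exactly the order the $sort asks for. The helper below, `indexProvidesSort`, is hypothetical (not a MongoDB API) and is only a simplified model of the query planner's rule: drop fields pinned by equality from both specs, then require the sort to be a prefix of the index, in the same direction on every field or reversed on every field.

```javascript
// Simplified model: can an index's key order deliver a sort, given fields
// fixed to one value by equality conditions? Pinned fields are constant,
// so they cannot affect the order and are removed from both specs.
function indexProvidesSort(indexSpec, sortSpec, equalityFields) {
  const pinned = new Set(equalityFields);
  const idx = Object.entries(indexSpec).filter(([f]) => !pinned.has(f));
  const srt = Object.entries(sortSpec).filter(([f]) => !pinned.has(f));
  if (srt.length > idx.length) return false;
  // Same direction on every field, or reversed on every field
  // (an index can be walked backwards).
  const same = srt.every(([f, dir], i) => idx[i][0] === f && idx[i][1] === dir);
  const reversed = srt.every(([f, dir], i) => idx[i][0] === f && idx[i][1] === -dir);
  return same || reversed;
}

const index = { organization_id: 1, key: 1, asset_id: 1, date_time: -1 };
const sort = { key: 1, asset_id: 1, date_time: -1 };
console.log(indexProvidesSort(index, sort, ["organization_id", "key"])); // true
console.log(indexProvidesSort(index, { asset_id: 1, date_time: 1 }, ["organization_id", "key"])); // false
```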

Answer 1:


I opened a request with MongoDB Atlas Support. The optimization I quoted isn't available before version 4.2 (we are using 3.6). Quoting Atlas Support:

The enhancement that you're mentioning was implemented in 4.2 via SERVER-9507. For your particular example, it seems you may also need SERVER-40090 to be implemented in order for your pipeline to fully take advantage of the improvement. We will let the team know of its potential benefit for your specific situation.

As of now, the second issue (SERVER-40090) is not fixed, so to use the index the $group _id must be a plain field path, like:

'_id': '$asset_id'

An _id specified as an object will not use the index, even if it contains only a single field, like so:

'_id': { 'asset_id': '$asset_id' }
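Applying this to the question's pipeline: because $match pins key to a single value, grouping by '$asset_id' alone should produce the same 1,704 bins (only the shape of _id changes), and key can also be dropped from $sort without changing the order. The sketch below writes that rewritten pipeline as a plain JavaScript array, plus a hypothetical heuristic, `looksEligibleForFirstOptimization`, that encodes only the conditions mentioned in this thread (sort and group on the same field, a plain field-path _id, only $first accumulators). It is not MongoDB's planner logic; whether 4.2+ actually picks the index-backed plan should be confirmed with explain(), and on 3.6 the optimization is not available at all.

```javascript
// Rewritten pipeline: key is already fixed by $match, so it is dropped
// from both $sort and the $group _id.
const rewritten = [
  { $match: { organization_id: 1, key: "waffle_count" } },
  { $sort: { asset_id: 1, date_time: -1 } },
  { $group: { _id: "$asset_id", value: { $first: "$value" } } },
];

// Hypothetical heuristic built from this thread, not MongoDB's planner:
// $sort right before $group, a string field-path _id equal to the first
// sort field, and every other $group field a lone $first accumulator.
function looksEligibleForFirstOptimization(pipeline) {
  const g = pipeline.findIndex((stage) => "$group" in stage);
  if (g < 1 || !("$sort" in pipeline[g - 1])) return false;
  const group = pipeline[g].$group;
  const firstSortField = Object.keys(pipeline[g - 1].$sort)[0];
  if (typeof group._id !== "string" || group._id !== "$" + firstSortField) return false;
  return Object.entries(group)
    .filter(([name]) => name !== "_id")
    .every(([, acc]) => Object.keys(acc).length === 1 && "$first" in acc);
}

const original = [
  { $match: { organization_id: 1, key: "waffle_count" } },
  { $sort: { key: 1, asset_id: 1, date_time: -1 } },
  { $group: { _id: { key: "$key", asset_id: "$asset_id" }, value: { $first: "$value" } } },
];
console.log(looksEligibleForFirstOptimization(original));  // false (object _id)
console.log(looksEligibleForFirstOptimization(rewritten)); // true
```

In the mongo shell the array would be passed as `db.getCollection('time_series').aggregate(rewritten)`.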


Source: https://stackoverflow.com/questions/61369835/using-an-index-with-mongos-first-group-operator
