MongoDB Compound Indexes - Does the sort order matter?

江枫思渺然, submitted on 2019-12-12 09:27:32

Question


I've recently dived into MongoDB for a project of mine. I've been reading up on indexes. For a small collection I know it wouldn't matter much, but as it grows there are going to be performance issues without the right indexes and queries.

Let's say I have a collection like so:

{user_id:1,slug:'one-slug'}
{user_id:1,slug:'another-slug'}
{user_id:2,slug:'one-slug'}
{user_id:3,slug:'just-a-slug'}

And I have to search my collection where

user id == 1 and slug == 'one-slug'

In this collection, slugs will be unique to user ids. That is, user id 1 can have only one slug of the value 'one-slug'.

I understand that user_id should be given priority due to its high cardinality, but what about slug, since it's also unique most of the time? I also can't wrap my head around ascending and descending indexes, how they're going to affect performance in this case, or the right order I should be using in this collection.

I've read a bit but I can't wrap my head around it, particularly for my scenario. It would be awesome to hear from others.


Answer 1:


You can think of a MongoDB single-field index as an array with pointers to document locations. For example, suppose you have a collection with the following documents (note that the sequence is deliberately out of order):

[collection]
1: {a:3, b:2}
2: {a:1, b:2}
3: {a:2, b:1}
4: {a:1, b:1}
5: {a:2, b:2}

Single-field index

Now if you do:

db.collection.createIndex({a:1})

The index approximately looks like:

[index a:1]
1: {a:1} --> 2, 4
2: {a:2} --> 3, 5
3: {a:3} --> 1

Note three important things:

  • It's sorted by a ascending
  • Each entry points to the locations where the relevant documents reside
  • The index only records the values of the a field. The b field does not exist in the index at all
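The list above can be illustrated with a small sketch in plain JavaScript. This is only a toy model: a real MongoDB index is a B-tree rather than an array, and the helper name buildSingleFieldIndex is made up for this example.

```javascript
// Toy model only: a real MongoDB index is a B-tree, not a plain array.
// Keys are document locations, values are the documents from the example.
const collection = new Map([
  [1, { a: 3, b: 2 }],
  [2, { a: 1, b: 2 }],
  [3, { a: 2, b: 1 }],
  [4, { a: 1, b: 1 }],
  [5, { a: 2, b: 2 }],
]);

// Hypothetical helper: group document locations by the value of one field,
// then sort the entries by that value in ascending order.
function buildSingleFieldIndex(docs, field) {
  const byValue = new Map();
  for (const [location, doc] of docs) {
    if (!byValue.has(doc[field])) byValue.set(doc[field], []);
    byValue.get(doc[field]).push(location);
  }
  return [...byValue.entries()]
    .map(([value, locations]) => ({ value, locations }))
    .sort((x, y) => x.value - y.value);
}

const indexA = buildSingleFieldIndex(collection, "a");
for (const entry of indexA) {
  // Only the value of a and the locations are stored; b never appears.
  console.log(`{a:${entry.value}} --> ${entry.locations.join(", ")}`);
}
```

The three printed entries correspond to the three entries of [index a:1] shown earlier.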

So if you do a query like:

db.collection.find().sort({a:1})

All it has to do is to walk the index from top to bottom, fetching and outputting the document pointed to by the entries. Notice that you can also walk the index from the bottom, e.g.:

db.collection.find().sort({a:-1})

and the only difference is you walk the index in reverse.
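As a rough sketch of the two walks, again in plain JavaScript and using the same toy model (the walk helper is hypothetical and not part of MongoDB):

```javascript
// Toy model: documents keyed by location, and the {a:1} index as sorted entries.
const docs = {
  1: { a: 3, b: 2 },
  2: { a: 1, b: 2 },
  3: { a: 2, b: 1 },
  4: { a: 1, b: 1 },
  5: { a: 2, b: 2 },
};
const indexA = [
  { a: 1, locations: [2, 4] },
  { a: 2, locations: [3, 5] },
  { a: 3, locations: [1] },
];

// Hypothetical helper: direction 1 walks the index top to bottom,
// direction -1 walks it bottom to top. No sorting happens at query time.
function walk(index, direction) {
  const entries = direction === 1 ? index : [...index].reverse();
  return entries.flatMap((entry) => entry.locations.map((loc) => docs[loc]));
}

const ascending = walk(indexA, 1).map((doc) => doc.a);
const descending = walk(indexA, -1).map((doc) => doc.a);
console.log(ascending.join(", "));  // 1, 1, 2, 2, 3
console.log(descending.join(", ")); // 3, 2, 2, 1, 1
```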

Because b is not in the index at all, you cannot use the index when querying anything about b.

Compound index

In a compound index e.g.:

db.collection.createIndex({a:1, b:1})

It means that you want to sort by a first, then sort by b. The index would look like:

[index a:1, b:1]
1: {a:1, b:1} --> 4
2: {a:1, b:2} --> 2
3: {a:2, b:1} --> 3
4: {a:2, b:2} --> 5
5: {a:3, b:2} --> 1

Note that:

  • The index is sorted by a first
  • Within each a you have a sorted b
  • You have 5 index entries vs. only three in the previous single-field example
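A minimal sketch of how such an ordering comes about, in plain JavaScript. It is a toy model of the example above, not MongoDB's actual B-tree, and buildCompoundIndex is a name invented for illustration.

```javascript
// Toy model of the example collection; "location" stands for the document pointer.
const collection = [
  { location: 1, a: 3, b: 2 },
  { location: 2, a: 1, b: 2 },
  { location: 3, a: 2, b: 1 },
  { location: 4, a: 1, b: 1 },
  { location: 5, a: 2, b: 2 },
];

// Hypothetical helper: one index entry per document,
// ordered by a first and by b only when the a values are equal.
function buildCompoundIndex(docs) {
  return docs
    .map(({ location, a, b }) => ({ a, b, location }))
    .sort((x, y) => x.a - y.a || x.b - y.b);
}

const indexAB = buildCompoundIndex(collection);
for (const entry of indexAB) {
  console.log(`{a:${entry.a}, b:${entry.b}} --> ${entry.location}`);
}
```

The printed order of locations is 4, 2, 3, 5, 1, matching [index a:1, b:1] above.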

Using this index, you can do a query like:

db.collection.find({a:2}).sort({b:1})

It can easily find where a:2 starts, then walk the index forward. Given that index, the following queries cannot use it efficiently:

db.collection.find().sort({b:1})
db.collection.find({b:1})

In both queries you can't easily find b since it's spread all over the index (i.e. not in contiguous entries). However you can do:
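To make "not in contiguous entries" concrete, here is a small plain JavaScript check against the same toy index (the positionsWhere helper is hypothetical):

```javascript
// Toy model of [index a:1, b:1] from the example above.
const indexAB = [
  { a: 1, b: 1, location: 4 },
  { a: 1, b: 2, location: 2 },
  { a: 2, b: 1, location: 3 },
  { a: 2, b: 2, location: 5 },
  { a: 3, b: 2, location: 1 },
];

// Hypothetical helper: positions (0-based) of the entries where field === value.
function positionsWhere(index, field, value) {
  const positions = [];
  index.forEach((entry, i) => {
    if (entry[field] === value) positions.push(i);
  });
  return positions;
}

const positionsA2 = positionsWhere(indexAB, "a", 2);
const positionsB1 = positionsWhere(indexAB, "b", 1);
console.log(positionsA2.join(", ")); // 2, 3 (one contiguous block)
console.log(positionsB1.join(", ")); // 0, 2 (scattered across the index)
```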

db.collection.find({a:2}).sort({b:-1})

since you can essentially find where the a:2 are, and walk the b entries backward.
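A rough sketch of that idea in plain JavaScript, under the same toy-model assumption. The findThenWalk helper is hypothetical, and it locates the block with a linear scan for brevity, whereas a real B-tree would seek to it directly.

```javascript
// Toy model of [index a:1, b:1] from the example above.
const indexAB = [
  { a: 1, b: 1, location: 4 },
  { a: 1, b: 2, location: 2 },
  { a: 2, b: 1, location: 3 },
  { a: 2, b: 2, location: 5 },
  { a: 3, b: 2, location: 1 },
];

// Hypothetical helper: find the contiguous block of entries with the given a,
// then walk it forward (direction 1) or backward (direction -1).
function findThenWalk(index, a, direction) {
  const start = index.findIndex((entry) => entry.a === a);
  if (start === -1) return [];
  let end = start;
  while (end < index.length && index[end].a === a) end++;
  const block = index.slice(start, end); // already sorted by b ascending
  return direction === 1 ? block : block.reverse();
}

const forward = findThenWalk(indexAB, 2, 1).map((e) => e.location);
const backward = findThenWalk(indexAB, 2, -1).map((e) => e.location);
console.log(forward.join(", "));  // 3, 5  (b ascending)
console.log(backward.join(", ")); // 5, 3  (b descending)
```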

Edit: clarification of @marcospgp's question in the comment:

The possibility of using the index {a:1, b:1} to satisfy find({a:2}).sort({b:-1}) makes sense if you see it from a sorted table point of view. For example, the index {a:1, b:1} can be thought of as:

a | b
--|--
1 | 1
1 | 2
2 | 1
2 | 2
2 | 3
3 | 1
3 | 2

find({a:2}).sort({b:1})

The index {a:1, b:1} means sort by a, then within each a, sort the b values. If you then do a find({a:2}).sort({b:1}), the index knows where all the a=2 are. Within this block of a=2, the b would be sorted in ascending order (according to the index spec), so that query find({a:2}).sort({b:1}) can be satisfied by:

a | b
--|--
1 | 1
1 | 2
2 | 1 <-- walk this block forward to satisfy
2 | 2 <-- find({a:2}).sort({b:1})
2 | 3 <--
3 | 1
3 | 2

find({a:2}).sort({b:-1})

Since the index can be walked forward or backward, a similar procedure is followed, with a small twist at the end:

a | b
--|--
1 | 1
1 | 2
2 | 1  <-- walk this block backward to satisfy
2 | 2  <-- find({a:2}).sort({b:-1})
2 | 3  <--
3 | 1
3 | 2

The fact that the index can be walked forward or backward is the key point that enables the query find({a:2}).sort({b:-1}) to be able to use the index {a:1, b:1}.

Query planner explain

You can see what the query planner plans by using db.collection.explain().find(....). Basically if you see a stage of COLLSCAN, no index was used or can be used for the query. See explain results for details on the command's output.
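As a hedged sketch of how one might check a plan programmatically: the explain output's queryPlanner section contains a winningPlan document whose stages nest through inputStage (or inputStages). The plain JavaScript below walks such a document. The two sample plans are hand-written for illustration, not real output, and newer server versions may nest the plan differently, so treat the exact shape as an assumption.

```javascript
// Hypothetical helper: collect the stage names of a winningPlan-like document,
// following inputStage / inputStages links from the outermost stage inward.
function collectStages(plan) {
  if (!plan || typeof plan !== "object") return [];
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  return [plan.stage, ...children.flatMap(collectStages)];
}

// Hand-written samples shaped like winningPlan documents (illustrative only).
const indexedPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", keyPattern: { a: 1, b: 1 }, direction: "backward" },
};
const unindexedPlan = {
  stage: "SORT",
  inputStage: { stage: "COLLSCAN", direction: "forward" },
};

const indexedStages = collectStages(indexedPlan);
const unindexedStages = collectStages(unindexedPlan);
console.log(indexedStages.includes("COLLSCAN"));   // false
console.log(unindexedStages.includes("COLLSCAN")); // true
```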




Answer 2:


[Cannot comment due to a lack of reputation]

Index direction only matters when you're sorting.

Not completely exact: some queries can be faster with an index in a particular direction, even if no order is required in the query itself (sorting is just for results). For example, with date criteria, searching for users who subscribed yesterday will be faster with a descending index than with an ascending index or no index.

difference between {user_id:1,slug:1} and {slug:1,user_id:1}

MongoDB filters on the first field of the index, then on the second field among the entries that matched the first (and so on). The more restrictive fields must come first in the index to really improve the query.
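A small plain JavaScript sketch of the two field orders, using the documents from the question. It is a toy model (sorted arrays rather than B-trees), and the helper names are made up for this example.

```javascript
// The documents from the question.
const docs = [
  { user_id: 1, slug: "one-slug" },
  { user_id: 1, slug: "another-slug" },
  { user_id: 2, slug: "one-slug" },
  { user_id: 3, slug: "just-a-slug" },
];

// Hypothetical helper: sort a copy of the documents by the given fields, in order.
function buildIndex(documents, fields) {
  const compare = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
  return [...documents].sort((x, y) => {
    for (const field of fields) {
      const result = compare(x[field], y[field]);
      if (result !== 0) return result;
    }
    return 0;
  });
}

// Hypothetical helper: positions (0-based) of the entries where field === value.
function positionsWhere(index, field, value) {
  return index.flatMap((entry, i) => (entry[field] === value ? [i] : []));
}

const userThenSlug = buildIndex(docs, ["user_id", "slug"]);
const slugThenUser = buildIndex(docs, ["slug", "user_id"]);

// Entries for user_id 1 form one block only when user_id is the first field.
const userFirst = positionsWhere(userThenSlug, "user_id", 1);
const userSecond = positionsWhere(slugThenUser, "user_id", 1);
console.log(userFirst.join(", "));  // 0, 1
console.log(userSecond.join(", ")); // 0, 2
```

In this toy model, an equality match on both user_id and slug lands on a single entry with either field order; the order starts to matter when a query filters on only one of the two fields.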



Source: https://stackoverflow.com/questions/51171783/mongodb-compound-indexes-does-the-sort-order-matter
