Should I implement auto-incrementing in MongoDB?

I'm making the switch to MongoDB from MySQL. The architecture familiar to me for a very basic users table would use an auto-incrementing uid. See Mongo's own documentation for this use case.

I'm wondering whether this is the best architectural decision. From a UX standpoint, I like having UIDs as external references, for example in shorter URLs: http://example.com/users/12345

Is there a third way? Someone in IRC Freenode's #mongodb suggested creating a range of IDs and caching them. I'm unsure of how to actually implement that, or whether there's another route I can go. I don't necessarily even need the _id itself to be incremented this way. As long as the users all have a unique numerical uid within the document, I would be happy.

Josh, there is no auto-increment id in MongoDB, and there are good reasons for that. I would go with ObjectIds, which are unique across the cluster.

You can add auto-increment with a sequence collection, using findAndModify to get the next id to use. This will definitely add complexity to your application and may also affect your ability to shard your database.

As long as you can guarantee that your generated ids will be unique, you will be fine. But the headache will be there.
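
For illustration, the sequence-collection approach might look like the sketch below, written in Python. It assumes a pymongo-style collection whose find_one_and_update accepts $inc, upsert, and return_document. The "seq" field and the next_id helper are names made up for this example, and InMemoryCounters is only a hypothetical stand-in so the sketch runs without a server.

```python
import threading


class InMemoryCounters:
    """Hypothetical stand-in for a MongoDB collection (supports only $inc)."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def find_one_and_update(self, filter, update, upsert=False, return_document=False):
        with self._lock:  # emulates the server applying the update atomically
            key = filter["_id"]
            if key not in self._docs and not upsert:
                return None
            before = self._docs.get(key)
            after = dict(before or {"_id": key})
            for field, amount in update["$inc"].items():
                after[field] = after.get(field, 0) + amount
            self._docs[key] = after
            return dict(after) if return_document else before


def next_id(counters, name):
    """Atomically increment the counter document `name` and return its new value."""
    doc = counters.find_one_and_update(
        {"_id": name},
        {"$inc": {"seq": 1}},
        upsert=True,           # create the counter on first use
        return_document=True,  # same value as pymongo.ReturnDocument.AFTER
    )
    return doc["seq"]


counters = InMemoryCounters()  # with pymongo this would be a real collection object
first_uid = next_id(counters, "users")
second_uid = next_id(counters, "users")
print(first_uid, second_uid)  # 1 2
```

The increment and the read of the new value happen in one operation, which is what keeps two concurrent callers from receiving the same id.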

For more on this question, see this post in the dedicated Google group for MongoDB:

http://groups.google.com/group/mongodb-user/browse_thread/thread/f57b712b2aae6f0b/b4315285e689b9a7?lnk=gst&q=projapati#b4315285e689b9a7

Hope this helps.

Thanks


I strongly disagree with the author of the selected answer that there is "no auto-increment id in MongoDB and there are good reasons." We don't know why 10gen didn't encourage the use of auto-incremented IDs; that is speculation. I think 10gen made this choice because it's simply easier to ensure the uniqueness of 12-byte IDs in a clustered environment. It's a default solution that fits most newcomers and therefore increases product adoption, which is good for 10gen's business.

Now let me tell everyone about my experience with ObjectIds in a commercial environment.

I'm building a social network. We have roughly 6M users, and each user has roughly 20 friends.

Now imagine we have a collection that stores the relationships between users (who follows whom). It looks like this:

_id : ObjectId
user_id : ObjectId
followee_id : ObjectId

on which we have a unique composite index {user_id, followee_id}. We can estimate the size of this index as 12*2*6M*20 bytes, or roughly 2.9GB of raw key data. That's the index for fast look-up of the people I follow. For fast look-up of the people who follow me I need the reverse index. That's another 2.9GB.
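
Working the estimate through in Python: the sketch below multiplies out the raw key bytes only and ignores any per-entry index overhead, so it is a lower bound rather than a measurement. The 8-byte comparison assumes 64-bit integer ids in place of ObjectIds, and index_key_bytes is a helper invented for this example.

```python
def index_key_bytes(id_bytes, users, follows_per_user, ids_per_key=2):
    """Raw bytes of key data in a composite index holding `ids_per_key` ids per entry."""
    return id_bytes * ids_per_key * users * follows_per_user


objectid_index = index_key_bytes(12, 6_000_000, 20)  # 12-byte ObjectIds
int64_index = index_key_bytes(8, 6_000_000, 20)      # assumed 8-byte integer ids

print(f"{objectid_index / 1e9:.2f} GB")  # 2.88 GB
print(f"{int64_index / 1e9:.2f} GB")     # 1.92 GB
```

The product comes to about 2.9 GB per index with ObjectIds, and about a third less with 8-byte ids.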

And this is just the beginning. I have to carry these IDs everywhere. We have an activity cluster where we store your News Feed: every event you or your friends generate. Imagine how much space that takes.

And finally, one of our engineers made a careless decision to store references as strings representing the ObjectId, which doubles their size.
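
The doubling is easy to check: a 12-byte id written out as hexadecimal text takes two characters per byte. A small sketch, using an arbitrary example id:

```python
# An arbitrary example id: 12 raw bytes, as a binary ObjectId would be stored.
raw_id = bytes.fromhex("507f1f77bcf86cd799439011")

# The same id stored as a string of hex digits.
text_id = raw_id.hex()

print(len(raw_id))   # 12
print(len(text_id))  # 24
```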

What happens if an index does not fit into RAM? Nothing good, says 10gen:

When an index is too large to fit into RAM, MongoDB must read the index from disk, which is a much slower operation than reading from RAM. Keep in mind an index fits into RAM when your server has RAM available for the index combined with the rest of the working set.

That means reads are slow. Lock contention goes up. Writes get slower as well. Seeing lock contention of around 80% is no longer a shock to me.

Before you know it, you end up with a 460GB cluster that you have to split into shards and that is quite hard to manipulate.

Facebook uses a 64-bit long as the user id :) There is a reason for that. You can generate sequential IDs:

using 10gen's advice;
using MySQL to store the counters (if you are concerned about speed, take a look at HandlerSocket);
using an ID-generating service that you build yourself, or something like Twitter's Snowflake.
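
To make the last option concrete, here is a minimal sketch of a Snowflake-style generator in Python. It is not Twitter's implementation. It assumes the commonly described layout of a millisecond timestamp, a 10-bit worker id, and a 12-bit per-millisecond sequence packed into one 64-bit integer; the class name, the default epoch, and the injectable clock are choices made for this example.

```python
import threading
import time


class SnowflakeLikeGenerator:
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id, epoch_ms=1_600_000_000_000, clock=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self._worker_id = worker_id
        self._epoch_ms = epoch_ms  # hypothetical custom epoch, in ms since 1970
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self._epoch_ms
            return (
                (timestamp << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self._worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )


generator = SnowflakeLikeGenerator(worker_id=1)
ids = [generator.next_id() for _ in range(5)]
print(ids == sorted(ids), len(set(ids)))  # True 5
```

Each worker needs its own worker_id, which is the only coordination required. The ids are unique and increase over time on a given worker, but they are not consecutive.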

So here is my general advice to everyone. Please please make your data as small as possible. When you grow it will save you lots of sleepless nights.

So, there's a fundamental problem with "auto-increment" IDs. When you have 10 different servers (shards in MongoDB), who picks the next ID?

If you want a single set of auto-incrementing IDs, you have to have a single authority for picking those IDs. In MySQL, this is generally pretty easy, as you just have one server accepting writes. But big MongoDB deployments run sharded, and sharding has no such "central authority".

MongoDB uses 12-byte ObjectIds so that each server can create new documents with unique ids without relying on a single authority.

So here's the big question: "can you afford to have a single authority"?

If so, then you can use findAndModify to keep track of the "last highest ID" and then you can insert with that.

That's the process described in your link. The obvious weakness here is that you technically have to do two writes for each insert. This may not scale very well, so you probably want to avoid it on data with a high insertion rate. It may work for users; it probably won't work for tracking clicks.
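
One way to soften the two-writes cost is the range-caching idea raised in the question: reserve a whole block of ids with a single counter update and hand them out locally. The Python sketch below is only an illustration under stated assumptions. BlockAllocator and make_in_memory_reserver are hypothetical names, and the in-memory reserver stands in for what would, with MongoDB, be one findAndModify that increments the counter by the block size.

```python
import threading


def make_in_memory_reserver():
    """Stand-in for the central counter: returns the last id of each new block."""
    state = {"seq": 0}
    lock = threading.Lock()

    def reserve(size):
        with lock:
            state["seq"] += size
            return state["seq"]

    return reserve


class BlockAllocator:
    def __init__(self, reserve_block, block_size=100):
        self._reserve = reserve_block  # callable: block size -> last id in the block
        self._size = block_size
        self._next = 1
        self._last = 0  # empty range, forces a reservation on first use
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._next > self._last:
                # Local block used up: one round trip to the central counter.
                self._last = self._reserve(self._size)
                self._next = self._last - self._size + 1
            value = self._next
            self._next += 1
            return value


reserve = make_in_memory_reserver()
app_server_a = BlockAllocator(reserve, block_size=3)
app_server_b = BlockAllocator(reserve, block_size=3)
print([app_server_a.next_id() for _ in range(3)])  # [1, 2, 3]
print(app_server_b.next_id())                      # 4
print(app_server_a.next_id())                      # 7
```

The trade-off is that ids stay unique but are no longer handed out in strict global order, and any unused ids in a block are lost if the process restarts.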

There is nothing like auto-increment in MongoDB, but you can store your own counters in a dedicated collection and $inc the relevant counter as needed. Since $inc is an atomic operation, you won't see duplicates, provided you read the new value in the same operation (for example with findAndModify).

The default Mongo ObjectId -- the one used in the _id field -- is incrementing.

Mongo uses a timestamp (seconds since the Unix epoch) as the first 4-byte portion of its 4-3-2-3 composition, which is similar in spirit to a Version 1 UUID, though not the same layout. The ObjectId is generated at insert time (if the user/client provides no other type of _id).

Thus the ObjectId is ordinal in nature; further, sorting on _id orders documents by this incrementing timestamp, to one-second resolution.

One might consider it an updated version of the auto-incrementing (index++) ids used in many DBMSs.
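
As a small illustration of the timestamp prefix, the Python sketch below reads the first 4 bytes of an ObjectId's hex form as big-endian seconds since the Unix epoch. The helper name objectid_timestamp and the example id are choices made for this sketch; drivers normally expose this value themselves.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex):
    """Return the creation time encoded in the first 4 bytes of a 24-char hex ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("expected 24 hexadecimal characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes = 8 hex digits
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_timestamp("507f1f77bcf86cd799439011").isoformat())
# 2012-10-17T21:13:27+00:00
```

Because the resolution is one second and the remaining bytes differ per client, ids created in the same second on different machines are not guaranteed to sort in insertion order, so this is "roughly increasing" rather than a gap-free sequence.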

Source: https://stackoverflow.com/questions/6645277/should-i-implement-auto-incrementing-in-mongodb

Tags

auto-increment

uuid

mongodb