We are migrating a database from MySQL to MongoDB for performance reasons and are considering what to use for the IDs of the MongoDB documents. We are debating between using ObjectIDs and UUIDs.
Consider the amount of data you would store in each case.
A MongoDB ObjectID is 12 bytes in size, is packed for storage, and its parts are organized for performance (i.e. the timestamp is stored first, which is a logical ordering criterion).
Conversely, a standard UUID is 36 characters long, contains dashes, and is typically stored as a string. Further, even if you strip the dashes and store it numerically, you must still contend with the fact that its "indexy" portion (the part of a UUID v1 that is timestamp-based) sits in the middle of the UUID and doesn't lend itself well to sorting. Studies have been done on performant UUID storage, and I even wrote a Node.js library to assist in its management.
If you're intent on using a UUID, consider reorganizing it for optimal indexing and sorting; otherwise you'll likely hit a performance wall.
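As a rough illustration of the reorganizing idea, the sketch below moves the timestamp fields of a version-1 UUID to the front, most significant first, so that the resulting hex strings sort roughly by creation time. It assumes the standard 8-4-4-4-12 layout (time_low, time_mid, time_hi_and_version, clock_seq, node); the function names are made up for this example and this is not the library mentioned above.

```javascript
// Sketch: rearrange a version-1 UUID so its timestamp fields run from most
// to least significant. reorderV1Uuid / restoreV1Uuid are hypothetical names.
function reorderV1Uuid(uuid) {
  const [timeLow, timeMid, timeHiAndVersion, clockSeq, node] =
    uuid.toLowerCase().split("-");
  if (!node || timeHiAndVersion[0] !== "1") {
    throw new Error("expected a version-1 UUID in 8-4-4-4-12 form");
  }
  // The high bits of the timestamp go first, so later ids compare as larger.
  return timeHiAndVersion + timeMid + timeLow + clockSeq + node;
}

// Undo the rearrangement and put the dashes back.
function restoreV1Uuid(packed) {
  const timeHiAndVersion = packed.slice(0, 4);
  const timeMid = packed.slice(4, 8);
  const timeLow = packed.slice(8, 16);
  const clockSeq = packed.slice(16, 20);
  const node = packed.slice(20, 32);
  return [timeLow, timeMid, timeHiAndVersion, clockSeq, node].join("-");
}

console.log(reorderV1Uuid("6ccd780c-baba-1026-9564-5b8c656024db"));
// 1026baba6ccd780c95645b8c656024db
```

This only helps for v1 UUIDs; a v4 (random) UUID has no timestamp to move, which is why the sketch rejects anything whose version digit is not 1.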
The _id field of MongoDB can have any value you want as long as you can guarantee that it is unique for the collection. When your data already has a natural key, there is no reason not to use this in place of the auto-generated ObjectIDs.
ObjectIDs are provided as a reasonable default solution to save you the time of generating your own unique key (and to discourage beginners from trying to copy SQL's AUTO INCREMENT, which is a bad idea in a distributed database).
By not using ObjectIDs you also miss out on another convenience feature: an ObjectID includes a Unix timestamp of when it was generated, and many drivers provide a function to extract it and convert it to a date. This can sometimes make a separate create-date field redundant.
But when neither is a concern for you, you are free to use your UUIDs as the _id field.
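To make the timestamp feature concrete: the first 4 of an ObjectID's 12 bytes hold the creation time in seconds since the Unix epoch, so it can be read straight from the hex string. The helper below is a hand-rolled sketch with a made-up name; in practice you would use what your driver offers (the shell, for example, has getTimestamp() on ObjectId values).

```javascript
// Sketch: read the creation time out of an ObjectId's 24-character hex form.
// objectIdToDate is a hypothetical helper, not a driver API.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  // First 8 hex characters = 4 bytes = seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdToDate("5e0e6a804888946fa61a1976").toISOString());
// 2020-01-02T22:11:12.000Z
```

Note the resolution is whole seconds, so this replaces a create-date field only if that is precise enough for you.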
I found these benchmarks some time ago when I had the same question. They basically show that using a Guid instead of an ObjectId causes a drop in index performance.
I would in any case recommend that you customize the benchmarks to imitate your specific real-life scenario and see what the numbers look like; one cannot rely 100% on generic benchmarks.
We must be careful to distinguish the cost of MongoDB inserting a thing from the cost of generating the thing in the first place, and to weigh that cost relative to the size of the payload. Below is a little matrix that shows the method of generating the _id crossed against the size in bytes of an optional extra payload. The tests use JavaScript only and were conducted on a MacBook Pro against localhost, doing 100,000 inserts using insertMany in batches of 100 without transactions, to try to remove network, chattiness, and other factors. Two runs with batch = 1 were also done just to highlight the dramatic difference.
Method
A : Simple int: _id:0, _id:1, ...
B : ObjectId _id:ObjectId("5e0e6a804888946fa61a1976"), ...
C : Simple string: _id:"A0", _id:"A1", ...
D : UUID-length string: _id:"9575edcc-cb70-4d63-97ed-ee5d624de87b0", ... (but not actually generated by UUID())
E : Real generated UUID: _id: UUID("35992974-21ea-4f61-b715-2dfaed663b73"), ... (stored UUID() object)
F : Real generated UUID: _id: "6b16f733-ff24-4172-83f9-e4f96ace6775" (stored as string, e.g. UUID().toString().substr(6,36))
Time in milliseconds to perform 100,000 inserts on fresh (empty) collection.
Extra                  M E T H O D   (Batch = 100)
Payload        A      B      C      D      E      F   % drop A to F
--------   -----  -----  -----  -----  -----  -----   -------------
None        2379   2386   2418   2492   3472   4267   80%
512         2934   2928   3048   3128   4151   4870   66%
1024        3249   3309   3375   3390   4847   5237   61%
2048        3953   3832   3987   4342   5448   5888   49%
4096        6299   6343   6199   6449   7634   8640   37%
8192        9716   9292   9397  10816  11212  11321   16%

Extra                  M E T H O D   (Batch = 1)
Payload        A      B      C      D      E      F   % drop A to F
--------   -----  -----  -----  -----  -----  -----   -------------
None       48006  48419  49136  48757  50649  51280   6.8%
1024       50986  50894  49383  49373  51200  51821   1.2%
This was a quick test, but it seems clear that basic strings and ints as _id are roughly the same speed, while actually generating a UUID adds time, especially if you take the string version of the UUID() object, e.g. UUID().toString().substr(6,36). It is also worth noting that constructing an ObjectId appears to be just as quick as using a plain int or string.
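If you want to separate generation cost from insert cost yourself, one option is to time the id generation alone, with no database involved. The sketch below is not the script used for the numbers above; it runs under Node.js and uses crypto.randomUUID() as a stand-in for the shell's UUID(), and the function and generator names are made up for this example.

```javascript
const { randomUUID } = require("crypto");

// Time only the creation of `count` ids; nothing is sent to MongoDB.
function timeIdGeneration(makeId, count) {
  const start = process.hrtime.bigint();
  let last;
  for (let i = 0; i < count; i++) {
    last = makeId(i);
  }
  const elapsedNs = process.hrtime.bigint() - start;
  return { last, ms: Number(elapsedNs) / 1e6 };
}

// Rough analogues of methods A, C and F from the list above.
const generators = {
  int: (i) => i,
  simpleString: (i) => "A" + i,
  uuidString: () => randomUUID(),
};

for (const [name, makeId] of Object.entries(generators)) {
  const { ms } = timeIdGeneration(makeId, 100000);
  console.log(name, ms.toFixed(1), "ms");
}
```

Comparing these figures with the insert timings gives a feel for how much of the difference comes from generating the id rather than from storing it. Absolute numbers will vary by machine and runtime.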
I think this is a great idea and so does Mongo; they list UUIDs as one of the common options for the _id field.
Considerations:
Counter to some of the other answers:
... ObjectID(); to convert a string into equivalent BSON object.
... 0x04.)
... UUID() function only generates v4 (random) UUIDs, so to leverage this you'd need to lean on your app or Mongo driver for ID creation.
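On the size question, it may help to see that a UUID is 128 bits, so its binary form is 16 bytes rather than the 36 characters of its string form. The sketch below does the packing with a plain Node.js Buffer; the helper names are made up, and it deliberately stops short of the driver, which is where the bytes would be wrapped as a BSON binary value (subtype 0x04 is the one BSON reserves for UUIDs).

```javascript
// Sketch: pack a UUID string into its 16 raw bytes and back again.
// uuidToBytes / bytesToUuid are hypothetical helpers, not driver APIs.
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new Error("expected a UUID with 32 hex digits");
  }
  return Buffer.from(hex, "hex");
}

function bytesToUuid(bytes) {
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

const text = "35992974-21ea-4f61-b715-2dfaed663b73";
console.log(text.length, uuidToBytes(text).length);
// 36 16
```

Stored this way a UUID is only 4 bytes larger than a 12-byte ObjectID, which is a much smaller gap than the string form suggests.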