I need to store several billion small data structures (around 200 bytes each). So far, storing each element as a separate document is working well, with Mongo providing around
A few strategies come to mind:
1) Use a distinct collection/database for the 'hot' documents.
If you know which documents are in the hot set then, yes, moving them into a separate collection will help. This will ensure that the hot documents are co-resident on the same extents/pages. It will also make the index for those documents more likely to be completely in memory. This is due to it being smaller and being (completely?) used more often.
If the hot documents are randomly mixed with other documents then you will likely have to fault in more of the leaf elements of the B-Tree index when loading a document as the probability of another document having recently loaded or accessed the index block is small.
2) Shorten the indexed values.
The shorter the index value the more values that fit into a single B-Tree block. (Note: The keys are not included in the index.) The more entries in a single bucket means fewer buckets and less total memory needed for the index. That translates to the higher probability / longer lifetimes that blocks will stay in memory. In your example a 20->8 character reduction is a better than 50% savings. If you can convert those 8 bytes to a long there is a little more savings since longs do not have a length prefix (4 bytes) and a trailing null (5 bytes total).
3) Shorten the key names.
The shorter the field names the less space each document takes. This has the unfortunate side effect of decreasing readability.
4) Shard
This is really the only way to keep performance up in the face of reads across an entire corpus that exhausts memory and eventual disk bandwidth. If you do shard you will still want to shard the 'hot' collection.
5) Adjust the read-ahead on disk to a small value.
Since the 'non-hot' reads are loading a random document from disk we really only want to read/fault into memory that document and as few of the documents around it as possible. Most systems will try and read-ahead a large block of data once a user reads from a portion of a file. This is exactly the opposite of what we want.
If you see your system faulting a lot but the resident memory for the mongod process does not approach the systems available memory you are likely seeing the effect of the OS reading useless data.
6) Try to use monotonically increasing values for the keys.
This will trigger an optimization (for ObjectId based indexes) that when the index block splits it will do so at 90/10 instead of 50/50. The result is that most of the blocks in your index will be near capacity and you will need fewer of them.
If you only know the 'hot' 50,000 documents after the fact then adding them to the separate collection in index order will also trigger this optimization.
Rob.