I\'ve recently discovered that Mongo has no SQL equivalent to \"ORDER BY RAND()\" in it\'s command syntax (https://jira.mongodb.org/browse/SERVER-533)
I\'ve seen the
What you want cannot be done without picking either of the two solutions you mention. Picking a random offset is a horrible idea if your collection becomes larger than a few thousands documents. The reason for this is that the skip(n) operation takes O(n) time. In other words, the higher your random offset the longer the query will take.
Adding a randomized field to the document is, in my opinion, the least hacky solution there is given the current feature set of MongoDB. It provides stable query times and gives you some say over how the collection is randomized (and allows you to generate a new random value after each query through a findAndModify for example). I also do not understand how this would impose an implicit limit on your queries that make use of randomization.
One could insert an id field (the $id field won't work because its not an actual number) use modulus math to get a random skip. If you have 10,000 records and you wanted 10 results you could pick a modulus between 1 and 1000 randomly sucH as 253 and then request where mod(id,253)=0 and this is reasonably fast if id is indexed. Then randomly sort client side those 10 results. Sure they are evenly spaced out instead of truly random, but it close to what is desired.
The other widely given suggestion is to choose a random index to offset from. Because of the order that my documents were inserted in, that will result in one of the string fields being alphabetized, which won't feel very random to a user of my site.
Why? If you have 7.000 documents and you choose three random offsets from 0 to 6999, the chosen documents will be random, even if the collection itself is sorted alphabetically.
I have to agree: the easiest thing to do is to install a random value into your documents. There need not be a tremendously large range of values, either -- the number you choose depends on the expected result size for your queries (1,000 - 1,000,000 distinct integers ought to be enough for most cases).
When you run your query, don't worry about the random field -- instead, index it and use it to sort. Since there is no correspondence between the random number and the document, you should get fairly random results. Note that collisions will likely result in documents being returned in natural order.
While this is certainly a hack, you have a very easy escape route: given MongoDB's schema-free nature, you can simply stop including the random field once there is support for random sort in the server. If size is an issue, you could run a batch job to remove the field from existing documents. There shouldn't be a significant change in your client code if you design it carefully.
An alternative option would be to think long and hard about the number of results that will be randomized and returned for a given query. It may not be overly expensive to simply do shuffling in client code (i.e., if you only consider the most recent 10,000 posts).
Both of the options seems like non-perfect hacks to me, random filed and will always have same value and skip will return same records for a same number.
Why don't you use some random field to sort then skip randomly, i admit it is also a hack but in my experience gives better sense of randomness.
You can give this a try - it's fast, works with multiple documents and doesn't require populating rand
field at the beginning, which will eventually populate itself:
// Install packages:
// npm install mongodb async
// Add index in mongo:
// db.ensureIndex('mycollection', { rand: 1 })
var mongodb = require('mongodb')
var async = require('async')
// Find n random documents by using "rand" field.
function findAndRefreshRand (collection, n, fields, done) {
var result = []
var rand = Math.random()
// Append documents to the result based on criteria and options, if options.limit is 0 skip the call.
var appender = function (criteria, options, done) {
return function (done) {
if (options.limit > 0) {
collection.find(criteria, fields, options).toArray(
function (err, docs) {
if (!err && Array.isArray(docs)) {
Array.prototype.push.apply(result, docs)
}
done(err)
}
)
} else {
async.nextTick(done)
}
}
}
async.series([
// Fetch docs with unitialized .rand.
// NOTE: You can comment out this step if all docs have initialized .rand = Math.random()
appender({ rand: { $exists: false } }, { limit: n - result.length }),
// Fetch on one side of random number.
appender({ rand: { $gte: rand } }, { sort: { rand: 1 }, limit: n - result.length }),
// Continue fetch on the other side.
appender({ rand: { $lt: rand } }, { sort: { rand: -1 }, limit: n - result.length }),
// Refresh fetched docs, if any.
function (done) {
if (result.length > 0) {
var batch = collection.initializeUnorderedBulkOp({ w: 0 })
for (var i = 0; i < result.length; ++i) {
batch.find({ _id: result[i]._id }).updateOne({ rand: Math.random() })
}
batch.execute(done)
} else {
async.nextTick(done)
}
}
], function (err) {
done(err, result)
})
}
// Example usage
mongodb.MongoClient.connect('mongodb://localhost:27017/core-development', function (err, db) {
if (!err) {
findAndRefreshRand(db.collection('profiles'), 1024, { _id: true, rand: true }, function (err, result) {
if (!err) {
console.log(result)
} else {
console.error(err)
}
db.close()
})
} else {
console.error(err)
}
})