I am looking to get a random record from a huge (100 million record) mongodb
.
What is the fastest and most efficient way to do so? The data is already t
The following recipe is a little slower than the mongo cookbook solution (add a random key on every document), but returns more evenly distributed random documents. It's a little less-evenly distributed than the skip( random )
solution, but much faster and more fail-safe in case documents are removed.
function draw(collection, query) {
// query: mongodb query object (optional)
var query = query || { };
query['random'] = { $lte: Math.random() };
var cur = collection.find(query).sort({ rand: -1 });
if (! cur.hasNext()) {
delete query.random;
cur = collection.find(query).sort({ rand: -1 });
}
var doc = cur.next();
doc.random = Math.random();
collection.update({ _id: doc._id }, doc);
return doc;
}
It also requires you to add a random "random" field to your documents so don't forget to add this when you create them : you may need to initialize your collection as shown by Geoffrey
function addRandom(collection) {
collection.find().forEach(function (obj) {
obj.random = Math.random();
collection.save(obj);
});
}
db.eval(addRandom, db.things);
Benchmark results
This method is much faster than the skip()
method (of ceejayoz) and generates more uniformly random documents than the "cookbook" method reported by Michael:
For a collection with 1,000,000 elements:
This method takes less than a millisecond on my machine
the skip()
method takes 180 ms on average
The cookbook method will cause large numbers of documents to never get picked because their random number does not favor them.
This method will pick all elements evenly over time.
In my benchmark it was only 30% slower than the cookbook method.
the randomness is not 100% perfect but it is very good (and it can be improved if necessary)
This recipe is not perfect - the perfect solution would be a built-in feature as others have noted.
However it should be a good compromise for many purposes.
If you have a simple id key, you could store all the id's in an array, and then pick a random id. (Ruby answer):
ids = @coll.find({},fields:{_id:1}).to_a
@coll.find(ids.sample).first
If you're using mongoid, the document-to-object wrapper, you can do the following in Ruby. (Assuming your model is User)
User.all.to_a[rand(User.count)]
In my .irbrc, I have
def rando klass
klass.all.to_a[rand(klass.count)]
end
so in rails console, I can do, for example,
rando User
rando Article
to get documents randomly from any collection.
If you are using mongoose then you may use mongoose-random mongoose-random
You can also use MongoDB's geospatial indexing feature to select the documents 'nearest' to a random number.
First, enable geospatial indexing on a collection:
db.docs.ensureIndex( { random_point: '2d' } )
To create a bunch of documents with random points on the X-axis:
for ( i = 0; i < 10; ++i ) {
db.docs.insert( { key: i, random_point: [Math.random(), 0] } );
}
Then you can get a random document from the collection like this:
db.docs.findOne( { random_point : { $near : [Math.random(), 0] } } )
Or you can retrieve several document nearest to a random point:
db.docs.find( { random_point : { $near : [Math.random(), 0] } } ).limit( 4 )
This requires only one query and no null checks, plus the code is clean, simple and flexible. You could even use the Y-axis of the geopoint to add a second randomness dimension to your query.
In order to get a determinated number of random docs without duplicates:
loop geting random index and skip duplicated
number_of_docs=7
db.collection('preguntas').find({},{_id:1}).toArray(function(err, arr) {
count=arr.length
idsram=[]
rans=[]
while(number_of_docs!=0){
var R = Math.floor(Math.random() * count);
if (rans.indexOf(R) > -1) {
continue
} else {
ans.push(R)
idsram.push(arr[R]._id)
number_of_docs--
}
}
db.collection('preguntas').find({}).toArray(function(err1, doc1) {
if (err1) { console.log(err1); return; }
res.send(doc1)
});
});