Random record from MongoDB

栀梦 2020-11-22 01:22

I am looking to get a random record from a huge MongoDB collection (100 million records).

What is the fastest and most efficient way to do so? The data is already t

27 answers
  •  无人共我
    2020-11-22 01:33

    The following recipe is a little slower than the Mongo cookbook solution (add a random key to every document), but it returns more evenly distributed random documents. It is slightly less evenly distributed than the skip(random) solution, but much faster and more fail-safe when documents are removed.

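    function draw(collection, query) {
        // query: MongoDB query object (optional)
        query = query || { };
        query.random = { $lte: Math.random() };
        // Take the document with the highest "random" value at or below the drawn number.
        var cur = collection.find(query).sort({ random: -1 }).limit(1);
        if (!cur.hasNext()) {
            // Nothing at or below the drawn number: fall back to the highest value overall.
            delete query.random;
            cur = collection.find(query).sort({ random: -1 }).limit(1);
        }
        if (!cur.hasNext()) {
            return null; // no document matches the query at all
        }
        var doc = cur.next();
        // Give the picked document a new random value so it is not favored on later draws.
        doc.random = Math.random();
        collection.update({ _id: doc._id }, { $set: { random: doc.random } });
        return doc;
    }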
    

    It also requires a "random" field holding a random number on every document, so don't forget to set it when you create them. You may need to initialize an existing collection, as shown by Geoffrey:

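    function addRandom(collection) {
        // Give every existing document a "random" value in [0, 1).
        collection.find().forEach(function (obj) {
            collection.update({ _id: obj._id }, { $set: { random: Math.random() } });
        });
    }
    // db.eval() was removed in MongoDB 4.2; on newer servers, call addRandom(db.things) from the shell instead.
    db.eval(addRandom, db.things);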
    

    Benchmark results

    This method is much faster than the skip() method (of ceejayoz) and generates more uniformly random documents than the "cookbook" method reported by Michael:

    For a collection with 1,000,000 elements:

    • This method takes less than a millisecond on my machine.

    • The skip() method takes 180 ms on average.

    The cookbook method will cause large numbers of documents to never get picked, because their random number, which never changes, does not favor them.

    • This method will pick all elements evenly over time.

    • In my benchmark it was only 30% slower than the cookbook method.

    • The randomness is not 100% perfect, but it is very good (and it can be improved if necessary).
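
    The difference in evenness described above can be checked without a database. The following is a minimal in-memory sketch, not the benchmark used in this answer: each "document" is reduced to its random key, `pickCookbook` and `pickDraw` are hypothetical models of the two methods, and the cookbook fallback (wrapping around to the smallest key) is an assumption. A seeded generator replaces Math.random() only to make runs repeatable.

```javascript
// Deterministic PRNG (mulberry32) so that repeated runs give the same numbers.
function mulberry32(a) {
    return function () {
        a = (a + 0x6D2B79F5) | 0;
        var t = Math.imul(a ^ (a >>> 15), 1 | a);
        t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// Index of the key preferred by `better` among those passing `accept`, or -1.
function select(keys, accept, better) {
    var best = -1;
    for (var i = 0; i < keys.length; i++) {
        if (accept(keys[i]) && (best < 0 || better(keys[i], keys[best]))) best = i;
    }
    return best;
}
function lower(a, b) { return a < b; }
function higher(a, b) { return a > b; }
function any() { return true; }

// Cookbook model: smallest key >= r; keys are never changed.
// Assumption: when r exceeds every key, wrap around to the smallest key.
function pickCookbook(keys, r) {
    var i = select(keys, function (k) { return k >= r; }, lower);
    return i >= 0 ? i : select(keys, any, lower);
}

// Model of draw(): largest key <= r, else the largest key overall;
// the picked key is then replaced by a fresh random number.
function pickDraw(keys, r, rng) {
    var i = select(keys, function (k) { return k <= r; }, higher);
    if (i < 0) i = select(keys, any, higher);
    keys[i] = rng();
    return i;
}

// Run `draws` picks over `n` keys and count how often each index is chosen.
function simulate(pick, n, draws, seed) {
    var rng = mulberry32(seed);
    var keys = [], counts = [];
    for (var i = 0; i < n; i++) { keys.push(rng()); counts.push(0); }
    for (var d = 0; d < draws; d++) {
        counts[pick(keys, rng(), rng)]++;
    }
    return counts;
}

// Gap between the most-picked and least-picked document.
function spread(counts) {
    return Math.max.apply(null, counts) - Math.min.apply(null, counts);
}

var cookbookCounts = simulate(pickCookbook, 50, 20000, 1);
var drawCounts = simulate(pickDraw, 50, 20000, 1);
console.log('cookbook spread:', spread(cookbookCounts));
console.log('draw spread:', spread(drawCounts));
```

    With fixed keys, a document's chance of being picked is tied to the gap next to its key, so the counts vary widely; re-randomizing the key after each pick lets those gaps shift, which is why the second spread should come out much smaller.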

    This recipe is not perfect; the perfect solution would be a built-in feature, as others have noted.
    However, it should be a good compromise for many purposes.
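
    On the built-in feature mentioned above: MongoDB 3.2 and later provide a `$sample` aggregation stage that returns randomly selected documents. The sketch below only builds the pipeline; `samplePipeline` is a hypothetical helper name, and the collection name `things` in the comment is taken from the initialization example rather than from the question.

```javascript
// Build an aggregation pipeline returning `size` random documents
// that match an optional query object.
function samplePipeline(query, size) {
    var pipeline = [];
    if (query && Object.keys(query).length > 0) {
        pipeline.push({ $match: query });   // filter first, then sample
    }
    pipeline.push({ $sample: { size: size || 1 } });
    return pipeline;
}

// In the mongo shell, assuming a collection named `things`:
//   db.things.aggregate(samplePipeline({}, 1)).toArray()[0];
console.log(JSON.stringify(samplePipeline({ type: 'a' }, 1)));
```

    This needs no extra "random" field and no write on each draw, but it was not part of the benchmark above, so its speed on a 100 million record collection would have to be measured separately.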
