Random record from MongoDB

栀梦 2020-11-22 01:22

I am looking to get a random record from a huge (100 million record) mongodb.

What is the fastest and most efficient way to do so? The data is already t

27 answers
  • 2020-11-22 01:33

    The following recipe is a little slower than the mongo cookbook solution (add a random key to every document), but it returns more evenly distributed random documents. It is slightly less evenly distributed than the skip(random) solution, but much faster and more robust when documents are removed.

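    function draw(collection, query) {
        // query: MongoDB query object (optional)
        query = query || {};
        // Pick the document with the largest "random" key at or below a fresh threshold.
        query.random = { $lte: Math.random() };
        var cur = collection.find(query).sort({ random: -1 }).limit(1);
        if (!cur.hasNext()) {
            // Nothing at or below the threshold: fall back to the largest key overall.
            delete query.random;
            cur = collection.find(query).sort({ random: -1 }).limit(1);
        }
        if (!cur.hasNext()) return null; // no document matches the query at all
        var doc = cur.next();
        // Re-randomize the key so the same document is not favored on the next draw.
        doc.random = Math.random();
        collection.update({ _id: doc._id }, { $set: { random: doc.random } });
        return doc;
    }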
    

    It also requires a "random" field holding a random number on every document, so remember to set it when you create documents. You may need to initialize an existing collection, as shown by Geoffrey:

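    function addRandom(collection) {
        // Give every existing document its own random key.
        collection.find().forEach(function (obj) {
            collection.update({ _id: obj._id }, { $set: { random: Math.random() } });
        });
    }
    // Call it directly; db.eval is deprecated and was removed in MongoDB 4.2.
    addRandom(db.things);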
    

    Benchmark results

    This method is much faster than the skip() method (of ceejayoz) and generates more uniformly random documents than the "cookbook" method reported by Michael:

    For a collection with 1,000,000 elements:

    • This method takes less than a millisecond on my machine

    • The skip() method takes 180 ms on average

    The cookbook method will cause large numbers of documents to never get picked because their random number does not favor them.

    • This method will pick all elements evenly over time.

    • In my benchmark it was only 30% slower than the cookbook method.

    • The randomness is not 100% perfect, but it is very good (and it can be improved if necessary)
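
    To make the evenness claim concrete, here is a small in-memory simulation. It is only a sketch: the array of keys stands in for the collection, the fixed-key run stands in for a scheme that never changes the random field, and makeRng, simulate and spread are hypothetical helper names, not part of MongoDB.

```javascript
// Small seeded generator (mulberry32-style) so that runs are repeatable.
function makeRng(seed) {
    let a = seed >>> 0;
    return function () {
        a = (a + 0x6D2B79F5) >>> 0;
        let t = a;
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// Simulates `draws` picks over `docCount` documents and returns how often
// each document was picked. Each pick takes the largest key <= a random
// threshold, falling back to the largest key overall, like draw() does.
function simulate(docCount, draws, rerandomize, seed) {
    const rng = makeRng(seed);
    const keys = Array.from({ length: docCount }, () => rng());
    const picks = new Array(docCount).fill(0);
    for (let n = 0; n < draws; n++) {
        const r = rng();
        let best = -1;   // index of the largest key <= r, if any
        let largest = 0; // index of the largest key overall (fallback)
        for (let i = 0; i < docCount; i++) {
            if (keys[i] > keys[largest]) largest = i;
            if (keys[i] <= r && (best === -1 || keys[i] > keys[best])) best = i;
        }
        const chosen = best === -1 ? largest : best;
        picks[chosen]++;
        // The recipe's extra step: give the picked document a new key.
        if (rerandomize) keys[chosen] = rng();
    }
    return picks;
}

// Difference between the most and least frequently picked document.
function spread(picks) {
    return Math.max(...picks) - Math.min(...picks);
}

console.log("fixed keys:", spread(simulate(50, 20000, false, 1)));
console.log("re-randomized keys:", spread(simulate(50, 20000, true, 1)));
```

    With fixed keys, a document is picked in proportion to the gap above its key, so some documents are heavily favored. Re-randomizing the key after each pick keeps reshuffling those gaps.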

    This recipe is not perfect - the perfect solution would be a built-in feature as others have noted.
    However it should be a good compromise for many purposes.
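
    For reference, MongoDB 3.2 and later do have such a built-in feature: the $sample aggregation stage. The sketch below only builds the pipeline; randomSamplePipeline is a hypothetical helper name, and the optional filter argument is an assumption about how you might combine it with a query.

```javascript
// Builds an aggregation pipeline that asks the server for `size` random
// documents, optionally restricted to those matching `filter`.
function randomSamplePipeline(size, filter) {
    const pipeline = [];
    if (filter && Object.keys(filter).length > 0) {
        pipeline.push({ $match: filter });
    }
    pipeline.push({ $sample: { size: size } });
    return pipeline;
}

// Usage in the mongo shell, assuming a collection named `things`:
//   db.things.aggregate(randomSamplePipeline(1));
console.log(JSON.stringify(randomSamplePipeline(1)));
```

    This needs no extra "random" field, at the cost of requiring a server version that supports $sample.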

  • 2020-11-22 01:33

    If you have a simple id key, you could store all the ids in an array and then pick a random one (Ruby answer):

    ids = @coll.find({},fields:{_id:1}).to_a
    @coll.find(ids.sample).first
    
  • 2020-11-22 01:33

    If you're using Mongoid, the document-to-object mapper, you can do the following in Ruby (assuming your model is User):

    User.all.to_a[rand(User.count)]
    

    In my .irbrc, I have

    def rando klass
        klass.all.to_a[rand(klass.count)]
    end
    

    so in rails console, I can do, for example,

    rando User
    rando Article
    

    to get documents randomly from any collection.

  • 2020-11-22 01:34

    If you are using mongoose, you can use the mongoose-random plugin.

  • 2020-11-22 01:36

    You can also use MongoDB's geospatial indexing feature to select the documents 'nearest' to a random number.

    First, enable geospatial indexing on a collection:

    db.docs.ensureIndex( { random_point: '2d' } )
    

    To create a bunch of documents with random points on the X-axis:

    for ( i = 0; i < 10; ++i ) {
        db.docs.insert( { key: i, random_point: [Math.random(), 0] } );
    }
    

    Then you can get a random document from the collection like this:

    db.docs.findOne( { random_point : { $near : [Math.random(), 0] } } )
    

    Or you can retrieve several documents nearest to a random point:

    db.docs.find( { random_point : { $near : [Math.random(), 0] } } ).limit( 4 )
    

    This requires only one query and no null checks, plus the code is clean, simple and flexible. You could even use the Y-axis of the geopoint to add a second randomness dimension to your query.
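
    To illustrate what "nearest to a random number" means here, the following is an in-memory stand-in for the $near query, not the query itself. nearestOnAxis is a hypothetical helper, and the sample documents are made up for the illustration.

```javascript
// Returns the `limit` documents whose x-coordinate lies closest to `x`,
// closest first. This mimics $near on points of the form [x, 0].
function nearestOnAxis(docs, x, limit) {
    return docs
        .slice()
        .sort((a, b) =>
            Math.abs(a.random_point[0] - x) - Math.abs(b.random_point[0] - x))
        .slice(0, limit);
}

const sampleDocs = [
    { key: 0, random_point: [0.10, 0] },
    { key: 1, random_point: [0.50, 0] },
    { key: 2, random_point: [0.90, 0] },
];

// Pick the single document nearest to a random point on the X-axis.
console.log(nearestOnAxis(sampleDocs, Math.random(), 1)[0].key);
```

    Note that each document is chosen whenever the random point falls closer to it than to its neighbors, so documents with widely spaced points are picked somewhat more often than tightly clustered ones.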

  • 2020-11-22 01:38

    To get a fixed number of random docs without duplicates:

    1. First, get all the ids
    2. Get the number of documents
    3. Loop, drawing a random index each time and skipping duplicates, then fetch the chosen documents

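      var numberOfDocs = 7;
      db.collection('preguntas').find({}, { _id: 1 }).toArray(function (err, arr) {
          if (err) { console.log(err); return; }
          var count = arr.length;
          // Never ask for more documents than exist, or the loop would not end.
          var wanted = Math.min(numberOfDocs, count);
          var ids = [];   // _id values of the chosen documents
          var seen = [];  // indexes already drawn
          while (ids.length < wanted) {
              var r = Math.floor(Math.random() * count);
              if (seen.indexOf(r) > -1) continue; // skip duplicates
              seen.push(r);
              ids.push(arr[r]._id);
          }
          // Fetch only the chosen documents; `res` is the surrounding HTTP response.
          db.collection('preguntas').find({ _id: { $in: ids } }).toArray(function (err1, docs) {
              if (err1) { console.log(err1); return; }
              res.send(docs);
          });
      });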
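
    The duplicate-skipping loop above retries more and more often as the requested number approaches the collection size. As an alternative sketch (a swapped-in technique, not the method used above), a partial Fisher-Yates shuffle picks k distinct entries in exactly k steps. sampleWithoutReplacement is a hypothetical helper name.

```javascript
// Returns up to k distinct elements of `items`, chosen uniformly at random,
// using a partial Fisher-Yates shuffle on a copy of the array.
function sampleWithoutReplacement(items, k, random) {
    random = random || Math.random;
    const pool = items.slice();
    const n = Math.min(k, pool.length);
    for (let i = 0; i < n; i++) {
        // Swap a random element from the not-yet-chosen tail into position i.
        const j = i + Math.floor(random() * (pool.length - i));
        const tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
    }
    return pool.slice(0, n);
}

// Example: choose 7 of 10 ids, none repeated.
console.log(sampleWithoutReplacement([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 7));
```

    The returned ids could then be passed to a single find with { _id: { $in: ... } }.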
      