Random record from MongoDB

栀梦 2020-11-22 01:22

I am looking to get a random record from a huge (100 million record) MongoDB collection.

What is the fastest and most efficient way to do so? The data is already t

27 answers
  • 2020-11-22 01:50

    Update for MongoDB 3.2

    3.2 introduced $sample to the aggregation pipeline.

    There's also a good blog post on putting it into practice.
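
    As a minimal sketch of how that stage might be used (the helper name buildSamplePipeline and its filter argument are illustrative assumptions, not part of MongoDB), the pipeline is just an optional $match followed by $sample:

```javascript
// Hypothetical helper: builds an aggregation pipeline that returns `size`
// randomly selected documents, optionally restricted by `filter`.
function buildSamplePipeline(size, filter) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer')
  }
  var pipeline = []
  if (filter && Object.keys(filter).length > 0) {
    pipeline.push({ $match: filter })
  }
  pipeline.push({ $sample: { size: size } })
  return pipeline
}

// Assumed usage with a driver collection (requires MongoDB 3.2 or later):
//   collection.aggregate(buildSamplePipeline(1)).toArray(function (err, docs) { ... })
// In the mongo shell the same stage can be passed directly:
//   db.mycoll.aggregate([ { $sample: { size: 1 } } ])
```

    Here mycoll is a placeholder collection name.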

    For older versions (previous answer)

    This was actually a feature request: http://jira.mongodb.org/browse/SERVER-533 but it was filed under "Won't fix."

    The cookbook has a very good recipe to select a random document out of a collection: http://cookbook.mongodb.org/patterns/random-attribute/

    To paraphrase the recipe, you assign random numbers to your documents:

    db.docs.save( { key : 1, ..., random : Math.random() } )
    

    Then select a random document:

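    rand = Math.random()
    // First look at or above rand...
    result = db.docs.findOne( { key : 1, random : { $gte : rand } } )
    if ( result == null ) {
      // ...and fall back to the other side if nothing was found there.
      result = db.docs.findOne( { key : 1, random : { $lte : rand } } )
    }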
    

    Querying with both $gte and $lte is necessary to find the document with a random number nearest rand.
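
    The two-step lookup above can be sketched in memory to show why the fallback matters. This is only an illustration: the docs array stands in for the collection, pickByRandomField is a made-up name, and it assumes each query returns the match closest to rand, which findOne does not guarantee without an explicit sort.

```javascript
// In-memory stand-in for the two findOne calls. `docs` plays the role of the
// collection; each document is assumed to carry a numeric `random` field.
function pickByRandomField(docs, rand) {
  var above = null // document with the smallest random >= rand
  var below = null // document with the largest random <= rand
  docs.forEach(function (doc) {
    if (doc.random >= rand && (above === null || doc.random < above.random)) {
      above = doc
    }
    if (doc.random <= rand && (below === null || doc.random > below.random)) {
      below = doc
    }
  })
  // Prefer the $gte side; fall back to the $lte side when rand is larger
  // than every stored value.
  return above !== null ? above : below
}
```

    Without the fallback, any rand larger than the highest stored random value would return nothing.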

    And of course you'll want to index on the random field:

    db.docs.ensureIndex( { key : 1, random :1 } )
    

    If you're already querying against an index, simply drop it, append random: 1 to it, and add it again.

  • 2020-11-22 01:51

    None of the solutions worked well for me, especially when there are many gaps and the set is small. This worked very well for me (in PHP):

    $count = $collection->count($search);
    $skip = mt_rand(0, $count - 1);
    $result = $collection->find($search)->skip($skip)->limit(1)->getNext();
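
    For comparison, a rough equivalent of the same count-and-skip idea in JavaScript, assuming the Node.js driver's promise API (countDocuments and a find cursor with skip, limit and next). The names randomOffset and findRandomBySkip are made up for this sketch. Note that skip still walks past the skipped documents on the server, so the cost grows with the offset.

```javascript
// Hypothetical helper: uniform integer offset in [0, count - 1].
// `rng` is injectable for testing and defaults to Math.random.
function randomOffset(count, rng) {
  rng = rng || Math.random
  return Math.floor(rng() * count)
}

// Sketch against the Node.js driver; `collection` and `filter` are supplied
// by the caller. Resolves to null when no document matches the filter.
async function findRandomBySkip(collection, filter) {
  var count = await collection.countDocuments(filter)
  if (count === 0) {
    return null
  }
  return collection.find(filter).skip(randomOffset(count)).limit(1).next()
}
```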
    
  • 2020-11-22 01:51

    This works nicely: it's fast, works with multiple documents, and doesn't require populating the rand field up front, since the field will eventually populate itself:

    1. Add an index on the .rand field of your collection.
    2. Use find and refresh, something like:
    // Install packages:
    //   npm install mongodb async
    // Add index in mongo:
    //   db.mycollection.ensureIndex({ rand: 1 })
    
    var mongodb = require('mongodb')
    var async = require('async')
    
    // Find n random documents by using "rand" field.
    function findAndRefreshRand (collection, n, fields, done) {
      var result = []
      var rand = Math.random()
    
      // Build a step that appends documents matching the criteria to the result.
      // The limit is computed when the step runs, so each step fetches only what is still missing.
      var appender = function (criteria, options) {
        return function (done) {
          options.limit = n - result.length
          if (options.limit > 0) {
            collection.find(criteria, fields, options).toArray(
              function (err, docs) {
                if (!err && Array.isArray(docs)) {
                  Array.prototype.push.apply(result, docs)
                }
                done(err)
              }
            )
          } else {
            async.nextTick(done)
          }
        }
      }
    
      async.series([
    
        // Fetch docs with uninitialized .rand.
        // NOTE: You can comment out this step if all docs have initialized .rand = Math.random()
        appender({ rand: { $exists: false } }, {}),
    
        // Fetch on one side of random number.
        appender({ rand: { $gte: rand } }, { sort: { rand: 1 } }),
    
        // Continue fetch on the other side.
        appender({ rand: { $lt: rand } }, { sort: { rand: -1 } }),
    
        // Refresh fetched docs, if any.
        function (done) {
          if (result.length > 0) {
            var batch = collection.initializeUnorderedBulkOp({ w: 0 })
            for (var i = 0; i < result.length; ++i) {
              // Use $set so only .rand changes and the rest of the document is kept.
              batch.find({ _id: result[i]._id }).updateOne({ $set: { rand: Math.random() } })
            }
            batch.execute(done)
          } else {
            async.nextTick(done)
          }
        }
    
      ], function (err) {
        done(err, result)
      })
    }
    
    // Example usage
    mongodb.MongoClient.connect('mongodb://localhost:27017/core-development', function (err, db) {
      if (!err) {
        findAndRefreshRand(db.collection('profiles'), 1024, { _id: true, rand: true }, function (err, result) {
          if (!err) {
            console.log(result)
          } else {
            console.error(err)
          }
          db.close()
        })
      } else {
        console.error(err)
      }
    })
    

    P.S. The question "How to find random records in mongodb" is marked as a duplicate of this one. The difference is that this question asks explicitly about a single record, whereas the other asks explicitly about getting multiple random documents.

  • 2020-11-22 01:51

    You can also use shuffle-array after executing your query:

    var shuffle = require('shuffle-array');

    Accounts.find(qry, function (err, results_array) {
      var newIndexArr = shuffle(results_array);
    });
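
    If you would rather not pull in a package, a shuffle can be sketched by hand. The following is a plain Fisher-Yates shuffle under the made-up name shuffled; it is not claimed to be how shuffle-array itself is implemented, and like the snippet above it assumes the whole result set fits in memory.

```javascript
// Fisher-Yates shuffle that works on a copy, leaving the input untouched.
// `rng` is injectable for testing and defaults to Math.random.
function shuffled(items, rng) {
  rng = rng || Math.random
  var out = items.slice()
  for (var i = out.length - 1; i > 0; i--) {
    // Pick a position in [0, i] and swap it into slot i.
    var j = Math.floor(rng() * (i + 1))
    var tmp = out[i]
    out[i] = out[j]
    out[j] = tmp
  }
  return out
}
```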

  • 2020-11-22 01:51

    What works efficiently and reliably is this:

    Add a field called "random" to each document and assign a random value to it, add an index for the random field and proceed as follows:

    Let's assume we have a collection of web links called "links" and we want a random link from it:

    link = db.links.find().sort({random: 1}).limit(1)[0]
    

    To ensure the same link won't pop up a second time, update its random field with a new random number:

    db.links.update({_id: link._id}, {$set: {random: Math.random()}})
    
  • 2020-11-22 01:53

    You can pick a random _id and return the corresponding object:

    db.collection.distinct( "_id" , function( err, ids) {
        if (err)
            return res.send(err)
        // Math.random() is in [0, 1), so multiplying by ids.length reaches every index, including the last.
        var randomId = ids[Math.floor(Math.random() * ids.length)]
        db.collection.findOne( { _id: randomId } , function( err, result) {
            if (err)
                return res.send(err)
            console.log(result)
        })
    })
    

    Here you don't need to spend space on storing random numbers in the collection.
