MongoDB: how to find 10 random documents in a collection of 100?

滥情空心 2020-12-15 21:00

Is MongoDB capable of finding a number of random documents without making multiple queries?

For example, at the moment I do it on the JS side after loading all the documents in the collection.

4 Answers
  • 2020-12-15 21:30

    Since version 3.2 there is an easier way to get a random sample of documents from a collection:

    $sample New in version 3.2.

    Randomly selects the specified number of documents from its input.

    The $sample stage has the following syntax:

    { $sample: { size: <positive integer> } }

    Source: MongoDB Docs

    In this case:

    db.products.aggregate([{$sample: {size: 10}}]);
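    As a small illustration of the syntax quoted above, here is a sketch that builds the pipeline array you would pass to aggregate(). buildSamplePipeline is a hypothetical helper, not part of MongoDB or any driver; it only checks that size is a positive integer, as the docs require, and optionally puts a $match stage in front.

```javascript
// Hypothetical helper (not a MongoDB or driver API): builds the pipeline
// array that would be passed to db.collection.aggregate(...).
function buildSamplePipeline(size, filter) {
  // The $sample docs require size to be a positive integer.
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("$sample size must be a positive integer, got " + size);
  }
  const pipeline = [];
  // Optional filter stage; an empty or missing filter adds nothing.
  if (filter && Object.keys(filter).length > 0) {
    pipeline.push({ $match: filter });
  }
  pipeline.push({ $sample: { size: size } });
  return pipeline;
}

console.log(JSON.stringify(buildSamplePipeline(10)));
// [{"$sample":{"size":10}}]
```

    The result of buildSamplePipeline(10) is exactly the array used in db.products.aggregate([{$sample: {size: 10}}]).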
    
  • 2020-12-15 21:33

    Here is what I came up in the end:

    var async = require('async');
    var shuffle = require('knuth-shuffle').knuthShuffle;
    
    var numberOfItems = 10;
    
    // GET LIST OF ALL IDs
    SchemaNameHere.find({}, { '_id': 1 }, function(err, data) {
    
        // stop on error; otherwise the code below would still run
        if (err) return res.send(err);
    
        // shuffle a copy of the array, as per https://github.com/coolaj86/knuth-shuffle
        var arr = shuffle(data.slice(0));
    
        // keep only the first numberOfItems of the shuffled array
        arr.splice(numberOfItems, arr.length - numberOfItems);
    
        // new array to store all items
        var return_arr = [];
    
        // use async.each, as per http://justinklemm.com/node-js-async-tutorial/
        async.each(arr, function(item, callback) {
    
            // get items one by one and add them to return_arr
            SchemaNameHere.findById(item._id, function(err, doc) {
    
                // hand errors to the final callback instead of responding once per item
                if (err) return callback(err);
                return_arr.push(doc);
    
                // signal that this item is done
                callback();
    
            });
    
        }, function(err) {
    
            // runs once every item in arr is done, or on the first error
            if (err) return res.send(err);
            res.json(return_arr);
    
        });
    
    });
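    The code above relies on shuffle() from the knuth-shuffle package. If you would rather not add that dependency, a minimal Fisher-Yates shuffle like the sketch below should work as a drop-in replacement, assuming (as the answer's usage suggests) that shuffle() shuffles the array in place and returns it. The ids array and the final slice are illustrative only.

```javascript
// Minimal in-place Fisher-Yates (Knuth) shuffle. Assumed to behave like the
// knuth-shuffle package's function: shuffles in place and returns the array.
function shuffle(array) {
  // Walk backwards, swapping each element with a random element at or before it.
  for (let i = array.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // 0 <= j <= i
    const tmp = array[i];
    array[i] = array[j];
    array[j] = tmp;
  }
  return array;
}

// Usage mirroring the answer: shuffle a copy, keep the first 10 entries.
const ids = Array.from({ length: 100 }, (_, k) => k);
const picked = shuffle(ids.slice(0)).slice(0, 10);
console.log(picked.length); // 10
```

    Because slice(0) copies the array first, the original list of ids is left untouched.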
    
  • 2020-12-15 21:38

    This was answered a long time ago and, since then, MongoDB has evolved greatly.

    As posted in another answer, MongoDB has supported sampling within the Aggregation Framework since version 3.2:

    The way you could do this is:

    db.products.aggregate([{$sample: {size: 5}}]); // You want to get 5 docs
    

    Or:

    db.products.aggregate([
      {$match: {category:"Electronic Devices"}}, // filter the results
      {$sample: {size: 5}} // You want to get 5 docs
    ]);
    

    However, there are some warnings about the $sample operator:

    (As of Nov 6th, 2017, when the latest version was 3.4.) If all of the following conditions are met, $sample uses a pseudo-random cursor to select documents:

    • $sample is the first stage of the pipeline
    • N is less than 5% of the total documents in the collection
    • The collection contains more than 100 documents

    If any of the above conditions are NOT met, $sample performs a collection scan followed by a random sort to select N documents.

    This is the case in the last example, where $match comes before $sample, so $sample is not the first stage.
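    To make the rule concrete, here is a sketch that encodes the three conditions as listed for 3.4, plus an in-memory analogue of the fallback path (scan everything, sort by a random key, keep N). Both functions are hypothetical illustrations, not the server's implementation, and the thresholds may differ in later MongoDB versions.

```javascript
// Hypothetical check mirroring the three conditions quoted above (3.4 docs).
function usesPseudoRandomCursor(stageIndex, n, totalDocs) {
  const isFirstStage = stageIndex === 0;        // $sample is the first stage
  const underFivePercent = n * 20 < totalDocs;  // N < 5% of docs, in integer arithmetic
  const moreThan100 = totalDocs > 100;          // collection has more than 100 docs
  return isFirstStage && underFivePercent && moreThan100;
}

// In-memory analogue of the fallback: give every document a random sort key,
// sort by that key, and keep the first n. The input array is not modified.
function randomSortSample(docs, n) {
  return docs
    .map((doc) => ({ key: Math.random(), doc: doc }))
    .sort((a, b) => a.key - b.key)
    .slice(0, n)
    .map((entry) => entry.doc);
}

// The question's case: 10 documents out of 100.
console.log(usesPseudoRandomCursor(0, 10, 100)); // false
```

    For the question's case (10 out of 100 documents) both the 5% condition and the "more than 100 documents" condition fail, so $sample would take the collection-scan path even as the first stage; with a collection that small this is unlikely to matter.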

    OLD ANSWER

    You could always run:

    db.products.find({category:"Electronic Devices"}).skip(Math.floor(Math.random()*YOUR_COLLECTION_SIZE))
    

    But the order won't be random, and you will need either two queries (one count to get YOUR_COLLECTION_SIZE) or an estimate of how big the collection is (about 100 records, about 1,000, about 10,000...).
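    A sketch of the skip idea, assuming you already know (or have estimated) the collection size. randomSkipOffset and skipLimit are hypothetical helpers; the in-memory slice stands in for find().skip(offset).limit(n) and shows why the result is a consecutive run of documents rather than a random set. The .limit(n) is an addition here; the query above has none.

```javascript
// Hypothetical helper: a whole-number offset such that a window of n docs
// starting there still fits inside a collection of `total` docs.
function randomSkipOffset(total, n) {
  const maxOffset = Math.max(0, total - n);
  return Math.floor(Math.random() * (maxOffset + 1)); // 0..maxOffset inclusive
}

// In-memory stand-in for find().skip(offset).limit(n).
function skipLimit(docs, offset, n) {
  return docs.slice(offset, offset + n);
}

const docs = Array.from({ length: 100 }, (_, k) => ({ _id: k }));
const offset = randomSkipOffset(docs.length, 10);
const page = skipLimit(docs, offset, 10);
// Always ten consecutive _id values starting at `offset`.
console.log(page.map((d) => d._id));
```

    Only the starting point is random; the documents after it always come back in the same natural order.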

    You could also add a field containing a random number to every document and query by that number. The drawback is that you will get the same results every time you run the same query. To work around that, you can play with limit and skip, or even with sort. You could also update those random numbers every time you fetch a record (which implies more queries).

    I don't know whether you are using Mongoose, Mongoid or the MongoDB driver for a specific language directly, so I'll write everything for the mongo shell.

    So your product record, let's say, would look like this:

    {
     _id: ObjectId("..."),
     name: "Awesome Product",
     category: "Electronic Devices",
    }
    

    and I would suggest using:

    {
     _id: ObjectId("..."),
     name: "Awesome Product",
     category: "Electronic Devices",
     _random_sample: Math.random()
    }
    

    Then you could do:

    db.products.find({category:"Electronic Devices",_random_sample:{$gte:Math.random()}})
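    An in-memory sketch of querying by a stored random field. It assumes each document already has a _random_sample value in [0, 1), and it adds two things the answer only hints at: sorting by _random_sample with a limit, and a wrap-around query for values below r when too few documents lie above it. The function and variable names are hypothetical.

```javascript
// In-memory sketch of the _random_sample technique (not a MongoDB call).
function sampleByRandomField(docs, n) {
  const r = Math.random();
  const byField = (a, b) => a._random_sample - b._random_sample;
  // Like find({_random_sample: {$gte: r}}).sort({_random_sample: 1}).limit(n)
  const high = docs.filter((d) => d._random_sample >= r).sort(byField).slice(0, n);
  // Wrap-around (an assumption added here): fill up from values below r,
  // like find({_random_sample: {$lt: r}}).sort({_random_sample: 1}).limit(n - high.length)
  const low = docs.filter((d) => d._random_sample < r).sort(byField).slice(0, n - high.length);
  return high.concat(low);
}

const products = Array.from({ length: 100 }, (_, k) => ({
  _id: k,
  category: "Electronic Devices",
  _random_sample: Math.random(),
}));
console.log(sampleByRandomField(products, 10).length); // 10
```

    Because the picked documents are neighbours in _random_sample order, the same group tends to come back together until those values are refreshed, which is what the updates below address.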
    

    Then you could periodically run the following to update the documents' _random_sample field:

    var your_query = {} // matches every record, which can hurt performance if there are many
    your_query = {category: "Electronic Devices"} // narrow the update to one category
    // upsert = false, multi = true
    db.products.update(your_query, {$set: {_random_sample: Math.random()}}, false, true)
    

    Or, whenever you retrieve some records, you could update all of them or just a few (depending on how many records you've retrieved):

    for (var i = 0; i < records.length; i++) {
       var query = {_id: records[i]._id};
       // upsert = false, multi = false
       db.products.update(query, {$set: {_random_sample: Math.random()}}, false, false);
    }
    

    EDIT

    Be aware that

    db.products.update(your_query, {$set: {_random_sample: Math.random()}}, false, true)
    

    won't work very well, because it updates every product that matches your query with the same random number (Math.random() is evaluated once in the shell, before the update is sent). The last approach works better (updating some documents as you retrieve them).
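    A small in-memory illustration of the difference: in the shell, Math.random() inside the update document is evaluated once, so a multi update writes one value everywhere, while the per-document loop calls it once per record. multiUpdateSameValue and rerandomizeEach are hypothetical stand-ins for the two update styles, not MongoDB calls.

```javascript
// Stand-in for the multi update: the random value is computed once up front.
function multiUpdateSameValue(docs) {
  const value = Math.random(); // evaluated a single time
  docs.forEach((d) => { d._random_sample = value; });
}

// Stand-in for the per-document loop: a fresh random value for each doc.
function rerandomizeEach(docs) {
  docs.forEach((d) => { d._random_sample = Math.random(); });
}

const retrieved = [{ _id: 1 }, { _id: 2 }, { _id: 3 }];
multiUpdateSameValue(retrieved);
console.log(new Set(retrieved.map((d) => d._random_sample)).size); // 1
rerandomizeEach(retrieved); // now each doc almost surely has its own value
```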

  • 2020-12-15 21:54

    skip didn't work out for me. Here is what I wound up with:

    var randomDoc = db.getCollection("collectionName").aggregate([ {
        $match : {
            // criteria to filter matches
        }
    }, {
        $sample : {
            size : 1
        }
    } ]).toArray()[0]; // aggregate() returns a cursor, so there is no .result field
    

    This gets a single random document matching the criteria.
