Random record from MongoDB

栀梦 2020-11-22 01:22

I am looking to get a random record from a huge (100 million record) MongoDB collection.

What is the fastest and most efficient way to do so? The data is already t

27 answers
  •  北荒 (original poster)
    2020-11-22 01:43

    Here is a way using the default ObjectId values for _id and a little math and logic.

    // Get the "min" and "max" timestamp values from the _id in the collection and the
    // difference between them.
    // The timestamp is the first 4 bytes of an ObjectId, which is 8 hex characters.
    
    var min = parseInt(db.collection.find()
            .sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
        max = parseInt(db.collection.find()
            .sort({ "_id": -1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
        diff = max - min;
    
    // Get a random value from diff and divide/multiply by 1000 for the "_id" precision
    // (Math.random() takes no arguments):
    var random = Math.floor(Math.floor(Math.random()*diff)/1000)*1000;
    
    // Use "random" in the range and pad the hex string to a valid ObjectId
    var _id = new ObjectId(((min + random)/1000).toString(16) + "0000000000000000")
    
    // Then query for the single document:
    var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
       .sort({ "_id": 1 }).limit(1).toArray()[0];
    

    That's the general logic, written for the mongo shell, and it is easily adapted to other drivers.

    So in points:

    • Find the min and max primary key values in the collection

    • Generate a random number that falls between the timestamps of those documents.

    • Add the random number to the minimum value and find the first document that is greater than or equal to that value.
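
The boundary calculation in the points above can be tried without a database. This is only a minimal sketch: `randomBoundaryHex` is a hypothetical helper name, the two `_id` hex strings are made-up values, and the injectable `rand` parameter exists just so the result is repeatable. Unlike the shell code, it works in whole seconds directly and includes the last second in the range.

```javascript
// Hypothetical helper: builds the lower-bound ObjectId hex string for a random
// point between the timestamps embedded in two existing ObjectId hex strings.
function randomBoundaryHex(minIdHex, maxIdHex, rand = Math.random) {
  // The first 8 hex characters (4 bytes) of an ObjectId are seconds since the epoch.
  const minSec = parseInt(minIdHex.substring(0, 8), 16);
  const maxSec = parseInt(maxIdHex.substring(0, 8), 16);

  // Pick a whole-second offset in [0, maxSec - minSec].
  const offset = Math.floor(rand() * (maxSec - minSec + 1));

  // Pad the timestamp to 8 hex characters, then zero-fill the remaining 16.
  return (minSec + offset).toString(16).padStart(8, "0") + "0".repeat(16);
}

const boundary = randomBoundaryHex(
  "5fb9a8f0a1b2c3d4e5f60718", // made-up smallest _id
  "5fb9b700a1b2c3d4e5f60719", // made-up largest _id
  () => 0.5                    // fixed "random" value so the result is repeatable
);
console.log(boundary);
```

The returned string can be passed to `new ObjectId(...)` and used in the `$gte` query. Note that this picks a random point in time rather than a random document, so a document that follows a long gap in inserts is more likely to be returned than one inside a dense burst.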

    This pads the hex timestamp with zeros to form a valid ObjectId value, since that is the type being queried. Using integers as the _id value is simpler, but the basic idea in the points above stays the same.
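
For the integer `_id` case, the same idea might look like the sketch below. It assumes nothing about a real deployment: `docs` is a made-up in-memory array standing in for a collection sorted by `_id`, and `pickRandomByIntegerId` is a hypothetical name.

```javascript
// Made-up in-memory stand-in for a collection with integer _id values,
// kept sorted by _id the way an index scan would return them.
const docs = [
  { _id: 3, name: "a" },
  { _id: 7, name: "b" },
  { _id: 8, name: "c" },
  { _id: 15, name: "d" },
];

function pickRandomByIntegerId(sortedDocs, rand = Math.random) {
  const min = sortedDocs[0]._id;
  const max = sortedDocs[sortedDocs.length - 1]._id;

  // Random integer in [min, max]; no hex padding is needed.
  const target = min + Math.floor(rand() * (max - min + 1));

  // Stand-in for find({ _id: { $gte: target } }).sort({ _id: 1 }).limit(1).
  return sortedDocs.find((doc) => doc._id >= target);
}

const picked = pickRandomByIntegerId(docs, () => 0.5);
console.log(picked);
```

Against a real collection, the array lookup would be replaced by the `$gte` query with an ascending sort and a limit of 1, as in the shell code.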
