How can I improve MongoDB bulk performance?


Question


I have an object with some metadata and a big array of items. I used to store this in Mongo and query it by $unwinding the array. However, in extreme cases the array becomes so big that I run into the 16 MB BSON document size limit.

So I need to store each element of the array as a separate document. For that I need to add the metadata to all of them, so I can find them again. It is suggested that I use bulk operations for this.

However, performance seems to be really slow. Inserting one big document was near-instant, but this takes up to ten seconds.

var bulk        = col.initializeOrderedBulkOp();
var metaData    = {
    hash            : hash,
    date            : timestamp,
    name            : name
};

// measure time here

for (var i = 0, l = array.length; i < l; i++) { // 6000 items
    var item = array[i];

    bulk.insert({ // Apparently, this 6000 times takes 2.9 seconds
        data        : item,
        metaData    : metaData
    });

}

bulk.execute(bulkOpts, function(err, result) { // and this takes 6.5 seconds
    // measure time here
});

For a bulk insert of 6000 documents totalling 38 MB of data (which translates to 49 MB as BSON in MongoDB), this performance seems unacceptably bad. The overhead of appending metadata to every document can't be that bad, right? The overhead of updating two indexes can't be that bad, right?

Am I missing something? Is there a better way of inserting groups of documents that need to be fetched as a group?
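
For context, "fetched as a group" just means querying on the shared metadata. A minimal sketch, assuming metaData.hash identifies the group and has an index on it, would look roughly like this:

// Sketch only: assumes metaData.hash identifies one group of items.
// An index on it keeps the group lookup from scanning the whole collection.
col.createIndex({ "metaData.hash": 1 }, function(err) {
    if (err) return console.error(err);

    // Fetch every item that was split out of the original big document
    col.find({ "metaData.hash": hash }).toArray(function(err, items) {
        if (err) return console.error(err);
        console.log("Fetched group of", items.length, "items");
    });
});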

It's not just my laptop; it's the same on the server, which makes me think this is not a configuration error but rather a programming error.

Using MongoDB 2.6.11 with the Node.js driver node-mongodb-native 2.0.49.

-update-

Just the act of adding the metadata to every element in the bulk accounts for 2.9 seconds. There needs to be a better way of doing this.


Answer 1:


Send the bulk insert operations in batches. This results in less traffic to the server and more efficient wire transactions, since you are not sending everything in individual statements but rather breaking the work into manageable chunks for the server to commit. There is also less time spent waiting for the response in the callback with this approach.

A much better approach is to use the async module, so that even looping over the input list is a non-blocking operation. The batch size can vary, but batching the insert operations per 1000 entries keeps you safely under the 16 MB BSON hard limit, as the whole "request" is equal to one BSON document.

The following demonstrates using the async module's whilst to iterate through the array, repeatedly calling the iterator function while the test returns true, and calling the final callback when iteration stops or when an error occurs.

var bulk = col.initializeOrderedBulkOp(),
    counter = 0,
    len = array.length,
    buildModel = function(index) {
        return {
            "data": array[index],
            "metaData": {
                "hash": hash,
                "date": timestamp,
                "name": name
            }
        };
    };

async.whilst(
    // Test: keep iterating while there are items left
    function() { return counter < len; },

    // Iterator: queue one document per call, flushing every 1000 inserts
    function(callback) {
        bulk.insert(buildModel(counter));
        counter++;

        if (counter % 1000 === 0) {
            // Send this batch, then start a fresh bulk operation for the next one
            bulk.execute(function(err, result) {
                bulk = col.initializeOrderedBulkOp();
                callback(err);
            });
        } else {
            callback();
        }
    },

    // Final callback: flush whatever is left over from the last partial batch
    function(err) {
        if (err) return console.error(err);

        if (counter % 1000 !== 0) {
            bulk.execute(function(err, result) {
                console.log("All done now!");
            });
        } else {
            console.log("All done now!");
        }
    }
);
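
Note the design choice of re-initializing the bulk builder after every execute() call: a bulk operation object cannot be re-executed once it has been sent, so each batch needs a fresh initializeOrderedBulkOp(). The final callback then flushes whatever remainder is smaller than a full batch of 1000.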


Source: https://stackoverflow.com/questions/33843863/how-can-i-improve-mongodb-bulk-performance
