In MongoDB mapreduce, how can I flatten the values object?

前端未结

关注

 7  2148

I\'m trying to use MongoDB to analyse Apache log files. I\'ve created a receipts collection from the Apache access logs. Here\'s an abridged summary of what my

相关标签:

7条回答

生来不讨喜

2020-12-04 22:48

While experimenting with Vincent's answer, I found a couple of problems. Basically, if you perform updates within a foreach loop, this will move the document to the end of the collection and the cursor will reach that document again (example). This can be circumvented if $snapshot is used. Hence, I am providing a Java example below.

final List<WriteModel<Document>> bulkUpdate = new ArrayList<>();

// You should enable $snapshot if performing updates within foreach
collection.find(new Document().append("$query", new Document()).append("$snapshot", true)).forEach(new Block<Document>() {
    @Override
    public void apply(final Document document) {
        // Note that I used incrementing long values for '_id'. Change to String if
        // you used string '_id's
        long docId = document.getLong("_id");
        Document subDoc = (Document)document.get("value");
        WriteModel<Document> m = new ReplaceOneModel<>(new Document().append("_id", docId), subDoc);
        bulkUpdate.add(m);

        // If you used non-incrementing '_id's, then you need to use a final object with a counter.
        if(docId % 1000 == 0 && !bulkUpdate.isEmpty()) {
            collection.bulkWrite(bulkUpdate);
            bulkUpdate.removeAll(bulkUpdate);
        }
    }
});
// Fixing bug related to Vincent's answer.
if(!bulkUpdate.isEmpty()) {
    collection.bulkWrite(bulkUpdate);
    bulkUpdate.removeAll(bulkUpdate);
}

Note : This snippet takes an average of 7.4 seconds to execute on my machine with 100k records and 14 attributes (IMDB dataset). Without batching, it takes an average of 25.2 seconds.

0 讨论(0)

上一页 1 2