In MongoDB mapreduce, how can I flatten the values object?


I'm trying to use MongoDB to analyse Apache log files. I've created a receipts collection from the Apache access logs. Here's an abridged summary of what my …

7 Answers
  • 2020-12-04 22:48

    While experimenting with Vincent's answer, I found a couple of problems. If you update documents inside a forEach loop over the same collection, an update can move a document to the end of the collection, so the cursor will reach that document again and reprocess it. This can be circumvented with the $snapshot query modifier. Hence, I am providing a Java example below.

    import java.util.ArrayList;
    import java.util.List;
    
    import org.bson.Document;
    
    import com.mongodb.Block;
    import com.mongodb.client.model.ReplaceOneModel;
    import com.mongodb.client.model.WriteModel;
    
    final List<WriteModel<Document>> bulkUpdate = new ArrayList<>();
    
    // Enable $snapshot when performing updates within a forEach over the same
    // collection, so updated documents are not returned by the cursor again.
    collection.find(new Document().append("$query", new Document()).append("$snapshot", true))
              .forEach(new Block<Document>() {
        @Override
        public void apply(final Document document) {
            // The map-reduce output has the shape { _id: ..., value: {...} };
            // replace each document with its flattened 'value' sub-document.
            final Object docId = document.get("_id");
            final Document subDoc = (Document) document.get("value");
            bulkUpdate.add(new ReplaceOneModel<>(new Document("_id", docId), subDoc));
    
            // Flush in batches of 1000. Keying off the batch size rather than
            // incrementing '_id' values also works for non-numeric '_id's.
            if (bulkUpdate.size() >= 1000) {
                collection.bulkWrite(bulkUpdate);
                bulkUpdate.clear();
            }
        }
    });
    
    // Flush the final partial batch (this was missing in Vincent's answer).
    if (!bulkUpdate.isEmpty()) {
        collection.bulkWrite(bulkUpdate);
        bulkUpdate.clear();
    }
    

    Note: This snippet takes an average of 7.4 seconds to execute on my machine with 100k records and 14 attributes (IMDB dataset). Without batching, it takes an average of 25.2 seconds.
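    For completeness, here is a minimal sketch of how the { _id, value } documents that the loop above flattens might be produced with the Java driver's mapReduce helper. The map/reduce function bodies and the receipts_summary output collection name are placeholders of my own, not from the original question.

    import org.bson.Document;
    
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    
    MongoDatabase db = MongoClients.create("mongodb://localhost:27017").getDatabase("logs");
    MongoCollection<Document> receipts = db.getCollection("receipts");
    
    // Placeholder map/reduce: count hits per client IP.
    String map = "function() { emit(this.ip, { hits: 1 }); }";
    String reduce = "function(key, values) {"
                  + "  var out = { hits: 0 };"
                  + "  values.forEach(function(v) { out.hits += v.hits; });"
                  + "  return out;"
                  + "}";
    
    // Writes documents of the form { _id: <ip>, value: { hits: n } } into
    // 'receipts_summary'; run the flattening loop against that collection.
    receipts.mapReduce(map, reduce)
            .collectionName("receipts_summary")
            .toCollection();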
