In MongoDB mapreduce, how can I flatten the values object?

挽巷 2020-12-04 22:22

I'm trying to use MongoDB to analyse Apache log files. I've created a receipts collection from the Apache access logs. Here's an abridged summary of what my

7 Answers
  • 2020-12-04 22:27

    Taking the best from previous answers and comments:

    db.items.find().hint({_id: 1}).forEach(function(item) {
        db.items.update({_id: item._id}, item.value);
    });
    

    From http://docs.mongodb.org/manual/core/update/#replace-existing-document-with-new-document
    "If the update argument contains only field and value pairs, the update() method replaces the existing document with the document in the update argument, except for the _id field."

    So you need neither to $unset value, nor to list each field.

    From https://docs.mongodb.com/manual/core/read-isolation-consistency-recency/#cursor-snapshot "MongoDB cursors can return the same document more than once in some situations. ... use a unique index on this field or these fields so that the query will return each document no more than once. Query with hint() to explicitly force the query to use that index."
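
    To make the reshaping concrete, here is a minimal sketch in plain JavaScript of the replacement document for a single map-reduce record of the shape `{_id: ..., value: {...}}`. The helper name `flattenResult` and the sample record are hypothetical and only for illustration; the sketch touches no database.

```javascript
// Build the flattened replacement for one map-reduce output record.
// `flattenResult` is a hypothetical helper name, not a MongoDB API.
function flattenResult(result) {
    var flat = {};
    // Copy the fields of `value` so the original record is not mutated.
    for (var key in result.value) {
        if (Object.prototype.hasOwnProperty.call(result.value, key)) {
            flat[key] = result.value[key];
        }
    }
    // Keep the stored _id; a replacement cannot change it.
    flat._id = result._id;
    return flat;
}

// Hypothetical sample record, shaped like map-reduce output.
var sample = { _id: "/index.html", value: { count: 3, paths: ["/a", "/b"] } };
var flattened = flattenResult(sample);
```

    In the shell, the loop above does the equivalent by passing `item.value` straight to `update()`, which keeps the existing `_id` on its own.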

  • 2020-12-04 22:27

    A similar approach to that of @ljonas but no need to hardcode document fields:

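    db.results.find().forEach( function(result) {
        var value = result.value;
        delete value._id;
        // Replace the whole document with the contents of `value`
        db.results.update({_id: result._id}, value);
        // Remove any leftover `value` field (filter fixed: _id, not id)
        db.results.update({_id: result._id}, {$unset: {value: 1}});
    } );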
    
  • 2020-12-04 22:30

    All the proposed solutions are far from optimal. The fastest you can do so far is something like:

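    var flattenMRCollection=function(dbName,collectionName) {
        var collection=db.getSiblingDB(dbName)[collectionName];
    
        var i=0;
        var pending=0; // operations queued in the current bulk
        var bulk=collection.initializeUnorderedBulkOp();
        // addOption(16) sets the noTimeout cursor flag (DBQuery.Option.noTimeout)
        collection.find({ value: { $exists: true } }).addOption(16).forEach(function(result) {
            print((++i));
    
            // Queue a full-document replacement instead of a sequential update
            bulk.find({_id: result._id}).replaceOne(result.value);
            pending++;
    
            if(pending==1000)
            {
                print("Executing bulk...");
                bulk.execute();
                bulk=collection.initializeUnorderedBulkOp();
                pending=0;
            }
        });
        // Executing an empty bulk throws, so flush only if something is queued
        if(pending>0)
        {
            bulk.execute();
        }
    };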
    

    Then call it: flattenMRCollection("MyDB","MyMRCollection")

    This is WAY faster than doing sequential updates.
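
    The batching logic can be checked without a database. Below is a rough sketch in plain JavaScript: `processInBatches` and the `executeBatch` callback are hypothetical names, with the callback standing in for `bulk.execute()`. It shows the one detail that is easy to get wrong, which is flushing the final partial batch while skipping an empty one.

```javascript
// Group replacement operations into batches of `batchSize` and hand each
// batch to `executeBatch` (a stand-in for bulk.execute(); hypothetical).
function processInBatches(results, batchSize, executeBatch) {
    var pending = [];
    var executed = 0;
    results.forEach(function (result) {
        pending.push({ filter: { _id: result._id }, replacement: result.value });
        if (pending.length === batchSize) {
            executeBatch(pending);
            executed++;
            pending = [];
        }
    });
    // Flush the remainder only if there is one; an empty batch is skipped.
    if (pending.length > 0) {
        executeBatch(pending);
        executed++;
    }
    return executed;
}

// Hypothetical data: five records processed in batches of two.
var docs = [1, 2, 3, 4, 5].map(function (n) {
    return { _id: n, value: { count: n } };
});
var sizes = [];
var batches = processInBatches(docs, 2, function (batch) {
    sizes.push(batch.length);
});
```

    Here `sizes` records how many operations each simulated flush received, so the split into full batches plus one remainder is easy to inspect.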

  • AFAIK, by design Mongo's map reduce will spit results out in "value tuples" and I haven't seen anything that will configure that "output format". Maybe the finalize() method can be used.

    You could try running a post-process that will reshape the data using

    results.find({}).forEach( function(result) {
      results.update({_id: result._id}, {count: result.value.count, paths: result.value.paths})
    });
    

    Yep, that looks ugly. I know.

  • 2020-12-04 22:39

    It's not currently possible, but I would suggest voting for this case: https://jira.mongodb.org/browse/SERVER-2517.

  • 2020-12-04 22:43

    You can do Dan's code with a collection reference:

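    function clean(collection) {
        collection.find().forEach( function(result) {
            var value = result.value;
            delete value._id;
            // Replace the whole document with the contents of `value`
            collection.update({_id: result._id}, value);
            // Remove any leftover `value` field (filter fixed: _id, not id)
            collection.update({_id: result._id}, {$unset: {value: 1}});
        } );
    }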
    