MongoDB: Combine data from multiple collections into one..how?

余生分开走 2020-11-22 06:44

How can I (in MongoDB) combine data from multiple collections into one collection?

Can I use map-reduce and if so then how?

I would greatly appreciate some

11 Answers
  • 2020-11-22 07:23

    Yes you can: Take this utility function that I have written today:

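    function shangMergeCol() {
      // The first argument names the target collection; the rest name the sources.
      var tcol = db.getCollection(arguments[0]);
      for (var i = 1; i < arguments.length; i++) {
        var scol = db.getCollection(arguments[i]);
        // Copy every document, _id included, so duplicate _id values will fail.
        scol.find().forEach(function (d) {
          tcol.insert(d);
        });
      }
    }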
    

    You can pass any number of collection names to this function. The first one is the target; all the others are sources whose documents are copied into the target.
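
    One thing to watch with this kind of copy: documents keep their _id, so two sources that share an _id value cannot both be inserted into the same target. The following is a minimal in-memory sketch (plain JavaScript, no database involved) of detecting such collisions up front. The helper mergeById and the sample arrays colA and colB are hypothetical names used only for illustration.

```javascript
// In-memory stand-in: each "collection" is an array of documents.
// mergeById is a hypothetical helper, not a MongoDB API.
function mergeById(target, ...sources) {
  // Simplification: _id values are compared by their string form.
  const seen = new Set(target.map((doc) => String(doc._id)));
  const merged = target.slice();
  const collisions = [];
  for (const source of sources) {
    for (const doc of source) {
      const key = String(doc._id);
      if (seen.has(key)) {
        // In MongoDB this insert would be rejected as a duplicate key.
        collisions.push(doc);
      } else {
        seen.add(key);
        merged.push(doc);
      }
    }
  }
  return { merged, collisions };
}

const colA = [{ _id: 1, name: "a1" }, { _id: 2, name: "a2" }];
const colB = [{ _id: 2, name: "b2" }, { _id: 3, name: "b3" }];
const outcome = mergeById([], colA, colB);
console.log(outcome.merged.length, outcome.collisions.length);
```

    If collisions are possible in your data, one option is to drop or remap _id before inserting; which one is appropriate depends on whether anything references the old _id values.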

  • 2020-11-22 07:23

    Code snippet, courtesy of multiple posts on Stack Overflow, including this one.

     db.cust.drop();
     db.zip.drop();
     db.cust.insert({cust_id:1, zip_id: 101});
     db.cust.insert({cust_id:2, zip_id: 101});
     db.cust.insert({cust_id:3, zip_id: 101});
     db.cust.insert({cust_id:4, zip_id: 102});
     db.cust.insert({cust_id:5, zip_id: 102});
    
     db.zip.insert({zip_id:101, zip_cd:'AAA'});
     db.zip.insert({zip_id:102, zip_cd:'BBB'});
     db.zip.insert({zip_id:103, zip_cd:'CCC'});
    
    mapCust = function() {
        var values = {
            cust_id: this.cust_id
        };
        emit(this.zip_id, values);
    };
    
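    mapZip = function() {
        var values = {
            zip_cd: this.zip_cd
        };
        emit(this.zip_id, values);
    };
    
    reduceCustZip = function(k, values) {
        var result = {};
        values.forEach(function(value) {
            var field;
            if ("cust_id" in value) {
                // A customer value emitted by mapCust: collect it in an array.
                if (!("cust_ids" in result)) {
                    result.cust_ids = [];
                }
                result.cust_ids.push(value);
            } else {
                // A zip value or a previously reduced result: copy its fields.
                for (field in value) {
                    if (value.hasOwnProperty(field)) {
                        if (field === "cust_ids") {
                            // Merge partial arrays instead of overwriting them,
                            // since reduce may run more than once per key.
                            result.cust_ids = (result.cust_ids || []).concat(value.cust_ids);
                        } else {
                            result[field] = value[field];
                        }
                    }
                }
            }
        });
        return result;
    };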
    
    
    db.cust_zip.drop();
    db.cust.mapReduce(mapCust, reduceCustZip, {"out": {"reduce": "cust_zip"}});
    db.zip.mapReduce(mapZip, reduceCustZip, {"out": {"reduce": "cust_zip"}});
    db.cust_zip.find();
    
    
    mapCZ = function() {
        var that = this;
        if ("cust_ids" in this.value) {
            this.value.cust_ids.forEach(function(value) {
                emit(value.cust_id, {
                    zip_id: that._id,
                    zip_cd: that.value.zip_cd
                });
            });
        }
    };
    
    reduceCZ = function(k, values) {
        var result = {};
        values.forEach(function(value) {
            var field;
            for (field in value) {
                if (value.hasOwnProperty(field)) {
                    result[field] = value[field];
                }
            }
        });
        return result;
    };
    db.cust_zip_joined.drop();
    db.cust_zip.mapReduce(mapCZ, reduceCZ, {"out": "cust_zip_joined"}); 
    db.cust_zip_joined.find().pretty();
    
    
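    var flattenMRCollection = function(dbName, collectionName) {
        var collection = db.getSiblingDB(dbName)[collectionName];
    
        var i = 0;
        var pending = 0; // operations queued in the current bulk
        var bulk = collection.initializeUnorderedBulkOp();
        // Option 16 is DBQuery.Option.noTimeout: keep the cursor alive during the scan.
        collection.find({ value: { $exists: true } }).addOption(16).forEach(function(result) {
            print((++i));
            // Replace the {_id, value} document with the contents of value.
            bulk.find({_id: result._id}).replaceOne(result.value);
            pending++;
    
            if (pending === 1000) {
                print("Executing bulk...");
                bulk.execute();
                bulk = collection.initializeUnorderedBulkOp();
                pending = 0;
            }
        });
        // Executing an empty bulk raises an error, so flush only queued work.
        if (pending > 0) {
            bulk.execute();
        }
    };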
    
    
    flattenMRCollection("mydb","cust_zip_joined");
    db.cust_zip_joined.find().pretty();
    
  • 2020-11-22 07:25

    Very basic example with $lookup.

    db.getCollection('users').aggregate([
        {
            $lookup: {
                from: "userinfo",
                localField: "userId",
                foreignField: "userId",
                as: "userInfoData"
            }
        },
        {
            $lookup: {
                from: "userrole",
                localField: "userId",
                foreignField: "userId",
                as: "userRoleData"
            }
        },
        { $unwind: { path: "$userInfoData", preserveNullAndEmptyArrays: true }},
        { $unwind: { path: "$userRoleData", preserveNullAndEmptyArrays: true }}
    ])
    

    The pipeline above uses

     { $unwind: { path: "$userInfoData", preserveNullAndEmptyArrays: true }},
     { $unwind: { path: "$userRoleData", preserveNullAndEmptyArrays: true }}
    

    instead of

    { $unwind: "$userInfoData" },
    { $unwind: "$userRoleData" }
    

    because a plain { $unwind: "$userRoleData" } drops every document for which $lookup found no matching record, so users without info or a role would disappear from the result.
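
    To make the difference concrete, here is a small plain-JavaScript model of how $unwind treats the array that $lookup produces. It is an illustration only, not the server implementation; unwindArray and the sample documents are hypothetical.

```javascript
// Plain-JavaScript model of $unwind on an array field such as the output of
// $lookup. unwindArray is a hypothetical helper, not the server code.
function unwindArray(docs, field, preserveNullAndEmptyArrays) {
  const out = [];
  docs.forEach(function (doc) {
    const items = doc[field] || [];
    if (items.length === 0) {
      if (preserveNullAndEmptyArrays) {
        // Keep the document, minus the empty array field.
        const copy = Object.assign({}, doc);
        delete copy[field];
        out.push(copy);
      }
      return; // without the option the document is dropped
    }
    // One output document per array element.
    items.forEach(function (item) {
      out.push(Object.assign({}, doc, { [field]: item }));
    });
  });
  return out;
}

// Shape of documents after $lookup: one user matched, one did not.
const looked = [
  { userId: 1, userInfoData: [{ userId: 1, email: "a@example.com" }] },
  { userId: 2, userInfoData: [] }
];
const strict = unwindArray(looked, "userInfoData", false);
const lenient = unwindArray(looked, "userInfoData", true);
console.log(strict.length, lenient.length);
```

    With the option off, only the matched user survives; with it on, both users are kept and the unmatched one simply has no userInfoData field.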

  • 2020-11-22 07:29

    Although you can't do this real-time, you can run map-reduce multiple times to merge data together by using the "reduce" out option in MongoDB 1.8+ map/reduce (see http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Outputoptions). You need to have some key in both collections that you can use as an _id.

    For example, let's say you have a users collection and a comments collection and you want to have a new collection that has some user demographic info for each comment.

    Let's say the users collection has the following fields:

    • _id
    • firstName
    • lastName
    • country
    • gender
    • age

    And then the comments collection has the following fields:

    • _id
    • userId
    • comment
    • created

    You would do this map/reduce:

    var mapUsers, mapComments, reduce;
    db.users_comments.remove({});
    
    // setup sample data - wouldn't actually use this in production
    db.users.remove({});
    db.comments.remove({});
    db.users.save({firstName:"Rich",lastName:"S",gender:"M",country:"CA",age:"18"});
    db.users.save({firstName:"Rob",lastName:"M",gender:"M",country:"US",age:"25"});
    db.users.save({firstName:"Sarah",lastName:"T",gender:"F",country:"US",age:"13"});
    var users = db.users.find().toArray();
    db.comments.save({userId: users[0]._id, "comment": "Hey, what's up?", created: new ISODate()});
    db.comments.save({userId: users[1]._id, "comment": "Not much", created: new ISODate()});
    db.comments.save({userId: users[0]._id, "comment": "Cool", created: new ISODate()});
    // end sample data setup
    
    mapUsers = function() {
        var values = {
            country: this.country,
            gender: this.gender,
            age: this.age
        };
        emit(this._id, values);
    };
    mapComments = function() {
        var values = {
            commentId: this._id,
            comment: this.comment,
            created: this.created
        };
        emit(this.userId, values);
    };
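    reduce = function(k, values) {
        // Fields handled by the comment-merging branches below; the generic
        // copy loop skips them. "comments" is listed so that a re-reduce does
        // not overwrite the merged array with a single partial array.
        var result = {}, commentFields = {
            "commentId": '', 
            "comment": '',
            "created": '',
            "comments": ''
        };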
        values.forEach(function(value) {
            var field;
            if ("comment" in value) {
                if (!("comments" in result)) {
                    result.comments = [];
                }
                result.comments.push(value);
            } else if ("comments" in value) {
                if (!("comments" in result)) {
                    result.comments = [];
                }
                result.comments.push.apply(result.comments, value.comments);
            }
            for (field in value) {
                if (value.hasOwnProperty(field) && !(field in commentFields)) {
                    result[field] = value[field];
                }
            }
        });
        return result;
    };
    db.users.mapReduce(mapUsers, reduce, {"out": {"reduce": "users_comments"}});
    db.comments.mapReduce(mapComments, reduce, {"out": {"reduce": "users_comments"}});
    db.users_comments.find().pretty(); // see the resulting collection
    

    At this point, you will have a new collection called users_comments that contains the merged data, and you can now use it. Each reduced document has an _id, which is the key you emitted in your map functions, and all of the values sit in a sub-object under the value key rather than at the top level of the document.
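
    The shape of the result is easier to see in a toy model. The sketch below imitates map-reduce with the "reduce" output mode entirely in memory. It is a simplification under stated assumptions: emit is passed to the map function as an argument (in MongoDB it is a global), the output collection is a Map, and mapReduceInto and reduceUserComments are hypothetical names with a reducer adapted from the one above.

```javascript
// Toy in-memory model of mapReduce with {"out": {"reduce": ...}}.
function mapReduceInto(docs, map, reduce, out) {
  const emitted = new Map();
  const emit = function (key, value) {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach(function (doc) { map.call(doc, emit); });
  emitted.forEach(function (values, key) {
    // As in MongoDB, reduce is not called for a key with a single value.
    let value = values.length === 1 ? values[0] : reduce(key, values);
    // "reduce" output mode: merge with the value already stored for the key.
    if (out.has(key)) value = reduce(key, [out.get(key), value]);
    out.set(key, value);
  });
  return out;
}

// Simplified reducer: collect comments in an array, copy all other fields.
function reduceUserComments(key, values) {
  const result = {};
  values.forEach(function (value) {
    if ("comment" in value) {
      result.comments = (result.comments || []).concat([value]);
    } else {
      Object.keys(value).forEach(function (field) {
        if (field === "comments") {
          result.comments = (result.comments || []).concat(value.comments);
        } else {
          result[field] = value[field];
        }
      });
    }
  });
  return result;
}

const users = [
  { _id: "u1", country: "CA", age: "18" },
  { _id: "u2", country: "US", age: "25" }
];
const comments = [
  { _id: "c1", userId: "u1", comment: "Hey, what's up?" },
  { _id: "c2", userId: "u2", comment: "Not much" },
  { _id: "c3", userId: "u1", comment: "Cool" }
];

const usersComments = new Map();
mapReduceInto(users, function (emit) {
  emit(this._id, { country: this.country, age: this.age });
}, reduceUserComments, usersComments);
mapReduceInto(comments, function (emit) {
  emit(this.userId, { commentId: this._id, comment: this.comment });
}, reduceUserComments, usersComments);

// Present the result as {_id, value} documents, like the reduced collection.
const reducedDocs = Array.from(usersComments, function (entry) {
  return { _id: entry[0], value: entry[1] };
});
console.log(JSON.stringify(reducedDocs, null, 2));
```

    Each printed document has the user's key as _id and everything else, including the comments array, nested under value.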

    This is a somewhat simple example. You can repeat this with more collections as much as you want to keep building up the reduced collection. You could also do summaries and aggregations of data in the process. Likely you would define more than one reduce function as the logic for aggregating and preserving existing fields gets more complex.

    You'll also note that there is now one document for each user with all of that user's comments in an array. If we were merging data that has a one-to-one relationship rather than one-to-many, it would be flat and you could simply use a reduce function like this:

    reduce = function(k, values) {
        var result = {};
        values.forEach(function(value) {
            var field;
            for (field in value) {
                if (value.hasOwnProperty(field)) {
                    result[field] = value[field];
                }
            }
        });
        return result;
    };
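
    MongoDB may call the reduce function more than once for the same key, feeding earlier output back in as input, so the reducer has to give the same answer either way. A quick check of that property for the flat reducer, run outside MongoDB (reduceFlat, profile and settings are hypothetical names for this sketch):

```javascript
// The flat reducer from above, as a plain function.
const reduceFlat = function (k, values) {
  const result = {};
  values.forEach(function (value) {
    for (const field in value) {
      if (Object.prototype.hasOwnProperty.call(value, field)) {
        result[field] = value[field];
      }
    }
  });
  return result;
};

// Two one-to-one values for the same key.
const profile = { country: "CA", age: "18" };
const settings = { theme: "dark" };

// Reduce everything at once...
const oneStep = reduceFlat("u1", [profile, settings]);
// ...or reduce a partial result again with the remaining value.
const twoStep = reduceFlat("u1", [reduceFlat("u1", [profile]), settings]);

const sameResult = JSON.stringify(oneStep) === JSON.stringify(twoStep);
console.log(sameResult);
```

    Note that when two values carry the same field, the later one wins, so this reducer only suits data where the merged fields do not conflict.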
    

    If you want to flatten the users_comments collection so it's one document per comment, additionally run this:

    var map, reduce;
    map = function() {
        var debug = function(value) {
            var field;
            for (field in value) {
                print(field + ": " + value[field]);
            }
        };
        debug(this);
        var that = this;
        if ("comments" in this.value) {
            this.value.comments.forEach(function(value) {
                emit(value.commentId, {
                    userId: that._id,
                    country: that.value.country,
                    age: that.value.age,
                    comment: value.comment,
                    created: value.created,
                });
            });
        }
    };
    reduce = function(k, values) {
        var result = {};
        values.forEach(function(value) {
            var field;
            for (field in value) {
                if (value.hasOwnProperty(field)) {
                    result[field] = value[field];
                }
            }
        });
        return result;
    };
    db.users_comments.mapReduce(map, reduce, {"out": "comments_with_demographics"});
    

    This technique should definitely not be performed on the fly. It's suited for a cron job or something similar that updates the merged data periodically. You'll probably want to run ensureIndex (createIndex in newer versions of MongoDB) on the new collection to make sure queries against it run quickly. Keep in mind that your data is still inside a value key, so to index comments_with_demographics on the comment created time, you would run db.comments_with_demographics.ensureIndex({"value.created": 1});

  • Doing unions in MongoDB in a 'SQL UNION' fashion is possible using aggregations along with lookups, in a single query. Here is an example I have tested that works with MongoDB 4.0:

    // Create employees data for testing the union.
    db.getCollection('employees').insert({ name: "John", type: "employee", department: "sales" });
    db.getCollection('employees').insert({ name: "Martha", type: "employee", department: "accounting" });
    db.getCollection('employees').insert({ name: "Amy", type: "employee", department: "warehouse" });
    db.getCollection('employees').insert({ name: "Mike", type: "employee", department: "warehouse"  });
    
    // Create freelancers data for testing the union.
    db.getCollection('freelancers').insert({ name: "Stephany", type: "freelancer", department: "accounting" });
    db.getCollection('freelancers').insert({ name: "Martin", type: "freelancer", department: "sales" });
    db.getCollection('freelancers').insert({ name: "Doug", type: "freelancer", department: "warehouse"  });
    db.getCollection('freelancers').insert({ name: "Brenda", type: "freelancer", department: "sales"  });
    
    // Here we do a union of the employees and freelancers using a single aggregation query.
    db.getCollection('freelancers').aggregate( // 1. Use any collection containing at least one document.
      [
        { $limit: 1 }, // 2. Keep only one document of the collection.
        { $project: { _id: '$$REMOVE' } }, // 3. Remove everything from the document.
    
        // 4. Lookup collections to union together.
        { $lookup: { from: 'employees', pipeline: [{ $match: { department: 'sales' } }], as: 'employees' } },
        { $lookup: { from: 'freelancers', pipeline: [{ $match: { department: 'sales' } }], as: 'freelancers' } },
    
        // 5. Union the collections together with a projection.
        { $project: { union: { $concatArrays: ["$employees", "$freelancers"] } } },
    
        // 6. Unwind and replace root so you end up with a result set.
        { $unwind: '$union' },
        { $replaceRoot: { newRoot: '$union' } }
      ]);
    

    Here is the explanation of how it works:

    1. Start an aggregation on any collection of your database that has at least one document in it. If you can't guarantee that some collection will always be non-empty, you can work around this by creating a 'dummy' collection that contains a single empty document and exists specifically for union queries.

    2. Make { $limit: 1 } the first stage of your pipeline. This drops every document of the collection except the first one.

    3. Strip all the fields of the remaining document by using a $project stage:

      { $project: { _id: '$$REMOVE' } }
      
    4. Your aggregate now contains a single, empty document. It's time to add lookups for each collection you want to union together. You may use the pipeline field to do some specific filtering, or pass an empty pipeline ([]) to match the whole collection.

      { $lookup: { from: 'collectionToUnion1', pipeline: [...], as: 'Collection1' } },
      { $lookup: { from: 'collectionToUnion2', pipeline: [...], as: 'Collection2' } },
      { $lookup: { from: 'collectionToUnion3', pipeline: [...], as: 'Collection3' } }
      
    5. You now have an aggregate containing a single document that contains 3 arrays like this:

      {
          Collection1: [...],
          Collection2: [...],
          Collection3: [...]
      }
      

      You can then merge them together into a single array using a $project stage along with the $concatArrays aggregation operator:

      {
        "$project" :
        {
          "Union" : { $concatArrays: ["$Collection1", "$Collection2", "$Collection3"] }
        }
      }
      
    6. You now have an aggregate containing a single document that holds an array with your union of collections. What remains is to add an $unwind and a $replaceRoot stage to split the array into separate documents:

      { $unwind: "$Union" },
      { $replaceRoot: { newRoot: "$Union" } }
      
    7. Voilà. You now have a result set containing the collections you wanted to union together. You can then add more stages to filter it further, sort it, apply skip() and limit(). Pretty much anything you want.
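
    The steps above can be traced in plain JavaScript to see what each stage contributes. This is only an in-memory imitation of the pipeline, using the same sample data; the helper names lookup and isSales are hypothetical.

```javascript
const employees = [
  { name: "John", type: "employee", department: "sales" },
  { name: "Martha", type: "employee", department: "accounting" },
  { name: "Amy", type: "employee", department: "warehouse" },
  { name: "Mike", type: "employee", department: "warehouse" }
];
const freelancers = [
  { name: "Stephany", type: "freelancer", department: "accounting" },
  { name: "Martin", type: "freelancer", department: "sales" },
  { name: "Doug", type: "freelancer", department: "warehouse" },
  { name: "Brenda", type: "freelancer", department: "sales" }
];

// Steps 1-3: a single empty document to hang the lookups on.
let docs = [{}];

// Step 4: each lookup stores the filtered collection in an array field.
const lookup = function (from, predicate, as) {
  return function (doc) {
    return Object.assign({}, doc, { [as]: from.filter(predicate) });
  };
};
const isSales = function (d) { return d.department === "sales"; };
docs = docs
  .map(lookup(employees, isSales, "employees"))
  .map(lookup(freelancers, isSales, "freelancers"));

// Step 5: concatenate the arrays into one field ($concatArrays).
docs = docs.map(function (d) {
  return { union: d.employees.concat(d.freelancers) };
});

// Step 6: one output document per array element ($unwind + $replaceRoot).
const unionResult = docs.flatMap(function (d) { return d.union; });
const unionNames = unionResult.map(function (d) { return d.name; }).join(", ");
console.log(unionNames);
```

    In this model the output is John, Martin, Brenda: every sales employee followed by every sales freelancer, each as its own top-level document.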

  • 2020-11-22 07:34

    If bulk insert is not available in your MongoDB version, loop over all documents in small_collection and insert them one by one into big_collection:

    db.small_collection.find().forEach(function(obj){ 
       db.big_collection.insert(obj)
    });
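
    Where an array insert is available, the same loop can hand documents over in batches instead of one at a time. A minimal sketch of the batching logic, assuming a hypothetical helper copyInBatches whose insertBatch callback does the actual insert (in the shell this might be something like function (docs) { db.big_collection.insertMany(docs); }); here an in-memory array stands in for the target collection:

```javascript
// copyInBatches is a hypothetical helper. docs is any iterable of documents
// and insertBatch is whatever inserts an array of documents into the target.
function copyInBatches(docs, insertBatch, batchSize) {
  let batch = [];
  let calls = 0;
  for (const doc of docs) {
    batch.push(doc);
    if (batch.length === batchSize) {
      insertBatch(batch);
      calls++;
      batch = [];
    }
  }
  // Flush the last, possibly smaller, batch.
  if (batch.length > 0) {
    insertBatch(batch);
    calls++;
  }
  return calls;
}

// In-memory stand-ins for small_collection and big_collection.
const smallDocs = Array.from({ length: 5 }, function (_, i) { return { n: i }; });
const bigDocs = [];
const batchCalls = copyInBatches(smallDocs, function (docs) {
  bigDocs.push(...docs);
}, 2);
console.log(batchCalls, bigDocs.length);
```

    Five documents with a batch size of two result in three insert calls instead of five; with realistic batch sizes the saving in round trips is much larger.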
    