How can I (in MongoDB) combine data from multiple collections into one collection?
Can I use map-reduce and if so then how?
I would greatly appreciate an example.
Starting in Mongo 4.4, we can achieve this join within an aggregation pipeline by coupling the new $unionWith aggregation stage with $group's new $accumulator operator:
// > db.users.find()
// [{ user: 1, name: "x" }, { user: 2, name: "y" }]
// > db.books.find()
// [{ user: 1, book: "a" }, { user: 1, book: "b" }, { user: 2, book: "c" }]
// > db.movies.find()
// [{ user: 1, movie: "g" }, { user: 2, movie: "h" }, { user: 2, movie: "i" }]
db.users.aggregate([
  // pull all books and movies documents into the pipeline alongside users
  { $unionWith: "books" },
  { $unionWith: "movies" },
  { $group: {
    _id: "$user",
    user: {
      $accumulator: {
        accumulateArgs: ["$name", "$book", "$movie"],
        init: function() { return { books: [], movies: [] } },
        accumulate: function(user, name, book, movie) {
          if (name) user.name = name;
          if (book) user.books.push(book);
          if (movie) user.movies.push(movie);
          return user;
        },
        merge: function(userV1, userV2) {
          if (userV2.name) userV1.name = userV2.name;
          // concat returns a new array, so the result must be reassigned
          userV1.books = userV1.books.concat(userV2.books);
          userV1.movies = userV1.movies.concat(userV2.movies);
          return userV1;
        },
        lang: "js"
      }
    }
  }}
])
// { _id: 1, user: { books: ["a", "b"], movies: ["g"], name: "x" } }
// { _id: 2, user: { books: ["c"], movies: ["h", "i"], name: "y" } }
$unionWith combines records from the given collection with the documents already in the aggregation pipeline. After the two union stages, we thus have all users, books and movies records within the pipeline.
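To visualize what the pipeline holds at that point, here is a minimal sketch running only the two union stages on the sample collections above:

db.users.aggregate([
  { $unionWith: "books" },
  { $unionWith: "movies" }
])
// { user: 1, name: "x" }
// { user: 2, name: "y" }
// { user: 1, book: "a" }
// { user: 1, book: "b" }
// { user: 2, book: "c" }
// { user: 1, movie: "g" }
// { user: 2, movie: "h" }
// { user: 2, movie: "i" }

Each document keeps only the fields it had in its source collection, which is why the accumulator below has to check which fields are defined.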
We then $group records by $user and accumulate items using the $accumulator operator, which allows custom accumulations of documents as they get grouped:
- accumulateArgs defines which fields from the grouped documents are passed as arguments to the accumulate function.
- init defines the state that will be accumulated as we group elements.
- the accumulate function performs a custom action with each record being grouped in order to build the accumulated state. For instance, if the item being grouped has the book field defined, then we update the books part of the state.
- merge is used to merge two internal states. It's only used for aggregations running on sharded clusters or when the operation exceeds memory limits.
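To illustrate that last point, here is a hypothetical plain-JavaScript rendition of merge combining two partial states for the same user (say, built on two different shards); the state values are made up for the example:

// hypothetical partial states for user 1, each built independently
var stateA = { name: "x", books: ["a"], movies: [] };
var stateB = { books: ["b"], movies: ["g"] };

function merge(userV1, userV2) {
  if (userV2.name) userV1.name = userV2.name;
  // reassign: concat does not mutate the array it is called on
  userV1.books = userV1.books.concat(userV2.books);
  userV1.movies = userV1.movies.concat(userV2.movies);
  return userV1;
}

merge(stateA, stateB);
// => { name: "x", books: ["a", "b"], movies: ["g"] }

Note that the concat results must be reassigned (as in the pipeline above), since Array.prototype.concat returns a new array rather than modifying the one it's called on.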