Question
I export 2 collections using mongoexport:
mongoexport -h -u -p --db dev -c parent -q '{_id: ObjectId("5ae1b4b93b131f57b33b8d11")}' -o parent.json
and
mongoexport -h -u -p --db dev -c children -q '{parentId: ObjectId("5ae1b4b93b131f57b33b8d11")}' -o children.json
From the first one I got one record (selected by ID) and from the second I got many records (selected by parentId, which is the ID from the first one).
How can I use mongoimport to import those with new IDs while keeping the relation parent._id = child.parentId?
Answer 1:
TL;DR: Skip to the parts that solve this, or read on if you want to clear up the misconception.
Background
Let's clear up the basic misconception here: MongoDB "itself" has no concept of relations whatsoever. A "relation", as you are referring to it, is merely a value recorded in a property in one collection which "refers" to another value present in a different collection. Nothing about these "foreign keys" enforces any kind of constraint in MongoDB that defines the data as "related"; they are just "values".
Once you accept that fact, it should be self-evident why tools such as mongoexport, or even basic query operations like find(), have no idea that a value you stored somewhere is "meant to" be used to go and get data from somewhere else.
Once upon a time there was (and to a lesser extent still is) a concept called DBRef, which stored not just a "value" but also some detail as to where that value resided, in a "referential" sense. This could be the collection, or the database and collection. The concept, however, was relatively short-lived and is not useful in a modern context. Even with this "metadata" stored, the database itself had no concept of "relation" to the data. Retrieval still remained a "client concept", where some drivers had the ability to see a DBRef and then "resolve" it by issuing another query to the server to retrieve the "related" data.
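To illustrate why resolving a DBRef was a "client concept", here is a minimal in-memory sketch in plain JavaScript (runnable with Node.js, no MongoDB involved). The `collections` object and `resolveDBRef()` are hypothetical names for illustration, not a driver API; only the `$ref`/`$id` field names come from the DBRef convention.

```javascript
// Hypothetical in-memory "database": collection name -> array of documents.
// This sketch models a single database, so the optional $db field is ignored.
const collections = {
  parent: [{ _id: "p1", name: "Parent one" }],
};

// A DBRef-shaped value records where the referenced document lives:
// $ref = collection name, $id = the referenced document's _id.
const child = { _id: "c1", parent: { $ref: "parent", $id: "p1" } };

// Resolving is just a second lookup done by the client; nothing on the
// "server" side follows or validates the reference.
function resolveDBRef(ref, store) {
  const coll = store[ref.$ref] || [];
  return coll.find((doc) => doc._id === ref.$id) || null;
}

console.log(resolveDBRef(child.parent, collections)); // { _id: 'p1', name: 'Parent one' }
console.log(resolveDBRef({ $ref: "parent", $id: "gone" }, collections)); // null (dangling)
```

The second call shows the other half of the point: a reference to a document that does not exist is perfectly storable, and only the client notices.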
Naturally this is "full of holes" and the concept was abandoned in favor of more modern concepts, notably $lookup.
Modern Construction
What this all really comes down to is that, as far as MongoDB itself is concerned, a "relation" is actually a "client concept", in the sense that an "external entity" makes the decision that "this" is related to "that" via whatever data you define as "equal references".
There are no "database rules" that enforce this, so in contrast to various traditional RDBMS solutions, the "object store" nature of MongoDB essentially says "...that's not my job, delegate to someone else" and that typically means you define that within the "client" logic of what is used to access the database.
However, there are "some tools" that allow the "server" to act on that logic. These revolve around $lookup, which is essentially the "modern method" to perform "joins" on the server when working with MongoDB.
So if you have "related data" you want to "export", then you basically have a couple of options:
Create a "view" using $lookup
MongoDB introduced "views" with the 3.4 release. These are essentially "aggregation pipelines" which "masquerade" as a collection. For the purposes of normal read operations like .find() or even mongoexport, this pipeline looks just like a collection and can be accessed as such:
db.createView("related_view", "parent", [
  { "$lookup": {
    "from": "children",
    "localField": "_id",
    "foreignField": "parentId",
    "as": "children"
  }}
])
With that "view" in place you can simply call mongoexport using the name defined for the "view":
mongoexport -c related_view -o output.json
Just as $lookup will do, each "parent" item will now contain an array with the "related" content from "children" by the foreign key.
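A rough in-memory model of what that $lookup stage produces, written as plain JavaScript so it runs without a server. The `lookup()` helper and the sample data are hypothetical; it models only equality matching on scalar top-level fields, whereas the real stage also handles array values, dotted paths and null/missing fields.

```javascript
// Model of { $lookup: { from, localField, foreignField, as } } on arrays.
function lookup(input, from, localField, foreignField, as) {
  return input.map((doc) => ({
    ...doc,
    // Every "foreign" document whose foreignField equals this localField
    // value; an empty array when nothing matches (a left outer join).
    [as]: from.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const parent = [{ _id: "p1" }, { _id: "p2" }];
const children = [
  { _id: "c1", parentId: "p1" },
  { _id: "c2", parentId: "p1" },
  { _id: "c3", parentId: "p9" }, // refers to a parent that does not exist
];

const joined = lookup(parent, children, "_id", "parentId", "children");
console.log(JSON.stringify(joined));
```

The dangling `parentId: "p9"` simply never matches anything and nothing reports it as an error, which is the "just values" point from the Background section.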
As $lookup produces its output as a BSON document, the same constraint applies as with all MongoDB documents: the resulting "joined" document cannot exceed 16MB. So if the array causes the parent document to grow beyond this limit, then outputting the results as an "array" embedded within the document is not an option.
For this case you would generally use $unwind in order to "de-normalize" the output content, just as it would appear with a typical SQL join. The "parent" document is then copied for each "related" member, so there is one output document per related child, carrying all of the parent information with "children" holding a single embedded document instead of an array.
This just means adding $unwind to such a "view":
db.createView("related_view", "parent", [
  { "$lookup": {
    "from": "children",
    "localField": "_id",
    "foreignField": "parentId",
    "as": "children"
  }},
  { "$unwind": "$children" }
])
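The effect of adding $unwind can be modelled the same way. This hypothetical `unwind()` helper mimics only the default behaviour for array fields: each array element produces one copy of the parent, and documents with an empty array produce no output at all. The real stage can keep such documents via its preserveNullAndEmptyArrays option, which is not modelled here.

```javascript
// Model of the default { $unwind: "$<field>" } behaviour for array fields.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const arr = Array.isArray(doc[field]) ? doc[field] : [];
    // One copy of the parent per array element, with `field` holding that
    // single element instead of the whole array. Empty arrays yield nothing.
    for (const item of arr) out.push({ ...doc, [field]: item });
  }
  return out;
}

// Hypothetical $lookup output for two parents.
const joined = [
  { _id: "p1", name: "A", children: [{ _id: "c1" }, { _id: "c2" }] },
  { _id: "p2", name: "B", children: [] },
];

const rows = unwind(joined, "children");
console.log(rows.length); // 2
console.log(rows[0]); // { _id: 'p1', name: 'A', children: { _id: 'c1' } }
```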
Since we are now outputting essentially one document for each "related child", it's unlikely the BSON limit is breached. With very large parent and child documents it is still a possibility, though rare; that case needs different handling, covered below.
Use $lookup directly in script
If you don't have a MongoDB version supporting "views" (that is, 3.2, which has $lookup but not views) and you are not restricted by the BSON limit, you could still "script" the invocation of the aggregation pipeline with the mongo shell and output the results as JSON. The process is similar, but instead of using a "view" and mongoexport, we manually wrap a few commands which can be passed to the shell from the command line:
mongo --quiet --eval '
  db.parent.aggregate([
    { "$lookup": {
      "from": "children",
      "localField": "_id",
      "foreignField": "parentId",
      "as": "children"
    }}
  ]).forEach(p => printjson(p))'
As before, you can optionally add $unwind to the pipeline if that is the output you are after.
Script the "join"
If you are running on a MongoDB instance without $lookup support (and you should not be, since versions 3.0 and earlier are no longer officially supported), or you actually have a scenario where the "join" would create data per "parent" document which exceeds the BSON limit, then the other option is to "script" the entire join process by executing queries to obtain the "related" data and output it.
mongo --quiet --eval '
  db.parent.find().forEach(p =>
    printjson(
      Object.assign(p, {
        children: db.children.find({ parentId: p._id }).toArray()
      })
    )
  )'
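The cost of the scripted approach is one extra query per parent. The in-memory sketch below makes that visible; `findChildren()` is a hypothetical stand-in for `db.children.find({ parentId: p._id }).toArray()`, and its counter stands for round trips to the server. Each combined document only ever exists on the client to be printed, which is presumably why this route avoids the server-side 16MB limit.

```javascript
// Hypothetical sample data standing in for the two collections.
const parents = [{ _id: "p1" }, { _id: "p2" }, { _id: "p3" }];
const children = [
  { _id: "c1", parentId: "p1" },
  { _id: "c2", parentId: "p3" },
];

let queries = 0;
function findChildren(parentId) {
  queries += 1; // each call stands for one round trip to the server
  return children.filter((c) => c.parentId === parentId);
}

// Assemble each parent with its children on the client, then "print" it.
const output = parents.map((p) => ({ ...p, children: findChildren(p._id) }));
output.forEach((doc) => console.log(JSON.stringify(doc)));
console.log(`${queries} child queries for ${parents.length} parents`);
```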
Or even in the "unwound" or "de-normalized" form:
mongo --quiet --eval '
  db.parent.find().forEach(p =>
    db.children.find({ parentId: p._id }).forEach(child =>
      printjson(Object.assign(p, { child }))
    )
  )'
Summary
Bottom line is "MongoDB itself" does not know about "relations", and it really is up to you to provide that detail. Be it in the form of a "view" you can access or by other means of defining the "code" necessary to explicitly state the "terms" of this "relation", because as far as the database itself is concerned this simply does not exist in any other form.
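Applied to the original question, "providing that detail" means rewriting the references yourself before running mongoimport; as far as I know, mongoimport has no option that assigns new _id values and updates the fields that point at them. The following is a sketch only, assuming the two exported files have been parsed into arrays and that ObjectIds appear in mongoexport's Extended JSON form `{ "$oid": "..." }`. `remapIds()`, `keyOf()` and the counter-based `newId` generator are hypothetical; real code would generate proper ObjectIds.

```javascript
// Compare Extended JSON ObjectIds by their hex string, anything else as a string.
function keyOf(id) {
  return id !== null && typeof id === "object" && "$oid" in id ? id.$oid : String(id);
}

// Give each parent a fresh _id and rewrite every child's parentId to match.
function remapIds(parents, children, newId) {
  const idMap = new Map(); // old parent key -> new parent _id
  const newParents = parents.map((p) => {
    const fresh = newId();
    idMap.set(keyOf(p._id), fresh);
    return { ...p, _id: fresh };
  });
  const newChildren = children.map((c) => {
    const mapped = idMap.get(keyOf(c.parentId));
    // Children get fresh IDs too; a parentId with no matching parent is left as-is.
    return { ...c, _id: newId(), parentId: mapped !== undefined ? mapped : c.parentId };
  });
  return { parents: newParents, children: newChildren };
}

// Hypothetical counter-based generator producing 24-hex-digit IDs (example only).
let counter = 0;
const newId = () => ({ $oid: (++counter).toString(16).padStart(24, "0") });

const parents = [{ _id: { $oid: "5ae1b4b93b131f57b33b8d11" }, name: "P" }];
const children = [
  { _id: { $oid: "aaaaaaaaaaaaaaaaaaaaaaaa" }, parentId: { $oid: "5ae1b4b93b131f57b33b8d11" } },
  { _id: { $oid: "bbbbbbbbbbbbbbbbbbbbbbbb" }, parentId: { $oid: "5ae1b4b93b131f57b33b8d11" } },
];

const result = remapIds(parents, children, newId);
console.log(JSON.stringify(result));
```

The remapped arrays could then be written back out in the same one-document-per-line layout mongoexport produced and imported as usual.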
Also, to address a point raised in the comments: if your intention with "export" is just to create a "new collection", then either simply create the "view" or use the $out aggregation stage:
db.parent.aggregate([
{ "$lookup": {
"from": "children",
"localField": "_id",
"foreignField": "parentId",
"as": "children"
}},
{ "$out": "related_collection" }
])
And if you wanted to "alter the parent" with "embedded" data, then loop and use bulkWrite():
var batch = [];
db.parent.aggregate([
{ "$lookup": {
"from": "children",
"localField": "_id",
"foreignField": "parentId",
"as": "children"
}}
]).forEach(p => {
batch.push({
"updateOne": {
"filter": { "_id": p._id },
"update": {
"$push": { "children": { "$each": p.children } }
}
}
});
// Flush once the batch reaches 1000 operations, then start a new batch
if (batch.length >= 1000) {
db.parent.bulkWrite(batch);
batch = [];
}
});
if (batch.length > 0) {
db.parent.bulkWrite(batch);
batch = [];
}
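The batching pattern does not depend on MongoDB itself, so it can be checked in isolation. In this sketch `submit` is a hypothetical stand-in for `db.parent.bulkWrite()` that only records batch sizes, and the operations are placeholders; the point is that every operation is submitted exactly once, as full batches plus one final partial batch.

```javascript
// Accumulate operations and hand them to `submit` in batches of `batchSize`.
function batchOps(ops, batchSize, submit) {
  let batch = [];
  for (const op of ops) {
    batch.push(op);
    if (batch.length >= batchSize) {
      submit(batch); // flush a full batch
      batch = [];
    }
  }
  if (batch.length > 0) submit(batch); // flush whatever is left over
}

const sizes = [];
// Placeholder operations in the bulkWrite() "updateOne" shape.
const ops = Array.from({ length: 2500 }, (_, i) => ({
  updateOne: { filter: { _id: i }, update: { $set: { seen: true } } },
}));
batchOps(ops, 1000, (b) => sizes.push(b.length));
console.log(sizes); // [ 1000, 1000, 500 ]
```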
There simply is no need to "export" merely to create a new collection or alter an existing one. You would alter the existing collection, of course, when you actually want to keep the data "embedded" and avoid the overhead of $lookup on every request. But the decision of whether to "embed or reference" is a whole other story.
Source: https://stackoverflow.com/questions/50453111/how-to-mongoexport-and-mongoimport-collections-with-new-ids-but-keeping-the-rela