Microsoft CosmosDB for MongoDB: merge an unsharded collection into a sharded one

牧云@^-^@ submitted on 2020-01-25 07:53:06

Question


I have two collections of similar documents (i.e. the same object type with different values). One collection (X) is unsharded and lives in database A; the other (Y) is sharded and lives in database B. When I try to copy collection X into database B, I get an error saying "Shared throughput collection should have a partition key". I also tried copying the data with a forEach insert, but it takes too long.

So my question is: how can I append the data from collection X to collection Y in an efficient way?

The MongoDB version on CosmosDB is 3.4.6.


Answer 1:


You can run an aggregation and add the $merge operator as its last stage.

| $merge                                | $out                                       | 
| Can output to a sharded collection.   | Cannot output to a sharded collection.     | 
| Input collection can also be sharded. | Input collection, however, can be sharded. | 

https://docs.mongodb.com/manual/reference/operator/aggregation/merge/#comparison-with-out
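
As a rough illustration of that idea, here is a minimal sketch in Python that only builds the pipeline document. The helper name build_merge_pipeline and the choice of whenMatched/whenNotMatched values are assumptions for the example, not something stated in the answer. Note that $merge was introduced in MongoDB 4.2, so it only applies where the server supports it.

```python
def build_merge_pipeline(target_db, target_coll):
    """Return an aggregation pipeline whose last (and only) stage is $merge.

    Hypothetical helper: the names and option values are illustrative.
    """
    return [
        # No filtering stages: every source document reaches the $merge stage.
        {
            "$merge": {
                "into": {"db": target_db, "coll": target_coll},
                # Leave documents that already exist in the target untouched.
                "whenMatched": "keepExisting",
                # Insert documents that are not in the target yet.
                "whenNotMatched": "insert",
            }
        }
    ]


pipeline = build_merge_pipeline("B", "Y")
# With a driver such as PyMongo, this would be run against the source
# collection, for example: client["A"]["X"].aggregate(pipeline)
```

Using "keepExisting" mirrors an append: documents already present in Y are not modified.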




Answer 2:


So my question is: how can I append the data from collection X to collection Y in an efficient way?

The server tools mongodump and mongorestore can be used. You can export the source collection data into BSON dump files and import them into the target collection. These processes are very quick because the data in the database is already in BSON format.

Data can be exported from a non-sharded collection to a sharded collection using these tools. In this case, the source collection's documents must have the shard-key field (or fields) with values. Note that the indexes from the source collection are also exported and imported by these tools.
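
Before dumping, it may be worth checking that requirement. The following is a small sketch, assuming the shard key consists of top-level fields and that a missing field or a None value both count as "no value"; the function name and the tenantId field are hypothetical.

```python
def find_docs_missing_shard_key(docs, shard_key_fields):
    """Return the documents that lack a value for any shard-key field.

    Assumption: shard-key fields are top-level, and None counts as missing.
    """
    missing = []
    for doc in docs:
        if any(doc.get(field) is None for field in shard_key_fields):
            missing.append(doc)
    return missing


# Hypothetical sample data; "tenantId" stands in for the real shard key.
sample = [
    {"_id": 1, "tenantId": "t1", "value": 10},
    {"_id": 2, "value": 20},
]
bad = find_docs_missing_shard_key(sample, ["tenantId"])
```

Any documents returned here would need a shard-key value added before they can go into the sharded collection.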

Here is an example of the scenario in question:

mongodump --db=srcedb --collection=srcecoll --out="C:\mongo\dumps"

This creates a subdirectory named after the database inside the dump directory. It contains a file "srcecoll.bson", which is used for the import.

mongorestore --port 26xxxx --db=trgtdb --collection=trgtcoll --dir="C:\mongo\dumps\srcedb\srcecoll.bson"

The host/port connects to the mongos of the sharded cluster. Note that the BSON file name needs to be specified in the --dir option.

The import adds data and indexes to the existing sharded collection. The process only inserts data; existing documents are not updated. If an _id value from the source collection already exists in the target collection, the process does not overwrite that document (it is simply not imported, and this is not an error).

There are some useful options for mongorestore, such as --noIndexRestore and --dryRun.
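
If the two commands need to be scripted, one possible sketch is to assemble the argument lists in Python and hand them to subprocess. The function names are hypothetical, the flags are the ones used in this answer, and the port and paths in the usage lines are placeholder values chosen for the example, not values from the answer.

```python
def build_mongodump_cmd(src_db, src_coll, out_dir):
    """Argument list for dumping one collection to BSON."""
    return [
        "mongodump",
        f"--db={src_db}",
        f"--collection={src_coll}",
        f"--out={out_dir}",
    ]


def build_mongorestore_cmd(port, dst_db, dst_coll, bson_file,
                           dry_run=False, no_index_restore=False):
    """Argument list for restoring one BSON file into a target collection."""
    cmd = [
        "mongorestore",
        "--port", str(port),
        f"--db={dst_db}",
        f"--collection={dst_coll}",
        f"--dir={bson_file}",
    ]
    if no_index_restore:
        cmd.append("--noIndexRestore")  # skip recreating the source indexes
    if dry_run:
        cmd.append("--dryRun")          # go through the motions without importing
    return cmd


# Placeholder values for illustration only.
dump_cmd = build_mongodump_cmd("srcedb", "srcecoll", "dumps")
restore_cmd = build_mongorestore_cmd(
    27017, "trgtdb", "trgtcoll", "dumps/srcedb/srcecoll.bson", dry_run=True
)
# To execute: subprocess.run(dump_cmd, check=True), then the restore command.
```

Running first with dry_run=True is a cheap way to confirm the options are accepted before any data is written.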




Answer 3:


Because the MongoDB version in CosmosDB is currently 3.4.6, it doesn't support $merge or many other commands such as collection.copyTo. Studio 3T's import feature didn't help either.

The solution I use is to download the target collection to my local MongoDB, clean it, and then write Java code that reads the clean data from the local database and writes it to the target collection with insertMany (or bulkWrite). This way, the data is appended to the target collection. I measured about 2 hours for 1 million documents (~750 MB); of course, these numbers may vary depending on factors such as network and document size.
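
The batching part of that approach can be sketched as follows. This is Python rather than the Java used in the answer, purely for brevity, and the function names and default batch size are assumptions. The insert function is passed in as a callable so the sketch stays independent of any driver; with PyMongo it could be something like lambda batch: target.insert_many(batch, ordered=False).

```python
from itertools import islice


def iter_batches(docs, batch_size):
    """Yield lists of at most batch_size documents from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def append_in_batches(docs, insert_many, batch_size=1000):
    """Send docs to insert_many one batch at a time; return the total count."""
    total = 0
    for batch in iter_batches(docs, batch_size):
        insert_many(batch)  # one round trip per batch instead of per document
        total += len(batch)
    return total


# Stand-in for a real collection: collect the batches in a list.
received = []
count = append_in_batches(({"_id": i} for i in range(2500)), received.append, 1000)
```

Batch size is a tuning knob: larger batches mean fewer round trips, but request size and throughput limits on the target cap how far it can be pushed.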



Source: https://stackoverflow.com/questions/59614419/microsoft-cosmosdb-for-mongodb-merge-unsharded-collection-into-sharded-ones
