Mongoexport to multiple CSV files

Submitted by 佐手、 on 2020-06-12 09:12:00

Question


I have a large MongoDB collection. I want to export this collection to CSV so I can then import it into a statistics package for data analysis.

The collection contains about 15 GB of documents. I would like to split it into ~100 equally sized CSV files. Is there any way to achieve this with mongoexport? I could also query the whole collection in pymongo, split it, and write the CSV files manually, but I suspect that would be slower and require more coding.
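
For concreteness, that pymongo route could look roughly like the sketch below, which streams the collection once and rolls over to a new CSV file every few thousand documents. The connection details, database, collection, and field names are placeholders:

import csv
from pymongo import MongoClient

# Placeholders: adjust the connection, database, collection, and field names.
coll = MongoClient("localhost", 27017)["mydb"]["mycollection"]
fields = ["f1", "f2", "f3"]

count = coll.count_documents({})
chunk = max(1, (count + 99) // 100)          # ceiling division -> ~100 files

out, writer = None, None
for i, doc in enumerate(coll.find({}, {f: 1 for f in fields})):
    if i % chunk == 0:                       # start the next CSV file
        if out:
            out.close()
        out = open(f"export_{i // chunk}.csv", "w", newline="")
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()
    writer.writerow({f: doc.get(f, "") for f in fields})
if out:
    out.close()

One pass over a single cursor like this also avoids the repeated skips discussed in the answers below.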

Thank you for your input.


Answer 1:


You can do it using the --skip and --limit options.

For example, if your collection holds 1,000 documents, you can do it with a script loop (pseudocode):

loops=100
count=1000        # total document count, e.g. from db.collection.count() in the mongo shell
batch_size=$((count / loops))

for ((i = 0; i < loops; i++)); do
    # "..." stands for the remaining mongoexport options (host, db, collection, fields)
    mongoexport --skip $((batch_size * i)) --limit "$batch_size" --out "export${i}.json" ...
done

This assumes that your documents are roughly equal in size.

Note, however, that large skips are slow: the server has to scan past every skipped document, so the later (high-skip) iterations will run noticeably slower than the early (low-skip) ones.
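
One common workaround for the slow skips (not part of this answer, just a sketch) is to paginate on the indexed _id field instead, so each batch starts with an index lookup rather than a scan. A minimal pymongo version, with placeholder names:

from pymongo import MongoClient

# Placeholders: adjust the connection, database, and collection names.
coll = MongoClient("localhost", 27017)["mydb"]["mycollection"]
batch_size = 10_000
last_id = None

while True:
    query = {"_id": {"$gt": last_id}} if last_id is not None else {}
    batch = list(coll.find(query).sort("_id", 1).limit(batch_size))
    if not batch:
        break
    # ...write this batch to the next CSV file here...
    last_id = batch[-1]["_id"]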




Answer 2:


A better version of the above loop that runs all the exports in parallel, because you're impatient like I am:

Presume we have 385,892,079 records; dividing by 100 gives a batch size of roughly 3,858,920.

bs=3858920
for i in {1..100}
do
    bsi=$((bs * (i - 1)))    # chunk i covers documents [bs*(i-1), bs*i)
    mongoexport --db dbnamehere --collection collectionNamehere --port 3303 \
        --fields="f1,f2,f3" \
        --out /opt/path/to/output/dir/dump.${i}.json -v \
        --skip ${bsi} --limit ${bs} &    # & backgrounds each export to run in parallel
done
wait    # block until all background exports have finished




Answer 3:


# total = 335584 documents -> 16 chunks of 20974
limit=20974
skip=0
for i in {1..16}; do
    mongoexport --host localhost --db tweets --collection mycollection \
        --type=csv --fields tweet_id,user_name,user_id,text \
        --out master_new/mongo_rec_${i}.csv -v \
        --skip ${skip} --limit ${limit} --quiet
    skip=$((skip + limit))
done


Source: https://stackoverflow.com/questions/29081431/mongoexport-to-multiple-csv-files
