Question
I have a large MongoDB collection. I want to export this collection to CSV so I can then import it into a statistics package for data analysis.
The collection has about 15 GB of documents in it. I would like to split it into ~100 equally sized CSV files. Is there any way to achieve this using mongoexport? I could also query the whole collection in pymongo, split it, and write the CSV files manually, but I guess this would be slower and require more coding.
Thanks for any input.
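For reference, a plain single-file CSV export with mongoexport looks roughly like this (a minimal sketch; the database, collection, and field names are placeholders, and --type=csv requires an explicit --fields list):
mongoexport --db mydb --collection mycollection \
    --type=csv --fields "field1,field2,field3" \
    --out export.csv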
Answer 1:
You can do it using the --skip and --limit options.
For example, if your collection holds 1,000 documents, you can do it with a script loop like this (shown as a bash sketch; the database and collection names are placeholders):
loops=100
# total document count, taken from the mongo shell
count=$(mongo --quiet dbname --eval "db.collection.count()")
batch_size=$((count / loops))
for ((i = 0; i < loops; i++)); do
    mongoexport --db dbname --collection collection \
        --skip $((batch_size * i)) --limit ${batch_size} \
        --out export${i}.json
done
This assumes your documents are roughly equal in size.
Note, however, that large skips are slow: iterations with a low --skip value will run faster than the later, high-skip iterations.
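Since the question asks for CSV rather than JSON, the same loop can write CSV by switching the export line; a minimal sketch, with hypothetical field names (--type=csv needs an explicit --fields list, as in Answer 3 below):
mongoexport --db dbname --collection collection \
    --type=csv --fields "field1,field2,field3" \
    --skip $((batch_size * i)) --limit ${batch_size} \
    --out export${i}.csv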
Answer 2:
A better version of the loop above that runs all the exports in parallel, because you're impatient like I am:
Presume we have 385892079 records; divided by 100 that gives a batch size of 3858920 (the 79 leftover records after 100 even batches need one extra export, or a larger final --limit).
let bs=3858920
for i in {0..99}
do
    # skip i full batches; starting at 0 ensures the first batch is included
    let bsi=${bs}*$i
    mongoexport --db dbnamehere --collection collectionNamehere --port 3303 \
        --fields="f1,f2,f3" \
        --out /opt/path/to/output/dir/dump.${i}.json -v \
        --skip ${bsi} --limit ${bs} &
done
# wait for all background exports to finish (lower the range if this overloads the server)
wait
Answer 3:
# total = 335584 documents, exported as 16 CSV files of 20974 documents each
limit=20974
skip=0
for i in {1..16}; do
    mongoexport --host localhost --db tweets --collection mycollection \
        --type=csv --fields tweet_id,user_name,user_id,text \
        --out master_new/mongo_rec_${i}.csv -v --quiet \
        --skip ${skip} --limit ${limit}
    skip=$((skip + limit))
done
Source: https://stackoverflow.com/questions/29081431/mongoexport-to-multiple-csv-files