Can you use mongodump to dump the latest \"x\" documents from a collection? For example, in the mongo shell you can execute:
db.stats.find().sort({$natural:-1}).
_id
-based approaches may not work if you use a custom _id
for your collection (such as returned by a 3rd party API). In that case, you should depend on a createdAt
or equivalent field:
COL="collectionName"
HOW_MANY=10000
DATE_CUTOFF=$(mongo <host, user, pass...> dbname --quiet \
--eval "db.$COL.find({}, { createdAt: 1 }).sort({ createdAt: -1 }).skip($HOW_MANY).limit(1)"\
| grep -E -o '(ISODate\(.*?\))')
echo "Copying $HOW_MANY items after $DATE_CUTOFF..."
mongodump <host, user, pass...> -d dbname -c ${COL}\
-q "{ createdAt: { \$gte: $DATE_CUTOFF} }" --gzip
I was playing with a similar requirement (using mongodump) where I wanted to do sequential backup and restore. I would take dump from last stored timestamp. I couldn't get through --query '{ TIMESTAMP : { $gte : $stime, $lt : $etime } }'
Some points to note: 1) use single quote instead of double 2) do not escape $ or anything 3) replacing $stime/$etime with real numbers will make the query work 4) problem I had was with getting $stime/$etime resolved before mongodump executes itself under -x it showed as + eval mongodump --query '{TIMESTAMP:{\$gte:$utc_stime,\$lt:$utc_etime}}' ++ mongodump --query '{TIMESTAMP:$gte:1366700243}' '{TIMESTAMP:$lt:1366700253}'
Hell, the problem was evident. query gets converted into two conditionals.
The solution is tricky and I got it after repeated trials.... escape { and } ie use { ..} . This fixes the problem.
Building off of Mic92's answer, to get the most recent 1000 items from a collection:
Find the _id
of the 1000th most recent item:
db.collection.find('', {'_id':1}).sort({_id:-1}).skip(1000).limit(1)
It will be something like 50ad7bce1a3e927d690385ec
.
Then pass this _id in a query to mongodump:
$ mongodump -d 'your_database' -c 'your_collection' -q '{"_id": {"$gt": {"$oid": "50ad7bce1a3e927d690385ec"}}}'
mongodump supports the --query operator. If you can specify your query as a json query, you should be able to do just that.
If not, then your trick of running a query to dump the records into a temporary collection and then dumping that will work just fine. In this case, you could automate the dump using a shell script that calls a mongo with a javascript command to do what you want and then calling mongodump.
mongodump
does not fully expose the cursor interfaces.
But you can work around it, using the --query
parameter.
First get the total number of documents of the collection
db.collection.count()
Let's say there are 10000 documents and you want the last 1000. To do so get the id of first document you want to dump.
db.collection.find().sort({_id:1}).skip(10000 - 1000).limit(1)
In this example the id was "50ad7bce1a3e927d690385ec"
.
Now you can feed mongodump
with this information, to dump all documents a with higher or equal id.
$ mongodump -d 'your_database' -c 'your_collection' -q '{_id: {$gte: ObjectId("50ad7bce1a3e927d690385ec")}}'
UPDATE
The new parameters --limit
and --skip
were added to mongoexport
will be probably available in the next version of the tool: https://github.com/mongodb/mongo/pull/307
try this:
NUM=10000
doc=selected_doc
taskid=$(mongo 127.0.0.1/selected_db -u username -p password --eval "db.${doc}.find({}, {_id: 1}).sort({_id: -1}).skip($NUM).limit(1)" | grep -E -o '"[0-9a-f]+"')
mongodump --collection $doc --db selected_db --host 127.0.0.1 -u username -p password -q "{_id: {\$gte: $taskid}}" --out ${doc}.dump