Using the code:
all_reviews = db_handle.find().sort('reviewDate', pymongo.ASCENDING)
print all_reviews.count()
print all_reviews[0]
print all_reviews[2000000]
The count prints 2043484
, and it prints all_reviews[0]
.
However when printing all_reviews[2000000]
, I get the error:
pymongo.errors.OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33554495 bytes exceeds internal limit of 33554432 bytes
How do I handle this?
You're running into the 32MB limit on an in-memory sort:
https://docs.mongodb.com/manual/reference/limits/#Sort-Operations
Add an index to the sort field. That allows MongoDB to stream documents to you in sorted order, rather than attempting to load them all into memory on the server and sort them in memory before sending them to the client.
As said by kumar_harsh
in the comments section, i would like to add another point.
You can view the current buffer usage using the below command over the admin
database:
> use admin
switched to db admin
> db.runCommand( { getParameter : 1, "internalQueryExecMaxBlockingSortBytes" : 1 } )
{ "internalQueryExecMaxBlockingSortBytes" : 33554432, "ok" : 1 }
It has a default value of 32 MB(33554432 bytes).In this case you're running short of buffer data so you can increase buffer limit with your own defined optimal value, example 50 MB as below:
> db.adminCommand({setParameter: 1, internalQueryExecMaxBlockingSortBytes:50151432})
{ "was" : 33554432, "ok" : 1 }
We can also set this limit permanently by the below parameter in the mongodb config file:
setParameter=internalQueryExecMaxBlockingSortBytes=309715200
Hope this helps !!!
Note
:This commands supports only after version 3.0 +
solved with indexing
db_handle.ensure_index([("reviewDate", pymongo.ASCENDING)])
If you want to avoid creating an index (e.g. you just want a quick-and-dirty check to explore the data), you can use aggregation with disk usage:
all_reviews = db_handle.aggregate([{$sort: {'reviewDate': 1}}], {allowDiskUse: true})
(Not sure how to do this in pymongo, though).
JavaScript API syntax for the index:
db_handle.ensureIndex({executedDate: 1})
In my case, it was necessary to fix nessary indexes in code and recreate them:
rake db:mongoid:create_indexes RAILS_ENV=production
As the memory overflow does not occur when there is a needed index of field.
PS Before this I had to disable the errors when creating long indexes:
# mongo
MongoDB shell version: 2.6.12
connecting to: test
> db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )
Also may be needed reIndex
:
# mongo
MongoDB shell version: 2.6.12
connecting to: test
> use your_db
switched to db your_db
> db.getCollectionNames().forEach( function(collection){ db[collection].reIndex() } )
来源:https://stackoverflow.com/questions/27023622/overflow-sort-stage-buffered-data-usage-exceeds-internal-limit