Question
I'm running a $geoNear query on my sharded cluster (6 nodes with 3 replica sets, each consisting of 2 shardsvr members and 1 arbiter). I expect the query to return 1.1 million documents, but I am receiving only about 130,xxx. I am using the Java driver to issue the query and process the data (for now, I'm just counting the documents that get returned). I am using MongoDB 3.2.9 and the latest Java driver.
The mongod log shows the following error which is caused by the output document getting larger than 16MB:
2016-10-10T12:00:22.933+0200 W COMMAND [conn22] Too many geoNear results for query { location: { $nearSphere: { type: "Point", coordinates: [ 10.xxxx, 52.xxxxx] }, $maxDistance: 3900.0 } }, truncating output.
2016-10-10T12:00:22.951+0200 I COMMAND [conn22] command mydb.data command: geoNear { geoNear: "data", near: { type: "Point", coordinates: [ 10.xxxx, 52.xxxxx ] },
num: 50000000, maxDistance: 3900.0, query: {}, spherical: true, distanceMultiplier: 1.0, includeLocs: true } keyUpdates:0 writeConflicts:0 numYields:890 reslen:16777310
locks:{ Global: { acquireCount: { r: 1784 } }, Database: { acquireCount: { r: 892 } }, Collection: { acquireCount: { r: 892 } } } protocol:op_query 589ms
2016-10-10T12:00:23.183+0200 I COMMAND [conn22] getmore mydb.data query: { aggregate: "data", pipeline: [ { $geoNear: { near: { type: "Point", coordinates: [ 10.xxxx, 52.xxxxx ] },
distanceField: "dist.calculated", limit: 50000000, maxDistance: 3900.0, query: {}, spherical: true, distanceMultiplier: 1.0, includeLocs: "dist.location" } }, { $project: { _id: false,
dist: { calculated: true } } } ], fromRouter: true, cursor: { batchSize: 0 } } cursorid:170255616227 ntoreturn:0 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:43558
reslen:1568108 locks:{ Global: { acquireCount: { r: 1786 } }, Database: { acquireCount: { r: 893 } }, Collection: { acquireCount: { r: 893 } } } 820ms
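A rough sanity check on these log lines, sketched in plain JavaScript (run under Node.js). It only does arithmetic on the figures shown above. It assumes that each of the 3 replica sets is one shard, that every shard truncated at roughly the same point, and that the 43,558 documents on the getmore line are the ones that fit in the truncated geoNear reply. None of this is confirmed by the log itself.

```javascript
// Figures copied from the mongod log lines above.
var BSON_MAX_BYTES = 16 * 1024 * 1024; // 16 MiB BSON document limit
var geoNearReplyBytes = 16777310;      // reslen of the truncated geoNear command
var docsFromOneShard = 43558;          // nreturned on the getmore line
var shardCount = 3;                    // assumption: one shard per replica set

// How far the reply is past the 16 MiB mark.
var bytesOverLimit = geoNearReplyBytes - BSON_MAX_BYTES;

// Average size of one result inside the truncated reply (rounded down).
var approxBytesPerResult = Math.floor(geoNearReplyBytes / docsFromOneShard);

// Total if every shard returned about as many documents as this one.
var estimatedTotal = docsFromOneShard * shardCount;

console.log(bytesOverLimit, approxBytesPerResult, estimatedTotal);
```

Under those assumptions the estimate comes to 130,674 documents, which is consistent with the roughly 130,xxx documents actually received: each shard appears to stop once its own reply reaches 16 MB.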
The Query:
db.data.aggregate([
    {
        $geoNear: {
            near: {
                type: "Point",
                coordinates: [
                    10.xxxx,
                    52.xxxxx
                ]
            },
            distanceField: "dist.calculated",
            maxDistance: 3900,
            num: 50000000,
            includeLocs: "dist.location",
            spherical: true
        }
    }
])
Note that I issued the query both with and without the num parameter; both fail with the error shown above.
I expected the query to return chunks of the database once the document size limit (16 MB) gets exceeded. What am I missing? How can I retrieve all the data?
Edit: The query also fails with the same error in the mongod logs when I add a group stage:
db.data.aggregate([
    {
        $geoNear: {
            near: {
                type: "Point",
                coordinates: [
                    10.xxxx,
                    52.xxxxxx
                ]
            },
            distanceField: "dist.calculated",
            maxDistance: 3900,
            includeLocs: "dist.location",
            num: 2000000,
            spherical: true
        }
    },
    {
        $group: {
            _id: "$root_document"
        }
    }
])
Answer 1:
MongoDB staff member Lungang Fang has since answered my enquiry on the MongoDB user group. His answer is below:
Currently, the “geoNear” aggregation stage is limited to returning results that fit within the 16MB BSON size limit. This is related to an issue with earlier versions of MongoDB (described in https://jira.mongodb.org/browse/SERVER-13486). Your query hit this issue because “geoNear” returns a single document (containing an array of result documents), and the “allowDiskUse” aggregation pipeline option unfortunately does not help in this case.
There are two options that could be considered:
If you don’t need all the results, you could limit the “geoNear” aggregation result size using the num, limit, or maxDistance options.
If you require all of the results, you can use the find() operator, which is not limited to the BSON maximum size since it returns a cursor.
Below is a test I did on MongoDB 3.2.10, for your information.
Create a “2dsphere” index on the designated collection:
db.coll.createIndex({location: '2dsphere'})
Create and insert several big documents:
var padding = '';
for (var j = 0; j < 15; j++) {
    for (var i = 1024*128; i > 0; --i) {
        padding = padding + '12345678';
    }
}
db.coll.insert({location:{type:"Point", coordinates:[-73.861, 40.73]}, padding:padding})
db.coll.insert({location:{type:"Point", coordinates:[-73.862, 40.73]}, padding:padding})
db.coll.insert({location:{type:"Point", coordinates:[-73.863, 40.73]}, padding:padding})
db.coll.insert({location:{type:"Point", coordinates:[-73.864, 40.73]}, padding:padding})
db.coll.insert({location:{type:"Point", coordinates:[-73.865, 40.73]}, padding:padding})
db.coll.insert({location:{type:"Point", coordinates:[-73.866, 40.73]}, padding:padding})
Query using “geoNear”; the server log shows “Too many geoNear results …, truncating output”:
db.coll.aggregate([
    {
        $geoNear: {
            near: {type:"Point", coordinates:[-73.86, 40.73]},
            distanceField: "dist.calculated",
            maxDistance: 150000000,
            spherical: true
        }
    },
    {$project: {location:1}}
])
Query using “find”; all expected documents are returned:
// This and the following "var" are necessary to keep the padding string from flooding the screen.
var cursor = db.coll.find({
    location: {
        $near: {
            $geometry: {type:"Point", coordinates:[-73.86, 40.73]},
            $maxDistance: 150000
        }
    }
})
// It is necessary to iterate through the cursor. Otherwise, the query is not actually executed.
var x = cursor.next()
x._id
var x = cursor.next()
x._id
...
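To see why this test triggers the truncation with only six documents, the size of the padding string can be worked out from the loop bounds. The sketch below is plain JavaScript (run under Node.js) and only does the arithmetic. It assumes the padding consists of single-byte ASCII characters and ignores the few extra bytes taken by the other fields.

```javascript
// Size of the padding string built by the nested loops in the test above.
var CHUNK = '12345678';            // 8 characters appended per inner iteration
var INNER_ITERATIONS = 1024 * 128; // inner loop count
var OUTER_ITERATIONS = 15;         // outer loop count
var BSON_MAX_BYTES = 16 * 1024 * 1024;

// Assumption: one byte per character, since the padding is pure ASCII.
var paddingBytes = CHUNK.length * INNER_ITERATIONS * OUTER_ITERATIONS;
var paddingMiB = paddingBytes / (1024 * 1024);

// Number of such documents that fit into a single 16 MiB reply document.
var docsPerReply = Math.floor(BSON_MAX_BYTES / paddingBytes);

console.log(paddingBytes, paddingMiB, docsPerReply);
```

Each document carries about 15 MiB of padding, so a single 16 MB result document has room for at most one of them, while a cursor hands them back one at a time.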
Regards, Lungang
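One caveat for the sharded setup in the question: as far as I know, MongoDB releases before 4.0 do not support $near or $nearSphere in find() on sharded collections, so the find() test above may not carry over directly to a 3.2 cluster. If the results do not have to be sorted by distance (for example when only counting them), a $geoWithin filter with $centerSphere is one possible alternative that also returns a cursor. The sketch below is plain JavaScript that only builds the filter document. The helper name buildWithinFilter and the Earth radius constant are assumptions made for illustration; the field name location comes from the question, and the coordinates passed in are placeholders.

```javascript
// Approximate equatorial Earth radius in meters, used to convert a distance
// in meters into the radians that $centerSphere expects (assumed value).
var EARTH_RADIUS_METERS = 6378100;

// Hypothetical helper: builds a find() filter matching documents whose
// "location" lies within radiusMeters of the point [lng, lat].
function buildWithinFilter(lng, lat, radiusMeters) {
    return {
        location: {
            $geoWithin: {
                $centerSphere: [[lng, lat], radiusMeters / EARTH_RADIUS_METERS]
            }
        }
    };
}

// Placeholder coordinates; the real ones are masked in the question.
var filter = buildWithinFilter(10, 52, 3900);
console.log(JSON.stringify(filter));
```

In the mongo shell the filter could then be passed to db.data.find(filter) and the cursor iterated, or counted with db.data.find(filter).count(). Unlike $geoNear, this does not compute a distance field and returns the documents in no particular order.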
Source: https://stackoverflow.com/questions/39956171/mongodb-error-too-many-results-for-query-truncating-output-with-geonear