Is there an explain function for the Aggregation framework in MongoDB? I can\'t see it in the documentation.
If not is there some other way to check, how a query p
Starting with MongoDB version 3.0, simply changing the order from
collection.aggregate(...).explain()
to
collection.explain().aggregate(...)
will give you the desired results (documentation here).
For older versions >= 2.6, you will need to use the explain option for aggregation pipeline operations
explain:true
db.collection.aggregate([
{ $project : { "Tags._id" : 1 }},
{ $unwind : "$Tags" },
{ $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
{ $group: {
_id : "$_id",
count: { $sum:1 }
}},
{$sort: {"count":-1}}
],
{
explain:true
}
)
An important consideration with the Aggregation Framework is that an index can only be used to fetch the initial data for a pipeline (e.g. usage of $match
, $sort
, $geonear
at the beginning of a pipeline) as well as subsequent $lookup
and $graphLookup
stages. Once data has been fetched into the aggregation pipeline for processing (e.g. passing through stages like $project
, $unwind
, and $group
) further manipulation will be in-memory (possibly using temporary files if the allowDiskUse
option is set).
In general, you can optimize aggregation pipelines by:
$match
stage to restrict processing to relevant documents.$match
/ $sort
stages are supported by an efficient index.$match
, $limit
, and $skip
.There are also a number of Aggregation Pipeline Optimizations that automatically happen depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output results.
As at MongoDB 3.4, the Aggregation Framework explain
option provides information on how a pipeline is processed but does not support the same level of detail as the executionStats
mode for a find()
query. If you are focused on optimizing initial query execution you will likely find it beneficial to review the equivalent find().explain()
query with executionStats or allPlansExecution verbosity.
There are a few relevant feature requests to watch/upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize/profile aggregation pipelines:
Starting with version 2.6.x mongodb allows users to do explain with aggregation framework.
All you need to do is to add explain : true
db.records.aggregate(
[ ...your pipeline...],
{ explain: true }
)
Thanks to Rafa, I know that it was possible to do even in 2.4, but only through runCommand()
. But now you can use aggregate as well.
The aggregation framework is a set of analytics tools within MongoDB
that allows us to run various types of reports or analysis on documents in one or more collections. Based on the idea of a pipeline. We take input from a MongoDB
collection and pass the documents from that collection through one or more stages, each of which performs a different operation on it's inputs. Each stage takes as input whatever the stage before it produced as output. And the inputs and outputs for all stages are a stream of documents. Each stage has a specific job that it does. It's expecting a specific form of document and produces a specific output, which is itself a stream of documents. At the end of the pipeline, we get access to the output.
An individual stage is a data processing unit. Each stage takes as input a stream of documents one at a time, processes each document one at a time and produces the output stream of documents. Again, one at a time. Each stage provide a set of knobs or tunables that we can control to parameterize the stage to perform whatever task we're interested in doing. So a stage performs a generic task - a general purpose task of some kind and parameterize the stage for the particular set of documents that we're working with. And exactly what we would like that stage to do with those documents. These tunables typically take the form of operators that we can supply that will modify fields, perform arithmetic operations, reshape documents or do some sort of accumulation task as well as a veriety of other things. Often times, it the case that we'll want to include the same type of stage multiple times within a single pipeline.
e.g. We may wish to perform an initial filter so that we don't have to pass the entire collection into our pipeline. But, then later on, following some additional processing, want to filter once again using a different set of criteria. So, to recap, pipeline works with a MongoDB
collection. They're composed of stages, each of which does a different data processing task on it's input and produces documents as output to be passed to the next stage. And finally at the end of the pipeline output is produced that we can then do something within our application. In many cases, it's necessary to include the same type of stage, multiple times within an individual pipeline.