How do I use aggregation operators in a $match in MongoDB (for example $year or $dayOfMonth)?

生来不讨喜 2021-02-12 16:22

I have a collection full of documents with a created_date attribute. I'd like to send these documents through an aggregation pipeline to do some work on them. Ideally I would

3 Answers
  • 2021-02-12 16:29

    Try this:

    db.createCollection("so");
    db.so.remove({}); // an empty filter removes every document
    db.so.insert([
    {
        post_body: 'This is the body of test post 1',
        created_date: ISODate('2012-09-29T05:23:41Z'),
        comments: 48
    },
    {
        post_body: 'This is the body of test post 2',
        created_date: ISODate('2012-09-24T12:34:13Z'),
        comments: 10
    },
    {
        post_body: 'This is the body of test post 3',
        created_date: ISODate('2012-08-16T12:34:13Z'),
        comments: 10
    }
    ]);
    //db.so.find();
    
    db.so.createIndex({"created_date":1});
    db.runCommand({
        aggregate:"so",
        pipeline:[
            {
                $match: { // filter only those posts in september
                    created_date: { $gte: ISODate('2012-09-01'), $lt: ISODate('2012-10-01') }
                }
            },
            {
                $group: {
                    _id: null, // no shared key
                    comments: { $sum: "$comments" } // total comments for all the posts in the pipeline
                }
            }
        ]
    //,explain:true
    });
    

    The result is:

    { "result" : [ { "_id" : null, "comments" : 58 } ], "ok" : 1 }
    

    You could also modify your previous example as follows, although I'm not sure why you would want to unless you plan to do something else with the month and year later in the pipeline:

    {
        aggregate: 'posts',
        pipeline: [
         {$match: { created_date: { $gte: ISODate('2012-09-01'), $lt: ISODate('2012-10-01') } } },
         {$project:
              {
                   comments: 1, // keep comments so the $group stage can sum it
                   month : {$month:'$created_date'},
                   year : {$year:'$created_date'}
              }
         },
         {$match:
              {
                   month:9,
                   year: 2012
               }
         },
         {$group:
             {_id: '0',
              totalComments:{$sum:'$comments'}
             }
          }
        ]
     }
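
    The range filter above hard-codes the bounds for September 2012. As a minimal sketch, the same half-open range can be computed for any month in plain JavaScript. The helper name monthRange is hypothetical (it is not part of MongoDB), and it assumes the dates are stored in UTC:

```javascript
// Hypothetical helper (not part of MongoDB): build a half-open UTC date range
// covering one calendar month, suitable for a $match on created_date.
function monthRange(year, month) {
  // month is 1-based (9 = September); Date.UTC expects a 0-based month.
  const start = new Date(Date.UTC(year, month - 1, 1));
  // Date.UTC rolls month index 12 over into January of the next year.
  const end = new Date(Date.UTC(year, month, 1));
  return { $gte: start, $lt: end };
}

// Stage that could stand in for the hard-coded September filter.
const matchStage = { $match: { created_date: monthRange(2012, 9) } };
console.log(JSON.stringify(matchStage));
```

    Because the filter still compares created_date directly against two dates, it should remain able to use the index on created_date, unlike a filter on a projected month or year field.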
    
  • 2021-02-12 16:42

    As you already found, you cannot $match on fields that are not in the document, because $match works exactly the same way as find. If you use $project first to create those fields, the later $match can no longer use indexes.

    What you can do instead is combine your efforts as follows:

    {
        aggregate: 'posts',
        pipeline: [
             // In the mongo shell, write ISODate('...') instead of the {$date: '...'} extended JSON form.
             {$match: {
                 created_date :
                      {$gte: {$date:'2012-09-01T04:00:00Z'},
                       $lt:  {$date:'2012-10-01T04:00:00Z'}
                      }
                 }
             },
             {$group:
                 {_id: '0',
                  totalComments:{$sum:'$comments'}
                 }
              }
        ]
     }
    

    The above only gives you the aggregation for September. If you want to aggregate over multiple months, you can, for example, do this:

    {
        aggregate: 'posts',
        pipeline: [
             {$match: {
                 created_date :
                      { $gte: {$date:'2012-07-01T04:00:00Z'},
                        $lt:  {$date:'2012-10-01T04:00:00Z'}
                      }
                 }
             },
             {$project: {
                  comments: 1,
                  new_created: {
                            "yr" : {"$year" : "$created_date"},
                            "mo" : {"$month" : "$created_date"}
                         }
                  }
             },
             {$group:
                 {_id: "$new_created",
                  totalComments:{$sum:'$comments'}
                 }
              }
        ]
     }
    

    and you'll get back something like:

    {
        "result" : [
            {
                "_id" : {
                    "yr" : 2012,
                    "mo" : 7
                },
                "totalComments" : 5
            },
            {
                "_id" : {
                    "yr" : 2012,
                    "mo" : 8
                },
                "totalComments" : 19
            },
            {
                "_id" : {
                    "yr" : 2012,
                    "mo" : 9
                },
                "totalComments" : 21
            }
        ],
        "ok" : 1
    }
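
    To make the grouping step concrete, here is a rough plain-JavaScript imitation of what the $project and $group stages compute. It is only a sketch: totalCommentsByMonth is a hypothetical helper, and the documents are the three test posts from the first answer rather than the data behind the output above.

```javascript
// Sample documents reused from the first answer's test data.
const posts = [
  { created_date: new Date("2012-09-29T05:23:41Z"), comments: 48 },
  { created_date: new Date("2012-09-24T12:34:13Z"), comments: 10 },
  { created_date: new Date("2012-08-16T12:34:13Z"), comments: 10 },
];

// Hypothetical helper: mimic $project {yr, mo} followed by $group with $sum.
function totalCommentsByMonth(docs) {
  const totals = new Map();
  for (const doc of docs) {
    const yr = doc.created_date.getUTCFullYear();
    const mo = doc.created_date.getUTCMonth() + 1; // $month is 1-based
    const key = `${yr}-${mo}`;
    const entry = totals.get(key) || { _id: { yr, mo }, totalComments: 0 };
    entry.totalComments += doc.comments;
    totals.set(key, entry);
  }
  // One result document per distinct (yr, mo) pair, like $group output.
  return [...totals.values()];
}

console.log(totalCommentsByMonth(posts));
```

    Each distinct (yr, mo) pair becomes one output document, which is why grouping on the sub-document new_created yields one total per month.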
    
  • 2021-02-12 16:48

    Let's look at building some pipelines that involve operations that are already familiar to us. So, we're going to look at the following stages:

    • $match - a filtering stage, similar to find()
    • $project
    • $sort
    • $skip
    • $limit

    We might ask ourselves why these stages are necessary, given that the MongoDB query language already provides this functionality. The reason is that we need them to support the more complex, analytics-oriented functionality included with the aggregation framework. The query below is equivalent to a simple find:

    
    db.companies.aggregate([{
      $match: {
        founded_year: 2004
      }
    }])
    
    

    Let's introduce a project stage in this aggregation pipeline:

    
    db.companies.aggregate([{
      $match: {
        founded_year: 2004
      }
    }, {
      $project: {
        _id: 0,
        name: 1,
        founded_year: 1
      }
    }])
    
    

    We use the aggregate method to run an aggregation pipeline. A pipeline is simply an array of documents, and each document must specify a particular stage operator. In the example above, the pipeline has two stages: the $match stage passes documents one at a time to the $project stage.

    Let's extend the pipeline with a limit stage:

    
    db.companies.aggregate([{
      $match: {
        founded_year: 2004
      }
    }, {
      $limit: 5
    }, {
      $project: {
        _id: 0,
        name: 1
      }
    }])
    
    

    This gets the matching documents and limits them to five before projecting the fields, so the projection runs on only 5 documents. Suppose we did this instead:

    
    db.companies.aggregate([{
      $match: {
        founded_year: 2004
      }
    }, {
      $project: {
        _id: 0,
        name: 1
      }
    }, {
      $limit: 5
    }])
    
    

    This gets the matching documents, projects all of them, and only then limits the result to five, so the projection runs on a large number of documents. The lesson is to pass only the documents that are strictly necessary to the next stage. Now, let's look at the sort stage:

    
    db.companies.aggregate([{
      $match: {
        founded_year: 2004
      }
    }, {
      $sort: {
        name: 1
      }
    }, {
      $limit: 5
    }, {
      $project: {
        _id: 0,
        name: 1
      }
    }])
    
    

    This sorts all matching documents by name and returns only the first 5. Suppose we did this instead:

    
    db.companies.aggregate([{
      $match: {
        founded_year: 2004
      }
    }, {
      $limit: 5
    }, {
      $sort: {
        name: 1
      }
    }, {
      $project: {
        _id: 0,
        name: 1
      }
    }])
    
    

    This takes the first 5 documents and sorts only those. Let's add the skip stage:

    
    db.companies.aggregate([{
      $match: {
        founded_year: 2004
      }
    }, {
      $sort: {
        name: 1
      }
    }, {
      $skip: 10
    }, {
      $limit: 5
    }, {
      $project: {
        _id: 0,
        name: 1
      }
    }])
    
    

    This sorts all the matching documents, skips the first 10, and returns the next 5. We should include $match stages as early as possible in the pipeline. To filter documents in a $match stage, we use the same syntax for query documents (filters) as we do for find().
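
    The effect of stage order can be illustrated without a database. The sketch below imitates $sort, $skip, and $limit with plain array operations; the company names are made up for illustration, and smaller skip and limit values are used so the lists stay short.

```javascript
// Hypothetical data standing in for the documents that passed $match.
const names = ["Zeta", "Alpha", "Mu", "Beta", "Omega", "Gamma"];
const docs = names.map((name) => ({ name, founded_year: 2004 }));

const byName = (a, b) => a.name.localeCompare(b.name);

// $sort then $limit: the first 3 names in alphabetical order.
const sortThenLimit = [...docs].sort(byName).slice(0, 3).map((d) => d.name);

// $limit then $sort: the first 3 documents in input order, then sorted.
const limitThenSort = docs.slice(0, 3).sort(byName).map((d) => d.name);

// $sort, then $skip 2, then $limit 3: a window into the sorted list.
const page = [...docs].sort(byName).slice(2, 2 + 3).map((d) => d.name);

console.log(sortThenLimit); // [ 'Alpha', 'Beta', 'Gamma' ]
console.log(limitThenSort); // [ 'Alpha', 'Mu', 'Zeta' ]
console.log(page);          // [ 'Gamma', 'Mu', 'Omega' ]
```

    The two orderings return different documents, not just the same documents in a different order, so the position of the limit stage changes the result as well as the amount of work done.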
