Mongo aggregation within intervals of time

孤独总比滥情好 2021-02-04 20:57

I have some log data stored in a mongo collection that includes basic information such as a request_id and the time it was added to the collection, for example:

{
            


        
3 Answers
  • 2021-02-04 21:18

    There are a couple of ways of approaching this depending on which output format best suits your needs. The main note is that with the "aggregation framework" itself, you cannot actually return something "cast" as a date, but you can get values that are easily reconstructed into a Date object when processing results in your API.

    The first approach is to use the "Date Aggregation Operators" available to the aggregation framework:

    db.collection.aggregate([
        { "$match": {
            "time": { "$gte": startDate, "$lt": endDate }
        }},
        { "$group": {
            "_id": {
                "year": { "$year": "$time" },
                "dayOfYear": { "$dayOfYear": "$time" },
                "hour": { "$hour": "$time" },
                "minute": {
                    "$subtract": [
                        { "$minute": "$time" },
                        { "$mod": [ { "$minute": "$time" }, 10 ] }
                    ]
                }
            },
            "count": { "$sum": 1 }
        }}
    ])
    

    This returns a composite key for _id containing all the values you need to rebuild a "date". Alternatively, if your range always falls within a single "hour", just group on the "minute" part and work out the actual date from the startDate of your range selection.
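
Rebuilding a date from that composite key on the client could look roughly like the sketch below. It is plain JavaScript rather than part of the pipeline; `keyToDate` and the sample document are hypothetical names used only for illustration, and it assumes the key was computed in UTC with a 1-based `dayOfYear`.

```javascript
// Rebuild a UTC Date from the composite _id produced by the $group stage.
// Assumes the key has the shape { year, dayOfYear, hour, minute } and that
// dayOfYear is 1-based, so day 1 is January 1st.
function keyToDate(key) {
    // Date.UTC normalises an out-of-range day of month, so "January 202nd"
    // rolls forward to the 202nd day of the year.
    return new Date(Date.UTC(key.year, 0, key.dayOfYear, key.hour, key.minute));
}

// Hypothetical result document, shaped like the pipeline output.
const sample = { _id: { year: 2015, dayOfYear: 202, hour: 16, minute: 20 }, count: 3 };
console.log(keyToDate(sample._id).toISOString()); // 2015-07-21T16:20:00.000Z
```

Passing the day of the year as the "day of month" for January avoids a lookup table of month lengths, leap years included.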

    Or you can use plain "date math" to get the milliseconds since "epoch", which can again be fed directly to a Date constructor.

    db.collection.aggregate([
        { "$match": {
            "time": { "$gte": startDate, "$lt": endDate }
        }},
        { "$group": {
            "_id": {
                "$subtract": [
                   { "$subtract": [ "$time", new Date(0) ] },
                   { "$mod": [
                       { "$subtract": [ "$time", new Date(0) ] },
                       1000 * 60 * 10
                   ]}
                ]
            },
            "count": { "$sum": 1 }
        }}
    ])
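
The arithmetic in that `_id` expression is just "timestamp minus its remainder modulo the interval". As a rough client-side sketch of the same idea (plain JavaScript, not shell code; `bucketStart`, `countByBucket` and the sample timestamps are made up for illustration, and dates are assumed to fall after the epoch):

```javascript
const INTERVAL_MS = 1000 * 60 * 10; // ten minutes, as in the pipeline

// Round a Date down to the start of its interval, as epoch milliseconds.
function bucketStart(date, intervalMs) {
    const ms = date.getTime();
    return ms - (ms % intervalMs);
}

// Count dates per bucket, mimicking { "$sum": 1 } in the $group stage.
function countByBucket(dates, intervalMs) {
    const counts = new Map();
    for (const d of dates) {
        const key = bucketStart(d, intervalMs);
        counts.set(key, (counts.get(key) || 0) + 1);
    }
    return counts;
}

// Hypothetical sample timestamps.
const times = [
    new Date("2015-07-21T16:21:00Z"),
    new Date("2015-07-21T16:23:00Z"),
    new Date("2015-07-21T16:45:00Z"),
];
for (const [ms, count] of countByBucket(times, INTERVAL_MS)) {
    console.log(new Date(ms).toISOString(), count);
}
```

Because ten minutes divides evenly into an hour, buckets counted from the epoch line up with clock boundaries such as :00, :10 and :20.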
    

    In all cases, what you do not want to do is use $project before actually applying $group. As a "pipeline stage", $project must "cycle" through all documents selected and "transform" the content.

    This takes time and adds to the total execution time of the query. You can simply apply the expressions directly in $group, as has been shown.

    Or if you are really "pure" about a Date object being returned without post-processing, you can always use "mapReduce", since the JavaScript functions actually allow recasting as a date. It is slower than the aggregation framework, though, and of course does not return a cursor:

    db.collection.mapReduce(
       function() {
           var date = new Date(
               this.time.valueOf() 
               - ( this.time.valueOf() % ( 1000 * 60 * 10 ) )
           );
           emit(date,1);
       },
       function(key,values) {
           return Array.sum(values);
       },
       { "out": { "inline": 1 } }
    )
    

    Your best bet is using aggregation though, as transforming the response is quite easy:

    db.collection.aggregate([
        { "$match": {
            "time": { "$gte": startDate, "$lt": endDate }
        }},
        { "$group": {
            "_id": {
                "$subtract": [
                   { "$subtract": [ "$time", new Date(0) ] },
                   { "$mod": [
                       { "$subtract": [ "$time", new Date(0) ] },
                       1000 * 60 * 10
                   ]}
                ]
            },
            "count": { "$sum": 1 }
        }}
    ]).forEach(function(doc) {
        doc._id = new Date(doc._id);
        printjson(doc);
    })
    

    And then you have your interval grouping output with real Date objects.

  • 2021-02-04 21:29

    A pointer in lieu of a concrete answer: you can very easily do it for minutes, hours and other given periods using the date aggregation operators. Every 10 minutes will be a bit trickier but is likely possible with some wrangling. Nevertheless, the aggregation will be slow as nuts on large data sets.

    I would suggest extracting the minutes post-insert:

    {
        "_id" : ObjectId("55ae6ea558a5d3fe018b4568"),
        "request_id" : "030ac9f1-aa13-41d1-9ced-2966b9a6g5c3",
        "time" : ISODate("2015-07-21T16:00:00.00Z"),
        "minutes": 16
    }
    

    and, even though it sounds utterly absurd, adding quartiles and sextiles, or whatever that N might be:

    {
        "_id" : ObjectId("55ae6ea558a5d3fe018b4568"),
        "request_id" : "030ac9f1-aa13-41d1-9ced-2966b9a6g5c3",
        "time" : ISODate("2015-07-21T16:00:00.00Z"),
        "minutes": 16,
        "quartile": 1,
        "sextile": 2
    }
    

    First try doing a $divide on the minutes. It doesn't do ceil or floor, but check out:

    Is there a floor function in Mongodb aggregation framework?
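
One way to get an integer bucket number without a floor operator is to subtract the remainder before dividing. The sketch below shows the arithmetic in plain JavaScript; `bucketIndex` is a hypothetical helper, and zero-based bucket numbering is an assumption made here, not something this answer specifies.

```javascript
// Integer bucket index without a floor operator:
// (minutes - minutes % size) / size equals Math.floor(minutes / size)
// for non-negative minutes.
function bucketIndex(minutes, size) {
    return (minutes - (minutes % size)) / size;
}

console.log(bucketIndex(16, 15)); // 1 (zero-based 15-minute bucket)
console.log(bucketIndex(16, 10)); // 1 (zero-based 10-minute bucket)
```

The same subtract-then-divide shape can be written with `$subtract`, `$mod` and `$divide` inside a pipeline, so the extra fields could also be computed at query time rather than stored.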

  • 2021-02-04 21:39

    Something like this?

    pipeline = [
        {"$project":
            {"date": {
                "year": {"$year": "$time"},
                "month": {"$month": "$time"},
                "day": {"$dayOfMonth": "$time"},
                "hour": {"$hour": "$time"},
                "minute": {"$subtract": [
                    {"$minute": "$time"},
                    {"$mod": [{"$minute": "$time"}, 10]}
                ]}
            }}
        },
        {"$group": {"_id": "$date", "count": {"$sum": 1}}}
    ]
    

    Example:

    > db.foo.insert({"time": new Date(2015,  7, 21, 22, 21)})
    > db.foo.insert({"time": new Date(2015,  7, 21, 22, 23)})
    > db.foo.insert({"time": new Date(2015,  7, 21, 22, 45)})
    > db.foo.insert({"time": new Date(2015,  7, 21, 22, 33)})
    > db.foo.aggregate(pipeline)
    

    and output:

    { "_id" : { "year" : 2015, "month" : 8, "day" : 21, "hour" : 20, "minute" : 40 }, "count" : 1 }
    { "_id" : { "year" : 2015, "month" : 8, "day" : 21, "hour" : 20, "minute" : 20 }, "count" : 2 }
    { "_id" : { "year" : 2015, "month" : 8, "day" : 21, "hour" : 20, "minute" : 30 }, "count" : 1 }
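
Two details of that output may look surprising: the month is 8 although 7 was passed in, and the hour is 20 rather than 22. A likely explanation is that the JavaScript `Date` constructor takes a zero-based month and interprets its arguments in local time, while the date operators here work in UTC; an hour of 20 would be consistent with a shell running at UTC+2, though that is an assumption. A small sketch of the difference:

```javascript
// Month arguments are zero-based: 7 means August.
const local = new Date(2015, 7, 21, 22, 21);         // interpreted in local time
const utc = new Date(Date.UTC(2015, 7, 21, 22, 21)); // interpreted as UTC

console.log(utc.getUTCMonth() + 1); // 8
console.log(utc.getUTCHours());     // 22

// The UTC hour of the local-time value depends on the machine's offset.
console.log(local.getUTCHours(), local.getTimezoneOffset());
```

Building test data with `Date.UTC` (or an ISO string ending in `Z`) keeps the grouped hours independent of where the shell runs.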
    