MongoDB .NET Driver Group By Time Range

别那么骄傲 2020-12-18 13:15

I am a noob in MongoDB and wanted to know how to count total documents inserted into the collection for every 15 minutes interval starting 12 AM UTC until the current UTC ti

1 Answer
  •  囚心锁ツ
    2020-12-18 13:35

    If you are looking for the "exact thing" from the referenced post, done with .NET, then it probably isn't going to be implemented like that. You can do it, but it is a lot of bother, and you will most likely go for one of the other alternatives unless you need "flexible intervals" to the extent that I do.

    Fluent Aggregate

    If you have a modern MongoDB 3.6 or greater server available then you can use $dateFromParts in order to reconstruct the date from the "rounded" parts extracted from the date:

    DateTime startDate = new DateTime(2018, 5, 1, 0, 0, 0, DateTimeKind.Utc);
    DateTime endDate = new DateTime(2018, 6, 1, 0, 0, 0, DateTimeKind.Utc);
    
    var result = Collection.Aggregate()
      .Match(k => k.Timestamp >= startDate && k.Timestamp < endDate)
      .Group(k =>
        new DateTime(k.Timestamp.Year, k.Timestamp.Month, k.Timestamp.Day,
            k.Timestamp.Hour, k.Timestamp.Minute - (k.Timestamp.Minute % 15), 0),
        g => new { _id = g.Key, count = g.Count() }
      )
      .SortBy(d => d._id)
      .ToList();
    

    Statement sent to server:

    [
      { "$match" : {
        "Timestamp" : {
          "$gte" : ISODate("2018-05-01T00:00:00Z"),
          "$lt" : ISODate("2018-06-01T00:00:00Z")
        }
      } },
      { "$group" : {
        "_id" : { 
          "$dateFromParts" : {
            "year" : { "$year" : "$Timestamp" },
            "month" : { "$month" : "$Timestamp" },
            "day" : { "$dayOfMonth" : "$Timestamp" },
            "hour" : { "$hour" : "$Timestamp" },
            "minute" : { "$subtract" : [
              { "$minute" : "$Timestamp" },
              { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] }
            ] },
            "second" : 0
          }
        },
        "count" : { "$sum" : 1 }
      } },
      { "$sort": { "_id": 1 } }
    ]
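
    To sanity-check what that grouping key evaluates to, the same rounding can be run client-side on plain DateTime values. This is only a sketch: Bucket15 is a hypothetical helper name and not part of the driver, and the window from midnight UTC to the current time is an assumption about the range the question asks for.

```csharp
using System;

public static class BucketDemo
{
    // Hypothetical helper: floors a timestamp to the start of its 15 minute
    // interval, mirroring the $dateFromParts expression in the pipeline.
    public static DateTime Bucket15(DateTime t) =>
        new DateTime(t.Year, t.Month, t.Day, t.Hour,
            t.Minute - (t.Minute % 15), 0, DateTimeKind.Utc);

    public static void Main()
    {
        var sample = new DateTime(2018, 5, 1, 13, 44, 59, DateTimeKind.Utc);
        Console.WriteLine(Bucket15(sample).ToString("o")); // 2018-05-01T13:30:00.0000000Z

        // Assumed window: midnight UTC today up to the current UTC time.
        DateTime endDate = DateTime.UtcNow;
        DateTime startDate = endDate.Date; // Date keeps the Kind of the source value
        Console.WriteLine($"{startDate:o} .. {endDate:o}");
    }
}
```

    The startDate and endDate computed this way could be passed to the Match stage in place of the fixed May 2018 range.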
    

    If you don't have that feature available, you can simply leave it off and return the date "disassembled", then assemble it again as you process the cursor. To simulate that with a list:

    var result = Collection.Aggregate()
     .Match(k => k.Timestamp >= startDate && k.Timestamp < endDate)
     .Group(k => new
        {
          year = k.Timestamp.Year,
          month = k.Timestamp.Month,
          day = k.Timestamp.Day,
          hour = k.Timestamp.Hour,
          minute = k.Timestamp.Minute - (k.Timestamp.Minute % 15)
        },
        g => new { _id = g.Key, count = g.Count() }
      )
      .SortBy(d => d._id)
      .ToList();
    
    foreach (var doc in result)
    {
      //System.Console.WriteLine(doc.ToBsonDocument());
      System.Console.WriteLine(
        new BsonDocument {
          { "_id", new DateTime(doc._id.year, doc._id.month, doc._id.day,
            doc._id.hour, doc._id.minute, 0) },
          { "count", doc.count }
        }
      );
    }
    

    Statement sent to the server:

    [
      { "$match" : {
        "Timestamp" : {
          "$gte" : ISODate("2018-05-01T00:00:00Z"),
          "$lt" : ISODate("2018-06-01T00:00:00Z")
        }
      } },
      { "$group" : {
        "_id" : {
          "year" : { "$year" : "$Timestamp" },
          "month" : { "$month" : "$Timestamp" },
          "day" : { "$dayOfMonth" : "$Timestamp" },
          "hour" : { "$hour" : "$Timestamp" },
          "minute" : { "$subtract" : [
            { "$minute" : "$Timestamp" }, 
            { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] }
          ] }
        },
        "count" : { "$sum" : 1 }
      } },
      { "$sort" : { "_id" : 1 } }
    ]
    

    There's very little difference between the two in terms of the code. In one case the "casting back" to DateTime happens on the server with $dateFromParts, and in the other we do exactly the same casting with the DateTime constructor in code as we iterate each cursor result.

    So they really are almost the same. The only real difference is that where the server does the casting, the date returned uses far fewer bytes per document: a single BSON Date (a 64-bit integer) replaces five separate numeric fields and their key names. Even so, all of those numbers are still "lighter" than sending back any "string" representation of a date.

    LINQ Queryable

    The same basic form carries over when the query is written with LINQ instead of the Fluent builder:

    var query = from p in Collection.AsQueryable()
                where p.Timestamp >= startDate && p.Timestamp < endDate
                group p by new DateTime(p.Timestamp.Year, p.Timestamp.Month, p.Timestamp.Day,
                  p.Timestamp.Hour, p.Timestamp.Minute - (p.Timestamp.Minute % 15), 0) into g
                orderby g.Key
                select new { _id = g.Key, count = g.Count() };
    

    Statement sent to the server:

    [
      { "$match" : {
        "Timestamp" : {
          "$gte" : ISODate("2018-05-01T00:00:00Z"),
          "$lt" : ISODate("2018-06-01T00:00:00Z")
        }
      } },
      { "$group" : {
        "_id" : {
          "$dateFromParts" : {
            "year" : { "$year" : "$Timestamp" }, 
            "month" : { "$month" : "$Timestamp" },
            "day" : { "$dayOfMonth" : "$Timestamp" }, 
            "hour" : { "$hour" : "$Timestamp" }, 
            "minute" : { "$subtract" : [
              { "$minute" : "$Timestamp" },
              { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] }
            ] },
            "second" : 0
          }
        },
        "__agg0" : { "$sum" : 1 }
      } },
      { "$sort" : { "_id" : 1 } },
      { "$project" : { "_id" : "$_id", "count" : "$__agg0" } }
    ]
    

    Or using GroupBy():

    var query = Collection.AsQueryable()
        .Where(k => k.Timestamp >= startDate && k.Timestamp < endDate)
        .GroupBy(k =>
          new DateTime(k.Timestamp.Year, k.Timestamp.Month, k.Timestamp.Day,
                k.Timestamp.Hour, k.Timestamp.Minute - (k.Timestamp.Minute % 15), 0),
          (k, s) => new { _id = k, count = s.Count() }
        )
        .OrderBy(k => k._id);
    

    Statement sent to the server:

    [
      { "$match" : {
        "Timestamp" : {
          "$gte" : ISODate("2018-05-01T00:00:00Z"),
          "$lt" : ISODate("2018-06-01T00:00:00Z")
        }
      } },
      { "$group" : {
        "_id" : {
          "$dateFromParts" : {
            "year" : { "$year" : "$Timestamp" },
            "month" : { "$month" : "$Timestamp" },
            "day" : { "$dayOfMonth" : "$Timestamp" },
            "hour" : { "$hour" : "$Timestamp" },
            "minute" : { "$subtract" : [ 
              { "$minute" : "$Timestamp" }, 
              { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] } 
            ] },
            "second" : 0
          }
        },
        "count" : { "$sum" : 1 }
      } },
      { "$sort" : { "_id" : 1 } }
    ]
    

    As you can see, it's all basically the same form.


    Converting the Original

    If you are looking to replicate the original "date math" form as posted, then it currently falls beyond the scope of what you can actually do with either LINQ or the Fluent builders. The only way to get that same sequence is with BsonDocument construction:

    DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
    
    var group = new BsonDocument { {
      "$group",
      new BsonDocument {
        { "_id",
        new BsonDocument { {
          "$add", new BsonArray
          {
            new BsonDocument { {
                "$subtract",
                new BsonArray {
                  new BsonDocument { { "$subtract", new BsonArray { "$Timestamp", epoch } } },
                  new BsonDocument { {
                    "$mod", new BsonArray
                    {
                     new BsonDocument { { "$subtract", new BsonArray { "$Timestamp", epoch } } },
                     1000 * 60 * 15
                   }
                 } }
               }
             } },
             epoch
           }
         } }
         },
         {
           "count", new BsonDocument("$sum", 1)
         }
       }
    } };
    
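    // Same collection as in the earlier examples; the type argument tells
    // the compiler what the hand-built stage outputs.
    var query = Collection.Aggregate()
      .Match(k => k.Timestamp >= startDate && k.Timestamp < endDate)
      .AppendStage<BsonDocument>(group)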
      .Sort(new BsonDocument("_id", 1))
      .ToList();
    

    Request sent to server:

    [
      { "$match" : {
        "Timestamp" : {
          "$gte" : ISODate("2018-05-01T00:00:00Z"),
          "$lt" : ISODate("2018-06-01T00:00:00Z")
        }
      } },
      { "$group" : {
        "_id" : { 
          "$add" : [
            { "$subtract" : [ 
              { "$subtract" : [ "$Timestamp", ISODate("1970-01-01T00:00:00Z") ] },
              { "$mod" : [ 
                { "$subtract" : [ "$Timestamp", ISODate("1970-01-01T00:00:00Z") ] },
                900000
              ] }
            ] },
            ISODate("1970-01-01T00:00:00Z")
          ]
        },
        "count" : { "$sum" : 1 }
      } },
      { "$sort" : { "_id" : 1 } }
    ]
    

    The big reason we cannot do this right now is that the current serialization of the statements disagrees with the .NET Framework on one point: .NET says that subtracting two DateTime values returns a TimeSpan, whereas in MongoDB subtracting two BSON Dates returns the difference in milliseconds (here the "milliseconds since epoch"), which is essentially how the math works.

    The "literal" translation of the lambda expression is essentially:

    // The interval is parenthesized because % and * share the same precedence in C#.
    p =>  epoch.AddMilliseconds(
           (p.Timestamp - epoch).TotalMilliseconds
           - ((p.Timestamp - epoch).TotalMilliseconds % (1000 * 60 * 15)))
    

    But the mapping still needs some work, either to recognize such statements or to formalize which sort of statements are actually intended for this purpose.
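
    Although the driver cannot translate it, the same epoch arithmetic is valid C# and can be evaluated client-side, which is a handy way to check bucket boundaries for any interval. A minimal sketch, assuming UTC timestamps on or after the epoch; Floor and EpochBucketDemo are hypothetical names. Note that the interval has to be a single parenthesized value (or variable), since % and * share the same precedence in C#.

```csharp
using System;

public static class EpochBucketDemo
{
    private static readonly DateTime Epoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Hypothetical helper: floors a timestamp to an arbitrary interval
    // using the same "date math" as the $add / $subtract / $mod pipeline.
    public static DateTime Floor(DateTime t, TimeSpan interval)
    {
        TimeSpan sinceEpoch = t - Epoch;              // DateTime - DateTime yields a TimeSpan
        long ms = (long)sinceEpoch.TotalMilliseconds; // the number MongoDB's $subtract would give
        long intervalMs = (long)interval.TotalMilliseconds;
        return Epoch.AddMilliseconds(ms - (ms % intervalMs));
    }

    public static void Main()
    {
        var sample = new DateTime(2018, 5, 1, 13, 44, 59, DateTimeKind.Utc);
        Console.WriteLine(Floor(sample, TimeSpan.FromMinutes(15)).ToString("o")); // 2018-05-01T13:30:00.0000000Z
        Console.WriteLine(Floor(sample, TimeSpan.FromHours(6)).ToString("o"));    // 2018-05-01T12:00:00.0000000Z
    }
}
```

    Because the interval is just a number of milliseconds, this is the form that gives "flexible intervals" without touching the individual date parts.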

    Notably, MongoDB 4.0 introduces the $convert operator and the common aliases $toLong and $toDate, which can be used in the pipeline in place of the current handling of "addition" and "subtraction" with BSON Dates. These begin to form a more "formal" specification for such conversions. The method shown above, which relies solely on that "addition" and "subtraction", is still valid, but the named operators make the intent much clearer within the code:

    { "$group": {
      "_id": {
        "$toDate": {
          "$subtract": [
            { "$toLong": "$Timestamp" },
            { "$mod": [{ "$toLong": "$Timestamp" }, 1000 * 60 * 15 ] }
          ]
        }
      },
      "count": { "$sum": 1 }
    }}
    

    It's pretty plain to see that once LINQ statement construction has "formalized" operators for such "DateToLong" and "LongToDate" functions, the statement becomes much cleaner, without the kinds of "coercions" shown in the "non working" lambda expression.
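
    Until such operators are mapped, the $toDate / $toLong form can still be sent through the driver as a hand-built stage, in the same way as the $add / $subtract version. A sketch, assuming a MongoDB 4.0 or greater server and the MongoDB.Bson package; ToDateGroupStage and Build are illustrative names, and the field name "Timestamp" is taken from the examples above.

```csharp
using System;
using MongoDB.Bson;

public static class ToDateGroupStage
{
    // Builds a fresh { "$toLong": "$Timestamp" } expression on each call.
    private static BsonDocument AsLong() =>
        new BsonDocument("$toLong", "$Timestamp");

    // Builds the $group stage; intervalMs is the bucket size in milliseconds.
    public static BsonDocument Build(long intervalMs)
    {
        var floored = new BsonDocument("$subtract", new BsonArray
        {
            AsLong(),
            new BsonDocument("$mod", new BsonArray { AsLong(), intervalMs })
        });

        return new BsonDocument("$group", new BsonDocument
        {
            { "_id", new BsonDocument("$toDate", floored) },
            { "count", new BsonDocument("$sum", 1) }
        });
    }

    public static void Main()
    {
        // Prints the stage as JSON so it can be compared with the shell form.
        Console.WriteLine(Build(1000L * 60 * 15).ToJson());
    }
}
```

    The resulting document would be passed to AppendStage<BsonDocument>(...) after the Match stage, exactly where the earlier hand-built group stage was used.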
