How can I select a number of records for each value of a specific field using MongoDB?

离开以前 2021-01-21 02:36

I have a collection of documents in MongoDB, each of which has a "group" field that refers to the group that owns the document. The documents look like this:

{         


        
2 Answers
  • 2021-01-21 03:26

    You need the aggregation framework: a $group stage piped into a $limit stage. You also want to $sort the records in some way; otherwise the $limit has no defined order and the returned documents will be pseudo-random (whatever order MongoDB uses internally).

    Something like this: db.collection.aggregate([{$group:...},{$sort:...},{$limit:...}])

    Here is the documentation if you want to know more.
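A quick way to see what this stage order does is to mimic it in plain JavaScript. The sketch below is not MongoDB code: the helper `groupSortLimit` and the `$group` accumulator it emulates (`latest: { $max: "$date" }`) are hypothetical, chosen to match the `name`/`date` documents used in the other answer. It shows that a `$limit` placed after `$group` caps the number of group documents returned, not the number of documents kept per group.

```javascript
// Plain-JavaScript illustration, not MongoDB code. Assumes documents with
// `name` (the group) and `date`, and a hypothetical $group stage of
// { _id: "$name", latest: { $max: "$date" } }.
function groupSortLimit(docs, limit) {
  // $group: one output document per distinct name, keeping the max date
  const groups = new Map();
  for (const doc of docs) {
    const current = groups.get(doc.name);
    if (current === undefined || doc.date > current.latest) {
      groups.set(doc.name, { _id: doc.name, latest: doc.date });
    }
  }
  // $sort { latest: -1 }: orders the group documents, newest first
  const sorted = [...groups.values()].sort((a, b) => b.latest - a.latest);
  // $limit: caps how many group documents come out
  return sorted.slice(0, limit);
}

const docs = [
  { name: "group1", date: new Date("2013-04-17T20:07:59.372Z") },
  { name: "group1", date: new Date("2013-04-17T20:07:59.498Z") },
  { name: "group2", date: new Date("2013-04-17T20:07:59.504Z") },
  { name: "group3", date: new Date("2013-04-17T20:07:59.400Z") },
];

console.log(JSON.stringify(groupSortLimit(docs, 2).map((g) => g._id)));
// ["group2","group1"]
```

With a limit of 2, only two groups come back, each collapsed into a single document; keeping several documents per group requires the grouping step itself to retain them.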

  • 2021-01-21 03:34

    You cannot do this with the aggregation framework yet. You can get the $max (top) date value for each group, but the aggregation framework does not yet have a way to accumulate the top N, and there is no way to push the entire document into the result set (only individual fields).

    So you have to fall back on MapReduce. Here is something that works, though I'm sure there are many variants (all of them need some way to sort an array of objects by a specific attribute; I borrowed my solution from one of the answers to this question).

    Map function: it emits the group name as the key and the entire document as the value, wrapped in a document containing an array, because we will accumulate an array of results per group:

    map = function () { 
        emit(this.name, {a:[this]}); 
    }
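Outside MongoDB, the map function can be exercised by stubbing `emit` and binding each document to `this`, which is what mapReduce does for every input document. The `emit` stub and the sample documents are assumptions made for illustration only.

```javascript
// Stand-in for the emit() that MongoDB provides to map functions
// (hypothetical stub: it just records each key/value pair).
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Same map function as above: key = group name, value = { a: [doc] }
const map = function () {
  emit(this.name, { a: [this] });
};

const docs = [
  { _id: 1, name: "group1", date: new Date("2013-04-17T20:07:59.372Z") },
  { _id: 2, name: "group2", date: new Date("2013-04-17T20:07:59.378Z") },
];

// mapReduce calls map once per input document, with `this` set to it
docs.forEach((doc) => map.call(doc));

console.log(JSON.stringify(emitted.map((e) => [e.key, e.value.a.length])));
// [["group1",1],["group2",1]]
```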
    

    The reduce function accumulates all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the five most recent array elements (by checking date), you won't need the finalize function, and the mapReduce run will use less memory (and be faster).

    reduce = function (key, values) {
        // local variable: without `var`, `result` would leak into global scope
        var result = {a: []};
        values.forEach(function (v) {
            result.a = v.a.concat(result.a);
        });
        return result;
    }
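The optimization mentioned above (keeping only the newest five inside reduce so that finalize becomes unnecessary) could look roughly like this. It is a sketch, not code from the answer: `reduceTopN` is a hypothetical name, and it assumes the `{a: [...]}` value shape emitted by the map function and a `date` field on each document. Because MongoDB may call reduce again on its own output, the function returns the same shape it receives, and the limit is hard-coded inside the function since a function shipped to the server cannot see shell variables (unless passed through mapReduce's `scope` option).

```javascript
// Variant of reduce that keeps only the five newest documents per key.
const reduceTopN = function (key, values) {
  var n = 5; // hard-coded: outer variables are not visible server-side
  var merged = [];
  values.forEach(function (v) {
    merged = merged.concat(v.a);
  });
  // newest first; subtracting Dates compares their millisecond values
  merged.sort(function (x, y) {
    return y.date - x.date;
  });
  // same { a: [...] } shape as the map output, so re-reducing is safe
  return { a: merged.slice(0, n) };
};

// Feed it several partial values, as MongoDB may do during a run
const partials = [];
for (let i = 0; i < 8; i++) {
  partials.push({ a: [{ _id: i, date: new Date(2013, 3, 17, 20, 7, i) }] });
}
const reduced = reduceTopN("group1", partials);
console.log(JSON.stringify(reduced.a.map((d) => d._id))); // [7,6,5,4,3]

// Re-reducing an already reduced value still yields at most five docs
console.log(reduceTopN("group1", [reduced, { a: [] }]).a.length); // 5
```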
    

    Since I'm keeping all values for each key, I need a finalize function to pull out only the latest five elements per key.

    final = function (key, value) {
        // sort this group's documents by date, newest first
        // (a local comparator instead of patching Array.prototype)
        value.a.sort(function (a, b) {
            return (a.date < b.date) ? 1 : (a.date > b.date) ? -1 : 0;
        });
        // keep only the five most recent documents
        return value.a.slice(0, 5);
    }
    

    Using test documents similar to the one you provided, you run this by calling the mapReduce command:

    > db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
    {
        "results" : [
            {
                "_id" : "group1",
                "value" : [
                    {
                        "_id" : ObjectId("516f011fbfd3e39f184cfe13"),
                        "name" : "group1",
                        "date" : ISODate("2013-04-17T20:07:59.498Z"),
                        "contents" : 0.23778377776034176
                    },
                    {
                        "_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
                        "name" : "group1",
                        "date" : ISODate("2013-04-17T20:07:59.467Z"),
                        "contents" : 0.4434165076818317
                    },
                    {
                        "_id" : ObjectId("516f011fbfd3e39f184cfe09"),
                        "name" : "group1",
                        "date" : ISODate("2013-04-17T20:07:59.436Z"),
                        "contents" : 0.5935856597498059
                    },
                    {
                        "_id" : ObjectId("516f011fbfd3e39f184cfe04"),
                        "name" : "group1",
                        "date" : ISODate("2013-04-17T20:07:59.405Z"),
                        "contents" : 0.3912118375301361
                    },
                    {
                        "_id" : ObjectId("516f011fbfd3e39f184cfdff"),
                        "name" : "group1",
                        "date" : ISODate("2013-04-17T20:07:59.372Z"),
                        "contents" : 0.221651989268139
                    }
                ]
            },
            {
                "_id" : "group2",
                "value" : [
                    {
                        "_id" : ObjectId("516f011fbfd3e39f184cfe14"),
                        "name" : "group2",
                        "date" : ISODate("2013-04-17T20:07:59.504Z"),
                        "contents" : 0.019611883210018277
                    },
                    {
                        "_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
                        "name" : "group2",
                        "date" : ISODate("2013-04-17T20:07:59.473Z"),
                        "contents" : 0.5670706110540777
                    },
                    {
                        "_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
                        "name" : "group2",
                        "date" : ISODate("2013-04-17T20:07:59.442Z"),
                        "contents" : 0.893193120136857
                    },
                    {
                        "_id" : ObjectId("516f011fbfd3e39f184cfe05"),
                        "name" : "group2",
                        "date" : ISODate("2013-04-17T20:07:59.411Z"),
                        "contents" : 0.9496864483226091
                    },
                    {
                        "_id" : ObjectId("516f011fbfd3e39f184cfe00"),
                        "name" : "group2",
                        "date" : ISODate("2013-04-17T20:07:59.378Z"),
                        "contents" : 0.013748752186074853
                    }
                ]
            },
            {
                "_id" : "group3",
                            ...
                    }
                ]
            }
        ],
        "timeMillis" : 15,
        "counts" : {
            "input" : 80,
            "emit" : 80,
            "reduce" : 5,
            "output" : 5
        },
        "ok" : 1,
    }
    

    Each result has the group name as its _id and, as its value, an array of the five most recent documents in the collection for that group.
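To try the three functions without a server, here is a small in-memory stand-in for mapReduce with inline output. It is a hypothetical helper (`runMapReduce`), not MongoDB's implementation: it collects `emit` calls, groups them by key, skips reduce for keys with a single emitted value (as MongoDB does), and then applies finalize. The sample data only mimics the `name`/`date` documents in the output above.

```javascript
// Collects emitted pairs during one run of the hypothetical driver
let currentEmits = [];

// Stand-in for the emit() MongoDB provides to map functions
function emit(key, value) {
  currentEmits.push([key, value]);
}

function runMapReduce(docs, map, reduce, finalize) {
  currentEmits = [];
  docs.forEach((doc) => map.call(doc)); // map sees each document as `this`
  const groups = new Map(); // key -> list of emitted values
  for (const [key, value] of currentEmits) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const results = [];
  for (const [key, values] of groups) {
    // reduce is skipped when a key has only one emitted value
    const reduced = values.length > 1 ? reduce(key, values) : values[0];
    results.push({ _id: key, value: finalize ? finalize(key, reduced) : reduced });
  }
  return results;
}

// The answer's map, reduce and finalize functions
const map = function () {
  emit(this.name, { a: [this] });
};
const reduce = function (key, values) {
  var result = { a: [] };
  values.forEach(function (v) {
    result.a = v.a.concat(result.a);
  });
  return result;
};
const final = function (key, value) {
  value.a.sort(function (a, b) {
    return (a.date < b.date) ? 1 : (a.date > b.date) ? -1 : 0;
  });
  return value.a.slice(0, 5);
};

// 14 sample documents alternating between two groups, one second apart
const docs = [];
for (let i = 0; i < 14; i++) {
  docs.push({ _id: i, name: i % 2 === 0 ? "group1" : "group2", date: new Date(2013, 3, 17, 20, 7, i) });
}

const out = runMapReduce(docs, map, reduce, final);
console.log(JSON.stringify(out.map((r) => [r._id, r.value.map((d) => d._id)])));
// [["group1",[12,10,8,6,4]],["group2",[13,11,9,7,5]]]
```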
