MongoDB lists - get every Nth item

太阳男子 2021-02-13 05:36

I have a Mongodb schema that looks roughly like:

[
  {
    "name" : "name1",
    "instances" : [ 
      {
        "value" : 1,
        "date" : ISODate


        
6 Answers
  • 2021-02-13 05:46

    You might like this approach, which uses the $lookup aggregation stage. It is probably the most convenient and fastest way, and it needs no aggregation tricks.

    Create a collection Names with the following schema

    [
      { "_id": 1, "name": "name1" },
      { "_id": 2, "name": "name2" }
    ]
    

    and then Instances collection having the parent id as "nameId"

    [
      { "nameId": 1, "value" : 1, "date" : ISODate("2015-03-04T00:00:00.000Z") },
      { "nameId": 1, "value" : 2, "date" : ISODate("2015-04-01T00:00:00.000Z") },
      { "nameId": 1, "value" : 3, "date" : ISODate("2015-03-05T00:00:00.000Z") },
      { "nameId": 2, "value" : 7, "date" : ISODate("2015-03-04T00:00:00.000Z") }, 
      { "nameId": 2, "value" : 8, "date" : ISODate("2015-04-01T00:00:00.000Z") }, 
      { "nameId": 2, "value" : 4, "date" : ISODate("2015-03-05T00:00:00.000Z") }
    ]
    

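    Now, with the $lookup pipeline syntax introduced in MongoDB 3.6, you can use $sample inside the $lookup pipeline to pick N of the matching instances at random. Note that $sample returns a random selection of N documents, not strictly every Nth element in order.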

    db.Names.aggregate([
      { "$lookup": {
        "from": "Instances",  // with Mongoose models: Instances.collection.name
        "let": { "nameId": "$_id" },
        "pipeline": [
          { "$match": { "$expr": { "$eq": ["$nameId", "$$nameId"] }}},
          { "$sample": { "size": N }}
        ],
        "as": "instances"
      }}
    ])
    


  • 2021-02-13 05:47

    Or with just a find block:

    db.Collection.find({}).then(function(data) {
      var ret = [];
      for (var i = 0, len = data.length; i < len; i++) {
        if (i % 3 === 0 ) {
          ret.push(data[i]);
        }
      }
      return ret;
    });
    

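    This returns a promise; its then() callback filters the fetched data on the client and keeps every third entry (indexes 0, 3, 6, ...). Note that, as written, it selects whole documents from the result set rather than elements of each document's instances array.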
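The same client-side idea can be applied to the embedded array of each document. The following is a minimal sketch, not part of the original answer: `everyNth` and `thinInstances` are hypothetical helper names, and the `fetched` array merely stands in for whatever a find() call would return.

```javascript
// Hypothetical helper: keep every nth element of an array, starting at index 0.
function everyNth(items, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  return items.filter((_, idx) => idx % n === 0);
}

// Thin the embedded "instances" array of each fetched document;
// all other fields are copied unchanged.
function thinInstances(docs, n) {
  return docs.map((doc) => ({ ...doc, instances: everyNth(doc.instances || [], n) }));
}

// Sample data standing in for the result of a find() call (assumed shape).
const fetched = [
  { name: "name1", instances: [{ value: 1 }, { value: 2 }, { value: 3 }, { value: 4 }] },
  { name: "name2", instances: [{ value: 7 }, { value: 8 }] },
];

const thinned = thinInstances(fetched, 3);
console.log(JSON.stringify(thinned));
```

This still transfers every array element over the wire, so it is only reasonable when the arrays are small.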

  • 2021-02-13 05:57

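    No $unwind is needed here. You can use $push with $arrayElemAt inside a $group stage to collect the array element at the requested index from each document. Note that this returns the element at index N of every document's array, not every Nth element of one array.

    Something like: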

    db.colname.aggregate(
    [
      {"$group":{
        "_id":null,
        "valuesatNthindex":{"$push":{"$arrayElemAt":["$instances",N]}}
      }},
      {"$project":{"valuesatNthindex":1}}
    ])
    
  • 2021-02-13 06:02

    Unfortunately, at the time of writing this is not possible with the aggregation framework, because it would require an option on $unwind to emit the array index/position, which aggregation currently can't do. There is an open JIRA ticket for this, SERVER-4588. (MongoDB 3.2 later added the includeArrayIndex option to $unwind.)

    However, a workaround would be to use MapReduce. This comes at a huge performance cost, since the array index calculations are performed by the embedded JavaScript engine (which is slow), and there is still a single global JavaScript lock, which allows only one JavaScript thread to run at a time.

    With mapReduce, you could try something like this:

    Mapping function:

    var map = function(){
        for(var i=0; i < this.instances.length; i++){
            emit(
                { "_id": this._id,  "index": i },
                { "index": i, "value": this.instances[i] }
            );
        }
    };
    

    Reduce function:

    var reduce = function(key, values){};  // never called here: every emitted key is unique
    

    You can then run the following mapReduce function on your collection:

    db.collection.mapReduce( map, reduce, { out : "resultCollection" } );
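To see what ends up in the output collection, the mapping step can be emulated in plain JavaScript. This is only a sketch under assumptions: the `emit` stand-in, the `emitted` array, and `sampleDoc` are hypothetical and simply mimic the `{ _id: key, value: value }` shape that mapReduce writes; the final filter mimics a lookup on `_id.index`.

```javascript
// Stand-in for the server-side emit(): collect key/value pairs in the shape
// the output collection stores them ({ _id: key, value: value }).
const emitted = [];
function emit(key, value) {
  emitted.push({ _id: key, value: value });
}

// Same mapping logic as the map function above.
const map = function () {
  for (let i = 0; i < this.instances.length; i++) {
    emit({ _id: this._id, index: i }, { index: i, value: this.instances[i] });
  }
};

// Hypothetical sample document.
const sampleDoc = { _id: 1, name: "name1", instances: [{ value: 1 }, { value: 2 }, { value: 3 }] };
map.call(sampleDoc);

// Emulates find({ "_id.index": N }) followed by map(doc => doc.value.value).
const N = 2;
const atIndexN = emitted.filter((doc) => doc._id.index === N).map((doc) => doc.value.value);
console.log(JSON.stringify(atIndexN));
```

Each array element becomes its own result document keyed by the original _id plus its position, which is why the element itself sits under `value.value`.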
    

    You can then query the result collection to get a list/array of the Nth item of each instances array by using the map() cursor method:

    var thirdInstances = db.resultCollection.find({"_id.index": N})
                                            .map(function(doc){return doc.value.value})
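On servers that support the includeArrayIndex option of $unwind (MongoDB 3.2 and later), the array position can be emitted directly in a pipeline. The following is a rough sketch, not part of the original answer: `everyNthPipeline` is a hypothetical helper, `idx` is an arbitrary field name, and the `name` field is assumed from the schema in the question.

```javascript
// Hypothetical helper that builds an aggregation pipeline keeping the
// elements at positions 0, n, 2n, ... of each "instances" array.
function everyNthPipeline(n) {
  return [
    // Emit one document per array element, recording its position in "idx".
    { $unwind: { path: "$instances", includeArrayIndex: "idx" } },
    // The query operator $mod takes [divisor, remainder].
    { $match: { idx: { $mod: [n, 0] } } },
    // Reassemble one document per original _id.
    { $group: { _id: "$_id", name: { $first: "$name" }, instances: { $push: "$instances" } } },
  ];
}

const pipeline = everyNthPipeline(3);
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell this would be passed as: db.collection.aggregate(everyNthPipeline(3))
```

Documents whose arrays are empty drop out of the result, and if element order within each group matters it may be safer to add a $sort on _id and idx before the $group stage.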
    
  • 2021-02-13 06:06

    Your question asks to "get every nth instance", which is a pretty clear question.

    Query operations like .find() can really only return the document "as is" with the exception of general field "selection" in projection and operators such as the positional $ match operator or $elemMatch that allow a singular matched array element.

    Of course there is $slice, but that just allows a "range selection" on the array, so again does not apply.

    The "only" things that can modify a result on the server are .aggregate() and .mapReduce(). The former does not "play very well" with "slicing" arrays in any way, at least not by "n" elements. However since the "function()" arguments of mapReduce are JavaScript based logic, then you have a little more room to play with.

    For analytical purposes "only", you can just filter the array contents via mapReduce using .filter():

    db.collection.mapReduce(
        function() {
            var id = this._id;
            delete this._id;
    
            // filter the content of "instances" to every 3rd item only
            this.instances = this.instances.filter(function(el,idx) {
                return ((idx+1) % 3) == 0;
            });
            emit(id,this);
        },
        function() {},
        { "out": { "inline": 1 } } // or output to collection as required
    )
    

    It's really just a "JavaScript runner" at this point, but if this is just for analysis/testing then there is nothing generally wrong with the concept. Of course the output is not "exactly" how your document is structured, but it's as near a facsimile as mapReduce can get.
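The `(idx+1) % 3` condition in the filter above counts items from one, so it keeps the 3rd, 6th, ... items rather than starting with the first. A small plain-JavaScript sketch (the sample array is made up for illustration) shows the difference between the two conventions:

```javascript
// Made-up sample array standing in for a document's "instances".
const instances = ["a", "b", "c", "d", "e", "f", "g"];

// Convention used in the mapReduce filter above: keep the 3rd, 6th, ... items.
const countingFromOne = instances.filter((el, idx) => (idx + 1) % 3 === 0);

// Alternative convention: keep the 1st, 4th, 7th, ... items.
const countingFromZero = instances.filter((el, idx) => idx % 3 === 0);

console.log(countingFromOne);
console.log(countingFromZero);
```

Which convention is "every nth item" depends on whether the first element should be included, so it is worth fixing that choice before comparing the approaches in this thread.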

    The other suggestion I see here requires creating a new collection with all the items "denormalized" and inserting the "index" from the array as part of the unique _id key. That may produce something you can query directly, but for "every nth item" you would still have to do:

    db.resultCollection.find({
         "_id.index": { "$in": [2,5,8,11,14] } // and so on ....
    })
    

    In other words, you have to work out and supply the index values of "every nth item" yourself in order to get "every nth item", which doesn't really solve the problem that was asked.
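If you did go down that route, the index list would have to be generated on the client. A minimal sketch, assuming a hypothetical helper `nthIndexes` and an upper bound on the array length that you would have to know in advance:

```javascript
// Hypothetical helper: 0-based positions of every nth item
// (n - 1, 2n - 1, ...) below maxLength.
function nthIndexes(n, maxLength) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  const indexes = [];
  for (let i = n - 1; i < maxLength; i += n) {
    indexes.push(i);
  }
  return indexes;
}

// Builds the same kind of query document as shown above.
const query = { "_id.index": { $in: nthIndexes(3, 15) } };
console.log(JSON.stringify(query));
```

The need for `maxLength` is exactly the weakness described above: the client has to know how long the longest array is before it can ask for the right positions.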

    If that output form seems more desirable for your "testing" purposes, then a better subsequent query on those results would use the aggregation pipeline with $redact:

    db.newCollection.aggregate([
        { "$redact": {
            "$cond": {
                "if": {
                    "$eq": [ 
                        { "$mod": [ { "$add": [ "$_id.index", 1] }, 3 ] },
                    0 ]
                },
                "then": "$$KEEP",
                "else": "$$PRUNE"
            }
        }}
    ])
    

    That at least uses a "logical condition" much the same as what was applied with .filter() before to just select the "nth index" items without listing all possible index values as a query argument.

  • 2021-02-13 06:07

    You can use below aggregation:

    db.col.aggregate([
        {
            $project: {
                instances: {
                    $map: {
                        input: { $range: [ 0, { $size: "$instances" }, N ] },
                        as: "index",
                        in: { $arrayElemAt: [ "$instances", "$$index" ] }
                    }
                }
            }
        }
    ])
    

    $range (available since MongoDB 3.4) generates the list of indexes; its third argument is a non-zero step. For N = 2 the indexes are [0, 2, 4, 6, ...], for N = 3 they are [0, 3, 6, 9, ...], and so on. $map then uses $arrayElemAt to fetch the corresponding items from the instances array.
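To make the index arithmetic concrete, here is a plain-JavaScript emulation of the same two steps. It is only an illustration: `range` and `pickEveryNth` are hypothetical stand-ins for $range and the $map/$arrayElemAt combination, and the input array is made up.

```javascript
// Plain-JavaScript stand-in for $range: integers in [start, end) with a positive step.
function range(start, end, step) {
  if (!Number.isInteger(step) || step < 1) {
    throw new RangeError("step must be a positive integer");
  }
  const out = [];
  for (let i = start; i < end; i += step) {
    out.push(i);
  }
  return out;
}

// Stand-in for $map + $arrayElemAt: look up each generated index.
function pickEveryNth(instances, n) {
  return range(0, instances.length, n).map((index) => instances[index]);
}

const picked = pickEveryNth([10, 20, 30, 40, 50, 60, 70], 3);
console.log(picked);
```

Because the range stops before the array length, no index ever falls outside the array, which mirrors why the pipeline uses $size as the end value.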
