MongoDB sum arrays from multiple documents on a per-element basis

前端 未结 2 1558
旧时难觅i
旧时难觅i 2021-01-14 00:16

I have the following document structure (simplified for this example)

{
  _id : ObjectId(\"sdfsdf\"),
  result : [1, 3         


        
相关标签:
2条回答
  • 2021-01-14 00:53

    You can use includeArrayIndex if you have 3.2 or newer MongoDb.

    Then you should change $unwind.

    Your code should be like this:

    .aggregate(
        [
          {
            "$unwind" :  { path: "$result", includeArrayIndex: "arrayIndex" }
          },
          {
            "$group": {
              "_id": "$arrayIndex",
              "results" : { "$sum" : "$result"}
              }
          },
          { 
            $sort: { "_id": 1}
          },
          {
            "$group":{
              "_id": null,
              "results":{"$push":"$results"}
              } 
          },
          {
            "$project": {"_id":0,"results":1}
          }
        ]
    )
    
    0 讨论(0)
  • 2021-01-14 01:11

    There is an alternate approach to this, but mileage may vary on how practical it is considering that a different approach would involve using $push to create an "array of arrays" and then applying $reduce as introduced in MongoDB 3.4 to $sum those array elements into a single array result:

    db.collection.aggregate([
      { "$group": {
        "_id": null,
        "result": { "$push": "$result" }
      }},
      { "$addFields": {
        "result": {
          "$reduce": {
            "input": "$result",
            "initialValue": [],
            "in": {
              "$map": {
                "input": { 
                  "$zip": {
                    "inputs": [ "$$this", "$$value" ],
                    "useLongestLength": true
                  }
                },
                "as": "el",
                "in": { "$sum": "$$el" }
              }
            }
          }
        }
      }}
    ])
    

    The real trick there is in the "input" to $map we use the $zip operation which creates a transposed list of arrays "pairwise" for the two array inputs.

    In a first iteration this takes the empty array as supplied to $reduce and would return the "zipped" output with consideration to the first object found as in:

    [ [0,1], [0,3], [0,5], [0,7], [0,9] ]
    

    So the useLongestLength would substitute the empty array with 0 values out to the the length of the current array and "zip" them together as above.

    Processing with $map, each element is subject to $sum which "reduces" the returned results as:

    [ 1, 3, 5, 7, 9 ]
    

    On the second iteration, the next entry in the "array of arrays" would be picked up and processed by $zip along with the previous "reduced" content as:

    [ [1,2], [3,4], [5,6], [7,8], [9,10] ]
    

    Which is then subject to the $map for each element using $sum again to produce:

    [ 3, 7, 11, 15, 19 ]
    

    And since there were only two arrays pushed into the "array of arrays" that is the end of the operation, and the final result. But otherwise the $reduce would keep iterating until all array elements of the input were processed.


    So in some cases this would be the more performant option and what you should be using. But it is noted that particularly when using a null for $group you are asking "every" document to $push content into an array for the result.

    This could be a cause of breaking the BSON Limit in extreme cases, and therefore when aggregating positional array content over large results, it is probably best to use $unwind with the includeArrayIndex option instead.

    Or indeed actually take a good look at the process, where in particular if the "positional array" in question is actually the result of some other "aggregation operation", then you should rather be looking at the previous pipeline stages that were used to create the "positional array". And then consider that if you wanted those positions "aggregated further" to new totals, then you should in fact do that "before" the positional result was obtained.

    0 讨论(0)
提交回复
热议问题