Mongo aggregation vs Java for loop and performance

问题

I have a below mongo document stored

{
 "Field1": "ABC",
 "Field2": [
    { "Field3": "ABC1","Field4": [ {"id": "123" }, { "id" : "234" }, { "id":"345" }] }, 
    { "Field3": "ABC2","Field4": [ {"id": "123" }, { "id" : "234" }, { "id":"345" }] }, 
    { "Field3": "ABC3","Field4": [{ "id":"345" }] },  
    ]
}

from the above, I want to fetch the subdocuments which is having id "123"

ie.

{ 
    "Field3" : "ABC1",
    "Field4" : [ { "id": "123"} ]
} ,
{
    "Field3" : "ABC2",
    "Field4" : [ { "id": "123"} ]
}

1. Java way  
    A. use Mongo find method to get the ABC document from Mongo DB
    B. for Loop to Iterate the Field2 Json Array
    C. Again for Loop to Iterate over Field4 Json Array
    D. Inside the nested for loop I've if condition to Match id value to "123"
    E. Store the Matching subdocument into List 

2. Mongo Way
   A. Use Aggregation query to get the desired output from DB.No Loops and conditions in the Java side. 
   B. Aggregation Query below stages
    I)  $Match - match the ABC document
    II) $unwind - Field2
    III) $unwind - Field4
    IV) $match - Match the with id ( value is "123")
    V) $group - group the document  based on Field3 (based on "ABC1" or "ABC2") 
    VI) execute aggregation and return results

Both are working good and returning proper results.
Question is which one is the better to follow and why ? I used the aggregation in restful service get method, So executing aggregation queries 1000 or more times in parallel will cause any performance problems?

回答1:

With Aggregation, the whole query is executed as a single process on the MongoDB server - the application program will get the results cursor from the server.

With Java program also you are getting a cursor from the database server as input to the processing in the application. The response cursor from the server is going to be larger set of data and will use more network bandwidth. And then there is processing in the application program, and this adds more steps to complete the query.

I think the aggregation option is a better choice - as all the processing (the initial match and filtering the array) happens on the database server as a single process.

Also, note the aggregation query steps you had posted can be done in an efficient way. Instead of multiple stages (2, 3, 4 and 5) you can do those operations in a two stages - use a $project with $map on the outer array and then $filter on the inner array and then $filter the outer array.

The aggregation:

db.test.aggregate( [
  { 
      $addFields: { 
          Field2: { 
              $map: {
                   input: "$Field2",
                      as: "fld2",
                      in: {
                           Field3: "$$fld2.Field3",
                           Field4: { 
                               $filter: {
                                   input: "$$fld2.Field4",
                                      as: "fld4",
                                    cond: {  $eq: [ "$$fld4.id", "123" ] }
                               }
                           }
                       }
                 } 
          }
      }
  },
  { 
      $addFields: { 
          Field2: { 
              $filter: {
                   input: "$Field2",
                      as: "f2",
                    cond: {  $gt: [ { $size: "$$f2.Field4" }, 0 ] }
              }
          }
      }
  },
] )

回答2:

The second way is probably better because it returns a smaller result from the datastore; shlepping bits over the wire is expensive.

来源：https://stackoverflow.com/questions/59508523/mongo-aggregation-vs-java-for-loop-and-performance

标签

mongodb

aggregation-framework

spring-mongodb