Average Aggregation on Multiple Fields Where Keys in Documents Vary

I have a dataset as follows:

{
    "_id" : ObjectId("592d4f43d69b643ac0cb9148"),
    "timestamp" : ISODate("2017-03-01T16:58:00.000Z"),
    "Technique-Met


                      
1 answer

执笔经年
2021-01-23 09:38
                                                                       
Outline of Concept

What I was basically saying in the very brief comment is that instead of issuing a separate aggregation query for every sensor "key" name, you can put them all in ONE, as long as you calculate the "averages" correctly.

Of course the problem in your data is that the "keys" are not present in all documents. So to get the correct "average", we cannot just use $avg since it would count "ALL" documents, whether the key was present or not.

So instead we break up the "math", and do a $group for the Total Count and total Sum of each key first. This uses $ifNull to test for the presence of the field, and also $cond to alternate values to return.

.aggregate([
  { "$match": {
    "$or": [
      { "Technique-Electrique_VMC Aldes_Power4[W]": { "$exists": true } },
      { "Technique-Electrique_VMC Unelvent_Power5[W]": { "$exists": true } }
    ]
  }},
  { "$group":{
    "_id":{
      "year":{ "$year":"$timestamp" },
      "month":{ "$month":"$timestamp" }
    },
    "Technique-Electrique_VMC Aldes_Power4[W]-Sum": { 
      "$sum": { 
        "$ifNull": [ "$Technique-Electrique_VMC Aldes_Power4[W]", 0 ]
      }
    },
    "Technique-Electrique_VMC Aldes_Power4[W]-Count": { 
      "$sum": { 
        "$cond": [
          { "$ifNull": [ "$Technique-Electrique_VMC Aldes_Power4[W]", false ] },
          1,
          0
        ]
      }
    },
    "Technique-Electrique_VMC Unelvent_Power5[W]-Sum": {
      "$sum": { 
        "$ifNull": [ "$Technique-Electrique_VMC Unelvent_Power5[W]", 0 ]
      }
    },
    "Technique-Electrique_VMC Unelvent_Power5[W]-Count": {
      "$sum": {
        "$cond": [ 
          { "$ifNull": [ "$Technique-Electrique_VMC Unelvent_Power5[W]", false ] },
          1,
          0
        ]
      }
    }
  }},
  { "$project": {
    "Technique-Electrique_VMC Aldes_Power4[W]-Avg": {
      "$divide": [
        "$Technique-Electrique_VMC Aldes_Power4[W]-Sum",
        "$Technique-Electrique_VMC Aldes_Power4[W]-Count"
      ]
    },
    "Technique-Electrique_VMC Unelvent_Power5[W]-Avg": {
      "$divide": [
        "$Technique-Electrique_VMC Unelvent_Power5[W]-Sum",
        "$Technique-Electrique_VMC Unelvent_Power5[W]-Count"
      ]
    }
  }}
])


The $cond operator is a "ternary" operator: when the first argument (the "if" condition) is true, the second argument ("then") is returned; otherwise the third argument ("else") is returned.

So the point of the ternary in the "Count" is to work out:


If the field is there then return 1 for count
Otherwise return 0 when it is not there
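
To make the Sum and Count logic concrete, here is a minimal plain-Python sketch of what the $group stage computes for one key. It does not talk to MongoDB; the names if_null and group_average and the sample field "power" are made up for illustration only.

```python
def if_null(value, fallback):
    """Mimic $ifNull: use the fallback when the field is missing or null."""
    return fallback if value is None else value


def group_average(docs, key):
    """Mimic the -Sum / -Count accumulators followed by $divide for one key."""
    total = 0
    count = 0
    for doc in docs:
        value = doc.get(key)                        # None when the key is absent
        total += if_null(value, 0)                  # "$sum": { "$ifNull": [ "$key", 0 ] }
        count += 1 if if_null(value, False) else 0  # "$cond": [ <ifNull>, 1, 0 ]
    return total / count


# Hypothetical sample data: the middle document lacks the "power" key
docs = [{"power": 10}, {"other": 99}, {"power": 30}]

avg = group_average(docs, "power")
naive = sum(d.get("power", 0) for d in docs) / len(docs)  # divides by ALL documents
print(avg)  # 20.0
```

Dividing by the number of documents that actually carry the key gives 20.0 here, whereas dividing by all three documents would give a lower figure. One caveat worth checking against your data: as far as I know, $cond treats a numeric 0 as false, so a reading of exactly 0 would not be counted by this expression.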


After the $group is done, in order to get the Average we use $divide on the two numbers produced for each key within a separate $project stage.

The end result is the "average" for every key that you supply, computed from the values and counts of only those documents where the field was actually present.

So putting all the keys in a single aggregation statement will save you a lot of time and processing resources.
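
One edge case is worth guarding against: if a year/month group contains documents for one sensor but none for another, that sensor's Count is 0 and $divide will raise a division-by-zero error. The following is a sketch of one possible guard, written as a Python dict in pymongo style. The helper name safe_avg_expr is hypothetical and not part of the original pipeline; returning null for an empty group is an assumption about what you want.

```python
def safe_avg_expr(key):
    """Build a $project expression that yields null when the count is 0.

    Hypothetical helper: relies on the "-Sum" / "-Count" field naming
    used by the $group stage above.
    """
    total = "$" + key + "-Sum"
    count = "$" + key + "-Count"
    return {
        "$cond": [
            {"$eq": [count, 0]},          # if: no document in the group had the field
            None,                         # then: null instead of dividing by zero
            {"$divide": [total, count]},  # else: Sum / Count
        ]
    }


expr = safe_avg_expr("Technique-Electrique_VMC Aldes_Power4[W]")
```

The resulting expression can stand in for the plain $divide in the $project stage for each key.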



Dynamic Generation of Pipeline

So to do this "dynamically" in Python, start with the list of sensor keys:

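sensors = ["Technique-Electrique_VMC Aldes_Power4[W]", "Technique-Electrique_VMC Unelvent_Power5[W]"]

# Keep only documents that contain at least one of the sensor keys.
# A list comprehension is used because map() returns an iterator in Python 3,
# which cannot be sent as the '$or' array.
match = { '$match': { '$or': [ { k: { '$exists': True } } for k in sensors ] } }

# Group by year and month; the per-sensor accumulators are added in the loop below
group = { '$group': { 
  '_id': {
    'year': { '$year': '$timestamp' },
    'month': { '$month':'$timestamp' }
  }
}}

project = { '$project': {  } }

for k in sensors:
  # Sum of the values, treating a missing field as 0
  group['$group'][k + '-Sum'] = {
    '$sum': { '$ifNull': [ '$' + k, 0 ] }
  }
  # Count only the documents where the field is present
  group['$group'][k + '-Count'] = {
    '$sum': { '$cond': [ { '$ifNull': [ '$' + k, False ] }, 1, 0 ]  }
  }
  # Average = Sum / Count
  project['$project'][k + '-Avg'] = {
    '$divide': [ '$' + k + '-Sum', '$' + k + '-Count' ]
  }

pipeline = [match,group,project]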


This generates the same pipeline as the full listing above for any given list of "sensors".
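
For reuse, the steps above could be wrapped in a function. This is only a sketch: build_pipeline is a hypothetical name, and the collection object mentioned in the closing comment is assumed to be a pymongo Collection that you have already set up.

```python
def build_pipeline(sensors):
    """Return a [match, group, project] pipeline averaging each sensor key."""
    # Keep documents that contain at least one of the keys
    match = {'$match': {'$or': [{k: {'$exists': True}} for k in sensors]}}
    group = {'$group': {
        '_id': {
            'year': {'$year': '$timestamp'},
            'month': {'$month': '$timestamp'},
        }
    }}
    project = {'$project': {}}
    for k in sensors:
        # Sum of values, with a missing field counted as 0
        group['$group'][k + '-Sum'] = {'$sum': {'$ifNull': ['$' + k, 0]}}
        # Count of documents where the field is present
        group['$group'][k + '-Count'] = {
            '$sum': {'$cond': [{'$ifNull': ['$' + k, False]}, 1, 0]}
        }
        # Average = Sum / Count
        project['$project'][k + '-Avg'] = {
            '$divide': ['$' + k + '-Sum', '$' + k + '-Count']
        }
    return [match, group, project]


sensors = [
    "Technique-Electrique_VMC Aldes_Power4[W]",
    "Technique-Electrique_VMC Unelvent_Power5[W]",
]
pipeline = build_pipeline(sensors)

# With pymongo this would typically be run as:
#   results = list(collection.aggregate(pipeline))
# where `collection` is your own Collection object (not defined here).
```

Adding another sensor then only means appending its key to the list.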
                                                                        
                                                        
            
            
              
                