Time complexity of $addToSet vs $push when the element does not exist in the array


鱼传尺愫 2021-02-06 01:59

Given: the connection is Safe=True, so the update's return value will contain update information.

Say I have documents that look like:

[{'a': [1]}, {'a': [2]}, {

长情又很酷 (OP) 2021-02-06 02:28
Edit

OK, since I had misread your question all along: it turns out you are actually comparing two different queries and judging the time complexity of each.

The first query being:

coll.update({}, {'$addToSet': {'a':1}}, multi=True)


And the second being:

coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)
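
To make the two queries above concrete, here is a minimal in-memory sketch, assuming each document is a plain Python dict whose a field is an array (no MongoDB server involved). add_to_set_all and push_where_ne are hypothetical helpers, not PyMongo functions; they show that both queries leave such a collection in the same final state:

```python
from copy import deepcopy


def add_to_set_all(docs, value):
    """Mimics coll.update({}, {'$addToSet': {'a': value}}, multi=True)."""
    out = deepcopy(docs)                      # leave the input untouched
    for doc in out:                           # filter {} matches every document
        arr = doc.setdefault("a", [])
        if value not in arr:                  # $addToSet appends only if absent
            arr.append(value)
    return out


def push_where_ne(docs, value):
    """Mimics coll.update({'a': {'$ne': value}}, {'$push': {'a': value}}, multi=True)."""
    out = deepcopy(docs)
    for doc in out:
        if value not in doc.get("a", []):     # filter {'a': {'$ne': value}}
            doc.setdefault("a", []).append(value)  # $push appends unconditionally
    return out


docs = [{"a": [1]}, {"a": [2]}, {"a": [2, 3]}]
print(add_to_set_all(docs, 1))   # [{'a': [1]}, {'a': [2, 1]}, {'a': [2, 3, 1]}]
print(push_where_ne(docs, 1))    # same result
```

On a live server with PyMongo 3.0 or later, Collection.update is deprecated (it was removed in PyMongo 4.0), and the same two updates would be written as coll.update_many({}, {'$addToSet': {'a': 1}}) and coll.update_many({'a': {'$ne': 1}}, {'$push': {'a': 1}}).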


The first problem that springs to mind here is the lack of indexes. $addToSet is an update modifier, and I do not believe it uses an index; combined with the empty {} filter, that means you are doing a full collection scan to accomplish what you need.

What you really want is to find all documents that do not already have 1 in their a array and $push the value 1 onto that array.

So the second query is already ahead before we even get into time complexity, because the first query:


- Does not use indexes
- Would be a full collection scan
- Would then do a full array scan of a (with no index) in every document for $addToSet
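
The three points above can be sketched with a small cost model, assuming an unindexed collection of Python dicts. update_stats, add_to_set and push are hypothetical names, and the three counters only loosely mirror what a real update reports:

```python
# Hypothetical model (not the server's code) of an unindexed multi-document
# update: every document is examined, the update operator runs on the
# matched ones, and only some of those actually change.

def update_stats(docs, query_matches, apply_update):
    examined = matched = modified = 0
    for doc in docs:                     # no usable index: collection scan
        examined += 1
        if not query_matches(doc):
            continue
        matched += 1
        if apply_update(doc):            # True if the document changed
            modified += 1
    return examined, matched, modified


def add_to_set(doc, value=1):
    if value in doc["a"]:                # linear scan of the array
        return False
    doc["a"].append(value)
    return True


def push(doc, value=1):
    doc["a"].append(value)               # unconditional append
    return True


docs = [{"a": [1]}, {"a": [2]}, {"a": [2, 3]}]
copy1 = [{"a": list(d["a"])} for d in docs]
copy2 = [{"a": list(d["a"])} for d in docs]
print(update_stats(copy1, lambda d: True, add_to_set))        # (3, 3, 2)
print(update_stats(copy2, lambda d: 1 not in d["a"], push))   # (3, 2, 2)
```

In this model both queries examine all three documents (the collection scan), but the first runs $addToSet on every one of them, while the second applies $push only to the two documents that passed the $ne filter. For real updates, PyMongo's UpdateResult exposes similar matched_count and modified_count attributes.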


So I had pretty much made up my mind that the second query is what you're looking for, before any of the Big O notation stuff.

There are a couple of problems with using Big O notation to explain the time complexity of each query here:


- I am unsure which perspective you want: per document or for the whole collection.
- I am unsure about indexes. An index on a makes the lookup logarithmic (a B-tree search), whereas without an index it is not.
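
To illustrate the logarithmic part, here is a toy stand-in for a B-tree index on a: a sorted list with one entry per array element (roughly how a multikey index stores arrays), searched with Python's bisect module. MultikeyIndex and its methods are hypothetical names. Finding the documents that contain a value costs two binary searches; answering $ne in this model means taking the complement, which is one reason MongoDB's documentation describes $ne as a poorly selective operator:

```python
from bisect import bisect_left, bisect_right


class MultikeyIndex:
    """Hypothetical stand-in for an index on `a`: one sorted entry per array element."""

    def __init__(self, docs):
        entries = sorted((v, doc_id) for doc_id, doc in enumerate(docs) for v in doc["a"])
        self.keys = [v for v, _ in entries]
        self.doc_ids = [doc_id for _, doc_id in entries]

    def ids_containing(self, value):
        # Two binary searches: O(log m) for m index entries, plus O(k) for k hits.
        lo = bisect_left(self.keys, value)
        hi = bisect_right(self.keys, value)
        return set(self.doc_ids[lo:hi])

    def ids_not_containing(self, value, n_docs):
        # In this toy model $ne is the complement of the hits,
        # so it still walks every document id.
        hits = self.ids_containing(value)
        return [i for i in range(n_docs) if i not in hits]


docs = [{"a": [1]}, {"a": [2]}, {"a": [2, 3]}]
idx = MultikeyIndex(docs)
print(sorted(idx.ids_containing(2)))          # [1, 2]
print(idx.ids_not_containing(1, len(docs)))   # [1, 2]
```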


That said, the first query would look something like O(n) per document (n being the length of a), since:


- $addToSet would need to iterate over each element of a to check whether the value is already there
- $addToSet would then need to do an O(1) operation to append the value if it does not exist. I should note I was unsure whether that O(1) term cancels out; light reading suggests it does (O(n) + O(1) = O(n)), so I have dropped it here.
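
The two steps above can be written as a tiny cost model: a sketch with a hypothetical add_to_set_cost helper, assuming a comparison-by-comparison scan (MongoDB's internals may differ in detail). In the worst case, when the value is absent, it makes n comparisons and then one append, which is O(n) + O(1) = O(n):

```python
def add_to_set_cost(arr, value):
    """Hypothetical cost model for $addToSet on a single array.

    Returns (comparisons, appended): a linear membership scan, then an
    append only when `value` was not found.
    """
    comparisons = 0
    for element in arr:
        comparisons += 1
        if element == value:
            return comparisons, False   # already present: nothing is written
    arr.append(value)                   # amortised O(1) append
    return comparisons, True


print(add_to_set_cost(list(range(2, 12)), 1))  # (10, True): 10 comparisons + one append
print(add_to_set_cost([1, 2, 3], 1))           # (1, False): found immediately
```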


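Per collection, without the index, it would be O(2n^2), which simplifies to O(n^2) once constants are dropped: the O(n) array scan is repeated for every one of the n documents, so, assuming the arrays are also roughly n elements long, the work grows quadratically (not exponentially) as documents are added.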
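
Counting the work explicitly shows how the per-document cost adds up. In this sketch (hypothetical helper addtoset_collection_cost, assuming every array has the same length and never contains the value), the total is the number of documents times the array length, so doubling both quadruples the work, which is quadratic rather than exponential growth:

```python
def addtoset_collection_cost(n_docs, array_len, value=-1):
    """Total element comparisons made by query 1 on a hypothetical collection.

    Assumes n_docs documents whose arrays all hold array_len elements and
    (with the default value) never contain `value`, so every scan runs to the end.
    """
    arrays = [list(range(array_len)) for _ in range(n_docs)]
    total = 0
    for arr in arrays:              # filter {} matches every document
        for element in arr:         # $addToSet's membership scan
            total += 1
            if element == value:
                break
        else:                       # loop finished: value absent, append it
            arr.append(value)
    return total


for size in (10, 20, 40):
    print(size, addtoset_collection_cost(size, size))
# 10 100
# 20 400
# 40 1600
```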

The second query, without indexes, would look something like O(2n^2) (O(n) per document), I believe, since $ne would have the same problems as $addToSet without indexes. With an index, however, I believe it would actually be O(log n log n) (O(log n) per document), since it would first find all documents that have an a array and then all documents without 1 in that array, using the B-tree.
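
Without indexes, the same counting applied to the second query supports the point that $ne carries the same scan cost as $addToSet. This is a sketch assuming the collection is a list of Python lists, with a hypothetical push_ne_collection_cost helper: the $ne filter does the array scan (stopping early when it finds the value), and $push only adds an O(1) append:

```python
def push_ne_collection_cost(arrays, value):
    """Element comparisons and appends for query 2 on an unindexed collection.

    Hypothetical model: $ne scans a document's array to prove `value` is
    absent (stopping early if it finds it), then $push appends in O(1).
    """
    comparisons = appends = 0
    for arr in arrays:
        found = False
        for element in arr:
            comparisons += 1
            if element == value:
                found = True
                break
        if not found:              # document matches {'a': {'$ne': value}}
            arr.append(value)      # $push: no membership check needed
            appends += 1
    return comparisons, appends


arrays = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(push_ne_collection_cost(arrays, 1))  # (7, 2)
print(arrays)                              # [[1, 2, 3], [2, 3, 4, 1], [5, 6, 7, 1]]
```

On these three arrays, $addToSet's own membership scan would also cost 7 comparisons; the difference lies in which step does the scanning, not in its order of growth.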

So, based on time complexity and the notes at the beginning, I would say query 2 is better.

To be honest, I am not used to explaining things in Big O notation, so treat this as experimental.

Hope it helps,