Time Complexity of $addToSet vs $push when element does not exist in the Array

鱼传尺愫 2021-02-06 01:59

Given: the connection uses safe=True, so update()'s return value will contain update information.

Say I have documents that look like:

[{'a': [1]}, {'a': [2]}, {         


        
3 Answers
  • 2021-02-06 02:17

    Looks like $addToSet is doing the same thing as your command ($push with a $ne check); both would be O(N).

    https://github.com/mongodb/mongo/blob/master/src/mongo/db/ops/update_internal.cpp

    If speed is really important, then why not use a hash?

    Instead of:

    {'$addToSet': {'a':1}}
    {'$addToSet': {'a':10}}
    

    use:

    {'$set': {'a.1': 1}}
    {'$set': {'a.10': 1}}
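
    A minimal PyMongo sketch of that hash idea (database, collection, and field names are only placeholders, and update_many is the current spelling of update(..., multi=True)). Storing each value as its own sub-document key makes the write a direct field set instead of an array-membership scan, at the cost of losing array operators such as $in on a:

    from pymongo import MongoClient

    coll = MongoClient().test.mycoll  # placeholder names

    # Array form: the server must scan the 'a' array before adding the value.
    coll.update_many({}, {'$addToSet': {'a': 1}})

    # Hash form: 'a' is a sub-document, so each write is a direct field set.
    # (This assumes 'a' is stored as a document, not an array.)
    coll.update_many({}, {'$set': {'a.1': 1}})
    coll.update_many({}, {'$set': {'a.10': 1}})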
    
  • 2021-02-06 02:28

    Edit

    OK, since I read your question wrong all along: it turns out you are actually looking at two different queries and judging the time complexity between them.

    The first query being:

    coll.update({}, {'$addToSet': {'a':1}}, multi=True)
    

    And the second being:

    coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)
    

    The first problem that springs to mind here: no indexes. $addToSet is an update modifier, and I do not believe it uses an index, so you are doing a full table scan to accomplish what you need.
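
    As a rough sketch (database, collection, and field names are assumptions, not from the question, and update_many is the current form of update(..., multi=True)), an index on a at least gives the $ne filter of the second query something to consult, even though $ne is not a very selective predicate:

    from pymongo import MongoClient

    coll = MongoClient().test.mycoll  # placeholder names

    # Multikey index over the elements of the 'a' array.
    coll.create_index('a')

    # Second query: only touch documents whose 'a' array lacks the value 1.
    result = coll.update_many({'a': {'$ne': 1}}, {'$push': {'a': 1}})
    print(result.modified_count)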

    In reality you are looking for all documents that do not already have 1 in a, and looking to $push the value 1 onto that a array.

    So two points go to the second query even before we get into time complexity here, because the first query:

    • Does not use indexes
    • Would be a full table scan
    • Would then do a full array scan (with no index) to $addToSet

    So I have pretty much made my mind up here that the second query is what you're looking for, before any of the Big O notation stuff.

    There is a problem with using Big O notation to explain the time complexity of each query here:

    • I am unsure of what perspective you want, whether it is per document or for the whole collection.
    • I am unsure about indexes as well. Using an index on a would actually give a logarithmic algorithm, whereas not using one does not.

    However, the first query would look something like O(n) per document, since:

    • The $addToSet would need to iterate over each element of a
    • The $addToSet would then need to do an O(1) op to insert the value if it does not exist. I should note I am unsure whether the O(1) term is cancelled out or not (light reading suggests it is), so I have cancelled it out here. A conceptual sketch of this per-document work follows the list.
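
    Purely as a conceptual model of that per-document work (plain Python, not MongoDB's actual implementation), the scan-then-append behaviour looks like this:

    def add_to_set(array, value):
        # Linear scan over the existing elements: O(n) per document.
        for element in array:
            if element == value:
                return array  # value already present, nothing to do
        # Constant-time append when the value is missing.
        array.append(value)
        return array

    doc = {'a': [1]}
    add_to_set(doc['a'], 2)  # doc['a'] becomes [1, 2]
    add_to_set(doc['a'], 2)  # doc['a'] stays [1, 2]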

    Per collection, without the index, it would be O(2n²), since the cost of iterating a grows with every new document.

    The second query, without indexes, would look something like O(2n²) (O(n) per document), I believe, since $ne would have the same problems as $addToSet without indexes. However, with indexes I believe this would actually be O(log n log n) (O(log n) per document), since it would first find all documents that have a and then all documents without 1 in their set, based upon the B-tree.

    So based upon time complexity and the notes at the beginning I would say query 2 is better.

    If I am honest, I am not used to explaining things in Big O notation, so this is experimental.

    Hope it helps,

  • 2021-02-06 02:30

    Adding my observation on the difference between $addToSet and $push from a bulk update of 100k documents.

    When you are doing a bulk update, $addToSet will be executed separately.

    For example,

    bulkInsert.find({x:y}).upsert().update({"$set":{..}, "$addToSet": {"a":"b"}, "$setOnInsert": {}})
    

    will first insert and $set the document, and then it executes the $addToSet part.

    I saw a clear difference of about 10k between

    db.collection_name.count() #gives around 40k 
    
    db.collection_name.count({"a":{$in:["b"]}}) # it gives only around 30k
    

    But when $addToSet was replaced with $push, both count queries returned the same value.

    Note: when you are not concerned about duplicate entries in the array, you can go with $push.
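
    A rough sketch of how this comparison could be reproduced with the current PyMongo bulk API (collection, filter, and field names are placeholders, and count_documents replaces the older count helper):

    from pymongo import MongoClient, UpdateOne

    coll = MongoClient().test.collection_name  # placeholder names

    # Upsert ~100k documents; each op sets fields and adds 'b' to the 'a'
    # array within the same update document.
    ops = [
        UpdateOne(
            {'x': i},
            {'$set': {'updated': True},
             '$addToSet': {'a': 'b'},
             '$setOnInsert': {'created': True}},
            upsert=True,
        )
        for i in range(100_000)
    ]
    coll.bulk_write(ops)

    # Compare the two counts from the observation above.
    print(coll.count_documents({}))                     # total documents
    print(coll.count_documents({'a': {'$in': ['b']}}))  # documents with 'b' in 'a'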
