Time Complexity of $addToSet vs $push when element does not exist in the Array

鱼传尺愫 2021-02-06 01:59

Given: the Connection has safe=True, so update's return value will contain update information.

Say I have documents that look like:

[{'a': [1]}, {'a': [2]}, {


        
3 Answers
  • 2021-02-06 02:17

    It looks like $addToSet does the same thing as your command: $push with a $ne check. Both would be O(N).

    https://github.com/mongodb/mongo/blob/master/src/mongo/db/ops/update_internal.cpp

    If speed is really important, then why not use a hash?

    Instead of:

    {'$addToSet': {'a':1}}
    {'$addToSet': {'a':10}}
    

    use:

    {'$set': {'a.1': 1}}
    {'$set': {'a.10': 1}}
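
    The difference between the two shapes can be sketched in plain Python, with no database involved. This is only a model: the helper names add_to_set_list and add_to_set_hash are made up for illustration, and it assumes a is stored as an embedded document rather than an array (on an array field, a dotted numeric path such as 'a.1' addresses an index instead).

```python
def add_to_set_list(arr, value):
    """Array-style set insert: scan the list, append only if absent (O(n))."""
    if value in arr:              # linear scan over the array
        return False
    arr.append(value)
    return True


def add_to_set_hash(members, value):
    """Hash-style insert: use the value as a key (average O(1) lookup)."""
    key = str(value)              # field names in a document are strings
    if key in members:
        return False
    members[key] = 1
    return True


as_array = {"a": [1]}             # hypothetical document using an array
as_hash = {"a": {"1": 1}}         # the same set stored as keys

results_array = [add_to_set_list(as_array["a"], v) for v in (1, 10)]
results_hash = [add_to_set_hash(as_hash["a"], v) for v in (1, 10)]
print(as_array, results_array)
print(as_hash, results_hash)
```

    The trade-off is that the keyed form no longer behaves like an array in queries, so it only fits when membership tests and inserts are the main operations.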
    
  • 2021-02-06 02:28

    Edit

    OK, I misread your question at first. It turns out you are actually comparing two different queries and judging the time complexity of each.

    The first query being:

    coll.update({}, {'$addToSet': {'a':1}}, multi=True)
    

    And the second being:

    coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)
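
    As a rough check that the two queries end in the same state, here is a pure-Python model of both. It is a sketch, not MongoDB's implementation: the function names and the three sample documents are invented for illustration, and every a is assumed to be an array. (As a side note, coll.update(..., multi=True) is the legacy PyMongo call; PyMongo 4 removed it in favour of update_many.)

```python
import copy


def add_to_set_all(docs, field, value):
    """Model of update({}, {'$addToSet': {field: value}}, multi=True)."""
    modified = 0
    for doc in docs:                          # every document is visited
        arr = doc.setdefault(field, [])
        if value not in arr:                  # scan of the array
            arr.append(value)
            modified += 1
    return modified


def push_where_absent(docs, field, value):
    """Model of update({field: {'$ne': value}}, {'$push': {field: value}}, multi=True)."""
    # The $ne filter selects only documents whose array lacks the value.
    matched = [d for d in docs if value not in d.get(field, [])]
    for doc in matched:
        doc.setdefault(field, []).append(value)   # unconditional push
    return len(matched)


docs = [{"a": [1]}, {"a": [2]}, {"a": [1, 2]}]    # hypothetical sample data
first, second = copy.deepcopy(docs), copy.deepcopy(docs)
n_first = add_to_set_all(first, "a", 1)
n_second = push_where_absent(second, "a", 1)
print(first == second, n_first, n_second)
```

    In this model both approaches modify the same documents; what differs is where the membership check happens (inside the modifier versus in the query filter).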
    

    The first problem that springs to mind here is that there are no indexes. $addToSet is an update modifier, and I do not believe it uses an index; as such, you are doing a full table scan to accomplish what you need.

    In reality, you are looking for all documents that do not already have 1 in a, and you want to $push the value 1 onto that array.

    So the second query is ahead even before we get into time complexity, because the first query:

    • Does not use indexes
    • Would be a full table scan
    • Would then do a full array scan (with no index) to $addToSet

    So I have pretty much made up my mind that the second query is what you're looking for, before any of the Big O notation stuff.

    There is a problem with using Big O notation to explain the time complexity of each query here:

    • I am unsure of what perspective you want, whether it is per document or for the whole collection.
    • I am unsure about indexes. Using an index on a would give a logarithmic lookup, whereas not using one would not.

    However the first query would look something like: O(n) per document since:

    • The $addToSet would need to iterate over each element
    • The $addToSet would then need to do an O(1) op to insert the element if it does not exist. I should note I am unsure whether the O(1) is cancelled out or not (light reading suggests my version); I have cancelled it out here.
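
    The per-document figure can be sanity-checked by counting comparisons in a plain linear scan. This is only a model of the reasoning above, not MongoDB's code; scan_comparisons and the sample array sizes are made up for illustration, and the value searched for is absent from every array (the case in the question title).

```python
def scan_comparisons(arr, value):
    """Count the element comparisons a linear membership scan performs."""
    count = 0
    for item in arr:
        count += 1
        if item == value:         # stop early if the value is found
            break
    return count


# Three hypothetical documents whose arrays hold 10, 100 and 1000 elements,
# none of which equals 1.
collection = [list(range(2, 2 + n)) for n in (10, 100, 1000)]
per_doc = [scan_comparisons(arr, 1) for arr in collection]
total = sum(per_doc)
print(per_doc, total)
```

    In this model the comparisons per document grow linearly with the array length, and the work for the whole collection is the sum of the per-document scans.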

    Per collection, without the index it would be: O(2n^2), since the complexity of iterating a will exponentially increase with every new document.

    The second query, without indexes, would look something like: O(2n^2) (O(n) per document), I believe, since $ne would have the same problems as $addToSet without indexes. However, with indexes I believe this would actually be O(log n log n) (O(log n) per document), since it would first find all documents with a in them and then all documents without 1 in their set, based upon the b-tree.

    So based upon time complexity and the notes at the beginning I would say query 2 is better.

    If I am honest I am not used to explaining in "Big O" Notation so this is experimental.

    Hope it helps,

  • 2021-02-06 02:30

    Adding my observation of the difference between $addToSet and $push, from a bulk update of 100k documents.

    When you are doing a bulk update, $addToSet will be executed separately.

    For example,

    bulkInsert.find({x:y}).upsert().update({"$set":{..},"$push":{ "a":"b" } , "$setOnInsert":  {} })
    

    will first insert the document and apply $set, and only then execute the $addToSet part.

    I saw a clear difference of about 10k between:

    db.collection_name.count() // gives around 40k
    
    db.collection_name.count({"a":{$in:["b"]}}) // gives only around 30k
    

    But when I replaced $addToSet with $push, both count queries returned the same value.

    Note: when you're not concerned about duplicate entries in the array, you can go with $push.
