Time Complexity of $addToSet vs $push when the element does not exist in the array

鱼传尺愫 2021-02-06 01:59

Given: Connection is Safe=True, so update's return value will contain update information.

Say I have documents that look like:

[{'a': [1]}, {'a': [2]}, {

3 Answers
  •  长情又很酷
    2021-02-06 02:28

    Edit

    OK, I read your question wrong all along: it turns out you are actually looking at two different queries and comparing their time complexity.

    The first query being:

    coll.update_many({}, {'$addToSet': {'a': 1}})
    

    And the second being:

    coll.update_many({'a': {'$ne': 1}}, {'$push': {'a': 1}})
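
    To make the comparison concrete, here is a small in-memory model of what the two updates do. It is a sketch under loose assumptions: a Python list of dicts stands in for coll, and add_to_set_all and push_where_ne are hypothetical helpers written for illustration, not PyMongo or server functions.

```python
from copy import deepcopy


def add_to_set_all(docs, field, value):
    """Models query 1: $addToSet `value` into `field` on every document."""
    modified = 0
    for doc in docs:                     # empty filter {}: every document is visited
        arr = doc.setdefault(field, [])  # a missing field becomes a new array
        if value not in arr:             # linear scan of the array
            arr.append(value)
            modified += 1
    return modified


def push_where_ne(docs, field, value):
    """Models query 2: $push `value` onto `field` where {field: {'$ne': value}}."""
    modified = 0
    for doc in docs:                          # unindexed filter: every document is visited
        if value not in doc.get(field, []):   # the $ne test, also a linear scan
            doc.setdefault(field, []).append(value)
            modified += 1
    return modified


original = [{'a': [1]}, {'a': [2]}, {'a': [2, 3]}, {}]
first, second = deepcopy(original), deepcopy(original)
print(add_to_set_all(first, 'a', 1), first)
print(push_where_ne(second, 'a', 1), second)
```

    Both calls report three modified documents and leave identical results, including for the empty document, because $addToSet creates a missing array and {'a': {'$ne': 1}} also matches documents that lack the field.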
    

    The first problem that springs to mind here is the lack of indexes. $addToSet is an update modifier, and I do not believe it uses an index; on top of that, the empty query {} matches every document, so you are doing a full table scan to accomplish what you need.

    In reality, you are looking for all documents that do not already have 1 in a, and you want to $push the value 1 onto that a array.

    So that is two points to the second query before we even get into time complexity, because the first query:

    • Does not use indexes
    • Would be a full table scan
    • Would then do a full array scan (with no index) to $addToSet

    So I have pretty much made up my mind that the second query is what you're looking for, even before any of the Big O notation stuff.

    There are a couple of problems with using Big O notation to explain the time complexity of each query here:

    • I am unsure which perspective you want: per document or for the whole collection.
    • I am unsure about your indexes. An index on a would make the lookup logarithmic (O(log n)); without an index you do not get that.
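
    The logarithmic-lookup point can be pictured with a sorted list searched by bisection, used here as a rough stand-in for the B-tree behind an index on a. This is a hypothetical model (build_index and ids_containing are illustrative names): like a multikey index it keeps one entry per array element, and locating a value takes O(log N) comparisons over N entries, plus the cost of reading the matching range.

```python
from bisect import bisect_left, bisect_right


def build_index(docs, field):
    """One (element, doc_id) entry per array element, kept sorted (multikey-style)."""
    return sorted((elem, doc_id)
                  for doc_id, doc in enumerate(docs)
                  for elem in doc.get(field, []))


def ids_containing(index, value):
    """Binary-search the sorted entries: O(log N) to locate the range for `value`."""
    lo = bisect_left(index, (value, float('-inf')))
    hi = bisect_right(index, (value, float('inf')))
    return {doc_id for _, doc_id in index[lo:hi]}


docs = [{'a': [1]}, {'a': [2]}, {'a': [2, 3]}, {'a': [1, 3]}]
index = build_index(docs, 'a')
print(ids_containing(index, 1))  # {0, 3}
```

    This finds the documents that do contain a value; answering {'a': {'$ne': 1}} from such an index means taking the complement of that set.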

    That said, the first query would look something like O(n) per document, since:

    • $addToSet needs to iterate over each element of a to check whether the value is already there
    • It would then need to do an O(1) operation to append the value if it does not already exist. Since O(n) + O(1) simplifies to O(n), I have dropped the O(1) term here.
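
    The per-document reasoning can be checked by counting equality tests. In this hypothetical model (add_to_set_counting is an illustrative name, not how the server is implemented), an absent value costs one comparison per element followed by a single append, which is O(n) + O(1) = O(n).

```python
def add_to_set_counting(arr, value):
    """Append `value` if absent; return the number of equality checks made."""
    comparisons = 0
    for elem in arr:            # linear scan: up to len(arr) comparisons
        comparisons += 1
        if elem == value:
            return comparisons  # already present: nothing to add
    arr.append(value)           # O(1) amortized append
    return comparisons


arr = list(range(2, 12))        # ten elements, none equal to 1
print(add_to_set_counting(arr, 1), len(arr))  # 10 11
```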

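    Per collection, without the index, it would be O(2n²), which is simply O(n²) once the constant is dropped, since the cost of iterating a is paid again for every document in the collection. Strictly speaking, for n documents whose arrays hold m elements each, that is O(n·m); it only becomes quadratic when the arrays grow as large as the collection, and it is never exponential.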
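
    The per-collection figure can be checked the same way. This sketch assumes n documents whose arrays each hold m elements, none equal to the value being added (collection_cost is a hypothetical helper); the total number of comparisons is n·m, which is n² only in the special case m = n.

```python
def collection_cost(n_docs, array_len, value=-1):
    """Total equality checks for an $addToSet-style update over every document."""
    docs = [{'a': list(range(array_len))} for _ in range(n_docs)]
    total = 0
    for doc in docs:                  # full scan: every document is visited
        for elem in doc['a']:         # full array scan: `value` is never present
            total += 1
            if elem == value:
                break
        else:
            doc['a'].append(value)    # append once the scan finds no match
    return total


print(collection_cost(100, 5))    # 500: linear in n when m is fixed
print(collection_cost(100, 100))  # 10000: quadratic only when m grows with n
```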

    The second query, without indexes, would look something like O(2n²) (O(n) per document), I believe, since $ne would have the same problems as $addToSet without indexes. With indexes, however, I believe this would actually be O(log n · log n) (O(log n) per document), since it would first find all documents that have an a field and then all documents without 1 in their set, based upon the B-tree.
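
    The claim that an unindexed $ne has the same problem as $addToSet can be illustrated with the same kind of counting model. In this sketch (ne_filter_cost is a hypothetical name), evaluating {'a': {'$ne': 1}} on a document means scanning its array, so without an index every document and every element is still visited.

```python
def ne_filter_cost(docs, field, value):
    """Return (matching doc indices, equality checks) for {field: {'$ne': value}}."""
    matches, checks = [], 0
    for i, doc in enumerate(docs):        # no index: visit every document
        found = False
        for elem in doc.get(field, []):   # scan the array for `value`
            checks += 1
            if elem == value:
                found = True
                break
        if not found:
            matches.append(i)             # these are the documents $push would update
    return matches, checks


docs = [{'a': [2, 3, 4]}, {'a': [1, 5]}, {'a': [6, 7]}, {}]
print(ne_filter_cost(docs, 'a', 1))  # ([0, 2, 3], 6)
```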

    So, based on time complexity and the notes at the beginning, I would say query 2 is better.

    To be honest, I am not used to explaining things in Big O notation, so treat this as experimental.

    Hope it helps,
