Given: the Connection is safe=True, so update's return value will contain update information.
Say I have documents that look like:
[{'a': [1]}, {'a': [2]}, {
Looks like $addToSet is doing the same thing as your command ($push with a $ne check). Both would be O(N):
https://github.com/mongodb/mongo/blob/master/src/mongo/db/ops/update_internal.cpp
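For example, a quick check with PyMongo (collection name made up) shows both forms leave the array unchanged when the value is already present:

from pymongo import MongoClient

coll = MongoClient()['test']['addtoset_demo']
coll.drop()
coll.insert_many([{'_id': 1, 'a': [1]}, {'_id': 2, 'a': [1]}])

# $addToSet skips the value because it is already in the array...
coll.update_one({'_id': 1}, {'$addToSet': {'a': 1}})
# ...and so does $push when the $ne filter rules the document out.
coll.update_one({'_id': 2, 'a': {'$ne': 1}}, {'$push': {'a': 1}})

print(coll.find_one({'_id': 1})['a'])  # [1]
print(coll.find_one({'_id': 2})['a'])  # [1]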
If speed is really important, then why not use a hash:
instead of:
{'$addToSet': {'a':1}}
{'$addToSet': {'a':10}}
use:
{'$set': {'a.1': 1}}
{'$set': {'a.10': 1}}
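The idea, sketched with PyMongo (collection name made up): store the set members as keys of a sub-document instead of as elements of an array, so adding a member is a keyed $set rather than an array scan:

from pymongo import MongoClient

coll = MongoClient()['test']['hash_sets']
coll.drop()
coll.insert_one({'_id': 1, 'a': {}})  # 'a' is an object here, not an array

coll.update_one({'_id': 1}, {'$set': {'a.1': 1}})
coll.update_one({'_id': 1}, {'$set': {'a.10': 1}})

print(coll.find_one({'_id': 1}))  # {'_id': 1, 'a': {'1': 1, '10': 1}}
# Membership check without scanning an array:
print(coll.count_documents({'_id': 1, 'a.10': {'$exists': True}}))  # 1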
OK, since I read your question wrong all along: it turns out you are actually looking at two different queries and judging the time complexity between them.
The first query being:
coll.update({}, {'$addToSet': {'a':1}}, multi=True)
And the second being:
coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)
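As an aside, those are the legacy PyMongo update calls; with a current driver the same two operations would be written with update_many (a sketch, same collection assumed):

coll.update_many({}, {'$addToSet': {'a': 1}})
coll.update_many({'a': {'$ne': 1}}, {'$push': {'a': 1}})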
The first problem that springs to mind here: no indexes. $addToSet, being an update modifier, does not (I believe) use an index, so you are doing a full table scan to accomplish what you need.
In reality you are looking for all documents that do not already have 1 in a, and looking to $push the value 1 to that a array.
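The second query's filter can at least be backed by an index on a (how much a $ne predicate actually benefits from that index will depend on your data); with PyMongo that would be something like:

coll.create_index('a')  # lets the {'a': {'$ne': 1}} filter walk the index instead of every document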
So 2 points go to the second query even before we get into time complexity here, because the first query cannot use an index and relies entirely on $addToSet scanning each array.
So I have pretty much made my mind up here that the second query is what you're looking for, before any of the Big O notation stuff.
There is a problem with using Big O notation to explain the time complexity of each query here: using indexes brings logarithmic behaviour into play, whereas not using indexes does not.
However, the first query would look something like O(n) per document, since $addToSet has to iterate the a array.
Per collection, without the index, it would be O(2n^2), since the cost of iterating a grows with every new document.
The second query, without indexes, would also look something like O(2n^2) (O(n) per document), I believe, since $ne would have the same problems as $addToSet without indexes. However, with indexes I believe this would actually be O(log n log n) (O(log n) per document), since it would first find all documents with a in them and then all documents without 1 in their set, based upon the b-tree.
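If you want to sanity-check those guesses empirically rather than on paper, a rough PyMongo timing sketch (local test server, made-up collection name) could look something like this:

import time
from pymongo import MongoClient

coll = MongoClient()['test']['addtoset_vs_push']

for n in (1000, 10000, 100000):
    coll.drop()
    coll.insert_many([{'a': [i % 2]} for i in range(n)])
    coll.create_index('a')

    start = time.perf_counter()
    coll.update_many({}, {'$addToSet': {'a': 1}})  # first query
    t_addtoset = time.perf_counter() - start

    start = time.perf_counter()
    coll.update_many({'a': {'$ne': 2}}, {'$push': {'a': 2}})  # second query
    t_push = time.perf_counter() - start

    print(n, round(t_addtoset, 3), round(t_push, 3))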
So based upon time complexity and the notes at the beginning I would say query 2 is better.
If I am honest, I am not used to explaining things in "Big O" notation, so this is experimental.
Hope it helps,
Adding my observation on the difference between $addToSet and $push from a bulk update of 100k documents.
When you are doing a bulk update, $addToSet will be executed separately.
For example,
bulkInsert.find({x: y}).upsert().update({"$set": {..}, "$addToSet": { "a": "b" }, "$setOnInsert": {} })
will first insert and set the document, and then it executes the $addToSet part.
I saw a clear difference of about 10k between:
db.collection_name.count() // gives around 40k
db.collection_name.count({"a": {$in: ["b"]}}) // gives only around 30k
But when I replaced $addToSet with $push, both count queries returned the same value.
Note: when you're not concerned about duplicate entries in the array, you can go with $push.
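For reference, the kind of bulk upsert described above could be set up with PyMongo roughly like this (database, collection, and field names are made up), after which the two counts can be compared:

from pymongo import MongoClient, UpdateOne

coll = MongoClient()['test']['bulk_observation']
coll.drop()

ops = [
    UpdateOne(
        {'x': i},                                   # like bulkInsert.find({x: y})
        {'$set': {'y': i},
         '$addToSet': {'a': 'b'},
         '$setOnInsert': {'created': True}},
        upsert=True,
    )
    for i in range(100000)
]
coll.bulk_write(ops, ordered=False)

print(coll.count_documents({}))                      # total documents
print(coll.count_documents({'a': {'$in': ['b']}}))   # documents whose array received 'b'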