Question
I have a list of keywords:
keywords = ['word1', 'word2', 'word3']
For now I query for only 1 keyword like this:
collection.find({'documenttextfield': {'$regex': ' '+keyword+' '}})
I'm in no way a regex guru, so I pad the keyword with a space on each side to find an exact match.
But what I want now is, given that keywords list, to query the documents and find those that contain every keyword from the list in documenttextfield.
I have some ideas of how to do this, but they are all a bit too complex and I feel I'm missing something...
Answer 1:
Consider using a text index with a $text search. It might be a far better solution than using regular expressions. However, text search returns documents based on a scoring-algorithm, so you might get some results which don't have all the keywords you are looking for.
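As a rough sketch of that approach, the snippet below builds a $text query from the keyword list and adds a client-side check for the "all keywords" requirement, since the text search itself may return partial matches. The helper names build_text_query and contains_all are hypothetical, the case-insensitive matching is an assumption, and the PyMongo calls are shown only as comments because they need a live collection.

```python
import re

def build_text_query(keywords):
    # $text takes a single search string; space-separated terms are
    # OR-ed together and the results are ranked by score.
    return {'$text': {'$search': ' '.join(keywords)}}

def contains_all(text, keywords):
    # Client-side check that every keyword occurs as a whole word.
    # Case-insensitive matching is an assumption made for this sketch.
    return all(
        re.search(r'\b' + re.escape(kw) + r'\b', text, re.IGNORECASE)
        for kw in keywords
    )

keywords = ['word1', 'word2', 'word3']
query = build_text_query(keywords)

# With a live PyMongo collection (not executed here):
# collection.create_index([('documenttextfield', 'text')])
# matches = [d for d in collection.find(query)
#            if contains_all(d['documenttextfield'], keywords)]
```

The index narrows the candidates cheaply, and the Python filter then discards documents that are missing one of the keywords.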
If you can't or don't want to add a text index to this field, using a single regular expression would be quite a pain because you don't know the order in which these words appear. I don't claim it is impossible to write, but you will end up with a horrible abomination even by regex standards. It would be far easier to apply the $regex operator multiple times, combined with the $and operator.
Also, using a space as a delimiter is going to fail when the word is at the beginning or end of the string, or is followed by a period or comma. Use the word-boundary token (\b) instead.
# Raw strings keep \b as a regex word boundary rather than a Python backspace.
collection.find({
    '$and': [
        {'documenttextfield': {'$regex': r'\b' + keyword1 + r'\b'}},
        {'documenttextfield': {'$regex': r'\b' + keyword2 + r'\b'}},
        {'documenttextfield': {'$regex': r'\b' + keyword3 + r'\b'}},
    ]
})
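Since the question starts from a list of keywords, the same $and filter can be generated instead of written out by hand. This is a minimal sketch: build_all_keywords_query is a hypothetical helper, and re.escape is an added precaution (not part of the original answer) so that keywords containing regex metacharacters are matched literally.

```python
import re

def build_all_keywords_query(keywords, field='documenttextfield'):
    # One whole-word $regex clause per keyword, all required via $and.
    if not keywords:
        raise ValueError('at least one keyword is required')
    clauses = [
        {field: {'$regex': r'\b' + re.escape(kw) + r'\b'}}
        for kw in keywords
    ]
    return {'$and': clauses}

keywords = ['word1', 'word2', 'word3']
query = build_all_keywords_query(keywords)

# With a live PyMongo collection (not executed here):
# results = collection.find(query)
```

The empty-list guard is there because $and rejects an empty array of clauses.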
Keep in mind that this is a really slow query, because it runs these three regular expressions on every single document in the collection. If this is a performance-critical query, seriously consider whether a text index really won't do. Failing that, the last resort would be to extract every keyword someone could search for from documenttextfield (which might be every unique word in it) into a new array field documenttextfield_keywords, create a normal index on that field, and query it with the $all operator (no regular expression required in that case).
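A minimal sketch of that last option follows. The helpers extract_keywords and build_all_query are hypothetical, and lowercasing plus \w+ tokenisation are assumptions about how the words should be normalised; the PyMongo calls are shown as comments because they need a live collection.

```python
import re

def extract_keywords(text):
    # Unique lowercase words; \w+ tokenisation is an assumption.
    return sorted(set(re.findall(r'\w+', text.lower())))

def build_all_query(keywords, field='documenttextfield_keywords'):
    # $all matches documents whose array contains every listed value.
    return {field: {'$all': [kw.lower() for kw in keywords]}}

doc = {'documenttextfield': 'Word1 and word2, then word3.'}
doc['documenttextfield_keywords'] = extract_keywords(doc['documenttextfield'])
query = build_all_query(['word1', 'word2', 'word3'])

# With a live PyMongo collection (not executed here):
# collection.insert_one(doc)
# collection.create_index('documenttextfield_keywords')
# results = collection.find(query)
```

The keyword array has to be recomputed whenever documenttextfield changes, which is the price of replacing the regex scan with an indexed lookup.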
Source: https://stackoverflow.com/questions/36251768/how-to-query-documents-in-mongodb-pymongo-where-all-keywords-exist-in-a-field