How to query documents in mongodb (pymongo) where all keywords exist in a field?

怎甘沉沦 submitted on 2019-12-13 05:25:22

Question


I have a list of keywords:

keywords = ['word1', 'word2', 'word3']

For now I query for only 1 keyword like this:

collection.find({'documenttextfield': {'$regex': ' '+keyword+' '}})

I'm by no means a regex guru, so I pad the keyword with a space on each side to get an exact match.

What I want now is to take that keywords list and find the documents that contain every keyword from the list in documenttextfield.

I have some ideas of how to do this, but they are all a bit too complex and I feel I'm missing something...


Answer 1:


Consider using a text index with a $text search. It is likely a far better solution than regular expressions. However, by default a $text search matches documents that contain any of the search terms and ranks them with a scoring algorithm, so you might get some results which don't have all the keywords you are looking for.
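As a rough sketch of the text-index route: the helper names `build_text_query` and `find_with_text_index` below are made up for illustration, and `collection` is assumed to be a pymongo `Collection`. Wrapping each keyword in double quotes makes `$text` treat it as a phrase, and multiple phrases are generally reported to be ANDed together; treat that AND behaviour as an assumption to verify against your server version.

```python
def build_text_query(keywords):
    """Build a $text filter that treats every keyword as a quoted phrase."""
    # Quoting each keyword makes it a phrase instead of an optional (OR-ed) term.
    # Assumption: the keywords themselves contain no double quotes.
    search = ' '.join('"{}"'.format(kw) for kw in keywords)
    return {'$text': {'$search': search}}


def find_with_text_index(collection, keywords):
    """Hypothetical helper; `collection` is assumed to be a pymongo Collection."""
    # $text needs a text index on the field; 'text' is the value of pymongo.TEXT.
    collection.create_index([('documenttextfield', 'text')])
    return collection.find(build_text_query(keywords))


query = build_text_query(['word1', 'word2', 'word3'])
print(query)
```

Building the filter separately from running it keeps the query easy to inspect before it is sent to the server.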

If you can't or don't want to add a text index to this field, a single regular expression would be quite a pain, because you don't know the order in which the words appear. I don't claim it is impossible to write, but you would end up with a horrible abomination even by regex standards. It is far easier to apply the $regex operator multiple times and combine the conditions with the $and operator.

Also, using a space as a delimiter is going to fail when the word is at the beginning or end of the string, or is followed by a period or comma. Use the word-boundary token (\b) instead.
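The difference can be checked locally with Python's own `re` module before touching the database. This is only an illustration with made-up sample strings; it also shows why the pattern should be a raw string in Python, since a plain `'\b'` is a backspace character rather than a word boundary.

```python
import re

# Made-up sample texts covering the edge cases mentioned above.
texts = [
    'word1 at the start',
    'ends with word1',
    'has word1, then a comma',
    'sword1fish',
]
keyword = 'word1'

# Space-padded pattern from the question vs. a word-boundary pattern.
space_padded = re.compile(' ' + keyword + ' ')
word_bounded = re.compile(r'\b' + re.escape(keyword) + r'\b')

space_hits = [bool(space_padded.search(t)) for t in texts]
boundary_hits = [bool(word_bounded.search(t)) for t in texts]

# In a normal Python string '\b' is the backspace character (one char),
# while the raw string r'\b' is a backslash followed by 'b' (two chars).
plain_len = len('\b')
raw_len = len(r'\b')

print(space_hits)
print(boundary_hits)
```

The space-padded pattern misses the first three texts, while the word-boundary pattern finds them and still rejects the keyword embedded inside a longer word.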

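import re

# One $regex condition per keyword, all of which must match ($and).
# Raw strings keep \b as a word boundary (a plain '\b' is a backspace in Python);
# re.escape guards against keywords that contain regex metacharacters.
collection.find(
    {'$and': [
        {'documenttextfield': {'$regex': r'\b' + re.escape(keyword) + r'\b'}}
        for keyword in keywords
    ]}
)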
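For a keyword list of any length, the filter can be built by a small function. The names `keyword_pattern`, `build_all_keywords_query` and `matches_all` are hypothetical; `matches_all` mirrors the query with Python's `re` so the logic can be sanity-checked on plain strings, on the assumption that Python and the server's regex engine agree on `\b` for simple ASCII keywords.

```python
import re


def keyword_pattern(keyword):
    """Whole-word pattern for one keyword, with metacharacters escaped."""
    return r'\b' + re.escape(keyword) + r'\b'


def build_all_keywords_query(field, keywords):
    """Filter matching documents whose `field` contains every keyword."""
    if not keywords:
        # MongoDB rejects an empty $and array, so fall back to "match all".
        return {}
    return {'$and': [{field: {'$regex': keyword_pattern(kw)}} for kw in keywords]}


def matches_all(text, keywords):
    """Local stand-in for the query, useful for checking the patterns."""
    return all(re.search(keyword_pattern(kw), text) for kw in keywords)


query = build_all_keywords_query('documenttextfield', ['word1', 'word2'])
in_any_order = matches_all('word2 comes before word1.', ['word1', 'word2'])
missing_one = matches_all('only word1 here', ['word1', 'word2'])
print(query)
```

The resulting dictionary can be passed directly to `collection.find(query)`.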

Keep in mind that this is a really slow query, because it cannot use an index and has to run all three regular expressions against every single document in the collection. If this is a performance-critical query, seriously consider whether a text index really won't do. Failing that, the last resort would be to extract every keyword someone could search for from documenttextfield (which might be every unique word in it) into a new array field documenttextfield_keywords, create a normal index on that field, and query it with the $all operator (no regular expression required in that case).
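A minimal sketch of that last approach follows. The tokenisation rule (lowercase, split on non-word characters) and the helper names are assumptions for illustration, and `collection` is again assumed to be a pymongo `Collection`; a real application may need a different definition of "word".

```python
import re


def extract_keywords(text):
    """Return the sorted unique lowercase words found in `text`."""
    return sorted(set(re.findall(r'\w+', text.lower())))


def build_all_query(keywords):
    """Filter for documents whose keyword array contains every keyword."""
    # Keywords are lowercased to match how extract_keywords stores them.
    return {'documenttextfield_keywords': {'$all': [kw.lower() for kw in keywords]}}


def store_keywords(collection, doc):
    """Hypothetical helper: write the extracted keywords back to one document."""
    collection.update_one(
        {'_id': doc['_id']},
        {'$set': {'documenttextfield_keywords': extract_keywords(doc['documenttextfield'])}},
    )


def ensure_keyword_index(collection):
    """Hypothetical helper: a normal index on an array field indexes each element."""
    collection.create_index('documenttextfield_keywords')


words = extract_keywords('Word1, then word2. Word1 again!')
query = build_all_query(['Word1', 'word2'])
print(words)
print(query)
```

The keyword array has to be refreshed whenever documenttextfield changes, which is the price paid for the indexed lookup.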



Source: https://stackoverflow.com/questions/36251768/how-to-query-documents-in-mongodb-pymongo-where-all-keywords-exist-in-a-field
