Question
I would like to search a collection in my MongoDB database. In this collection, the documents have a "name" field whose values look like:
[i] "Palácio Guanabara", "Palácio da Cidade", "Festa Palácio", etc.
When a user types a search such as "pala", "palá", "Pala" or "PalÁ", all of the items in [i] must be in the result set.
I found that MongoDB supports regular expressions in queries, for example:
{ "name": { $regex: new Regex(".*pala.*", "i") } }
OK, this approach is case-insensitive and reproduces SQL's percent-style LIKE matching ("%pala%"). But it does not ignore the accents in the values stored in the database.
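For reference, here is a minimal sketch of the regex approach in plain JavaScript. It assumes a Node.js client, and the helper names `escapeRegex` and `likeFilter` are hypothetical. The sketch escapes regex metacharacters in the user's input so the input is matched literally. It relies on `$regex` being unanchored, so the pattern already behaves like SQL's '%term%' without ".*" around it. It also shows the limitation described above: the filter finds "Palacio" but not "Palácio".

```javascript
// Hypothetical helper: escape characters that are special in a regex,
// so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: MongoDB filter roughly equivalent to
// SQL's  name LIKE '%term%'  (case-insensitive). $regex is unanchored,
// so no ".*" is needed around the pattern.
function likeFilter(term) {
  return { name: { $regex: escapeRegex(term), $options: "i" } };
}

const filter = likeFilter("pala");
console.log(JSON.stringify(filter)); // {"name":{"$regex":"pala","$options":"i"}}

// Check the same pattern locally with a JavaScript RegExp:
const re = new RegExp(filter.name.$regex, filter.name.$options);
console.log(re.test("Palacio"));           // true
console.log(re.test("Palácio Guanabara")); // false: "á" is not "a"
```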
I found another alternative with the $text index: https://docs.mongodb.org/manual/core/index-text/
This approach can ignore both case and accents. But $search does not accept a regex, so I can't run substring searches like "%pala%".
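Here is a minimal sketch of the $text alternative. It assumes a text index already exists on "name", created in the shell with something like `db.collection.createIndex({ name: "text" })`, where "collection" is a placeholder. `$caseSensitive` and `$diacriticSensitive` are real `$text` options (MongoDB 3.2+, version 3 text indexes). `textFilter` is a hypothetical helper. Text search matches whole (stemmed) words rather than substrings. A full word like "palacio" can therefore find "Palácio", but a fragment like "pala" cannot.

```javascript
// Hypothetical helper: build a $text query filter.
// Assumes a text index exists on the "name" field.
function textFilter(term) {
  return {
    $text: {
      $search: term,
      $caseSensitive: false,      // already the default; stated explicitly
      $diacriticSensitive: false, // ignore accents (version 3 text indexes)
    },
  };
}

// Usage with the Node.js driver (not executed here):
//   await collection.find(textFilter("palacio")).toArray();
console.log(JSON.stringify(textFilter("palacio")));
// {"$text":{"$search":"palacio","$caseSensitive":false,"$diacriticSensitive":false}}
```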
In short, I want the MongoDB equivalent of the following SQL query:
select * from collection where remove_accents(upper(name)) like '%PALA%'
The query should return documents whose name is like "palácio", "palacio", "PaláCiô", etc.
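To pin down the intended semantics, here is a client-side JavaScript sketch of the SQL predicate above. It runs on plain strings, not inside MongoDB. The hypothetical `removeAccents` stands in for the SQL pseudo-function. It is implemented with Unicode NFD decomposition followed by stripping the combining marks in the range U+0300 to U+036F.

```javascript
// Hypothetical equivalent of remove_accents(): decompose each character
// ("á" -> "a" + U+0301) and drop the combining diacritical marks.
function removeAccents(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Mirrors remove_accents(upper(name)) LIKE '%...%',
// with the search term folded the same way as the name.
function matchesLike(name, term) {
  return removeAccents(name.toUpperCase()).includes(removeAccents(term.toUpperCase()));
}

const names = ["Palácio Guanabara", "Palácio da Cidade", "Festa Palácio", "Rio"];
for (const term of ["pala", "palá", "Pala", "PalÁ"]) {
  // Every term selects the three "Palácio" names and skips "Rio".
  console.log(term, names.filter((n) => matchesLike(n, term)));
}
```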
Answer 1:
There is no magic bullet for this inside MongoDB. But since you are already transforming the user input to build '%pala%', why not replace "a" with "[aá]" and wrap the result in ".*"? That way you can keep using a regex and still match the diacritics.
Here are some character sets that make these replacements easy to build (from http://www.rexegg.com/regex-interesting-character-classes.html#languages):
French letters: [a-zA-ZàâäôéèëêïîçùûüÿæœÀÂÄÔÉÈËÊÏÎÇÙÛÜÆŒ]
German letters: [a-zA-ZäöüßÄÖÜẞ] The controversial capital letter for ß, now included in Unicode, is missing from many fonts, so it may appear on your screen as a question mark.
Polish letters: [a-pr-uwy-zA-PR-UWY-ZąćęłńóśźżĄĆĘŁŃÓŚŹŻ] Note that Polish has no Q, V or X. If you want to allow all English letters as well, use [a-zA-ZąćęłńóśźżĄĆĘŁŃÓŚŹŻ].
Italian letters: [a-zA-ZàèéìíîòóùúÀÈÉÌÍÎÒÓÙÚ]
Spanish letters: [a-zA-ZáéíñóúüÁÉÍÑÓÚÜ]
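Here is a minimal sketch of this answer's idea ("a" becomes "[aá...]") in JavaScript, with the replacement classes derived from one of the letter sets above rather than typed by hand. It assumes the Spanish accented letters, plus a few Portuguese ones (ã, â, ê, ô, à, ç) added as an assumption for the example data. It also assumes the search term is typed without accents. `buildVariantMap` and `expandTerm` are hypothetical names. Uppercase variants are left to the "i" option, on the assumption that the server's regex engine folds case for these letters.

```javascript
// Accented letters from the Spanish set above, plus Portuguese ones
// (ã, â, ê, ô, à, ç) added as an assumption for this example data.
const ACCENTED = "áéíñóúüãâêôàç";

// Hypothetical helper: group accented letters under their base letter via
// NFD decomposition ("á" -> "a" + U+0301), e.g. "a" -> "aáãâà".
function buildVariantMap(letters) {
  const map = new Map();
  for (const ch of letters) {
    const base = ch.normalize("NFD")[0]; // the base letter comes first
    map.set(base, (map.get(base) ?? base) + ch);
  }
  return map;
}

// Hypothetical helper: replace each letter that has variants with a
// character class; escape everything else so it matches literally.
// $regex is unanchored, so wrapping in ".*" is optional.
function expandTerm(term, variants) {
  let pattern = "";
  for (const ch of term.toLowerCase()) {
    pattern += variants.has(ch)
      ? "[" + variants.get(ch) + "]"
      : ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }
  return pattern;
}

const variants = buildVariantMap(ACCENTED);
console.log(expandTerm("pala", variants)); // p[aáãâà]l[aáãâà]
// MongoDB filter: { name: { $regex: expandTerm("pala", variants), $options: "i" } }
```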
Answer 2:
What happens if you use just:
find({name: {$regex: 'pala', $options: "i"}})
You used new Regex(), which may not be a valid constructor; the valid JavaScript constructor is new RegExp().
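Here is a small sketch of the constructor point in plain JavaScript. There is no global `Regex`, only `RegExp`. A MongoDB filter can carry the pattern either as a RegExp object, which the shell and the Node.js driver serialize to a BSON regular expression, or as a `$regex` string plus `$options`. The two filters below express the same case-insensitive match. Note that neither one ignores accents.

```javascript
// There is no global `Regex` in JavaScript; the constructor is `RegExp`.
console.log(typeof Regex);  // "undefined"
console.log(typeof RegExp); // "function"

// Two ways to write the same case-insensitive filter for MongoDB:
const withObject = { name: new RegExp("pala", "i") };
const withOptions = { name: { $regex: "pala", $options: "i" } };

// Both carry the same pattern and flags:
console.log(withObject.name.source === withOptions.name.$regex);  // true
console.log(withObject.name.flags === withOptions.name.$options); // true

// Accents are still significant:
console.log(withObject.name.test("Palácio")); // false
```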
Answer 3:
This works for me!
Just replace each vowel with a character class that also contains its accented forms. Additionally, you can use $options: 'i' to make the match case-insensitive.
function diacriticSensitiveRegex(string = '') {
  // Each class lists a vowel and its accented forms; no separators are
  // needed inside [...] (a comma there would match a literal ",").
  return string.replace(/a/g, '[aáàä]')
    .replace(/e/g, '[eéë]')
    .replace(/i/g, '[iíï]')
    .replace(/o/g, '[oóöò]')
    .replace(/u/g, '[uüúù]');
}

find({ name: { $regex: diacriticSensitiveRegex('pala'), $options: 'i' } });
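As written, the function only rewrites lowercase, unaccented vowels. Uppercase or accented vowels in the input ("PALA", or "palá" and "PalÁ" from the question) stay literal, so "palá" does not find "Palacio". Here is a minimal sketch of one way to cover those inputs while keeping this answer's vowel classes. It folds the input first (lowercase, then strip combining marks after NFD decomposition), escapes regex metacharacters, and only then expands the vowels. `foldedDiacriticRegex` is a hypothetical name.

```javascript
// Hypothetical extension of Answer 3's function: fold the input so that
// "pala", "palá", "Pala" and "PalÁ" all produce the same pattern.
function foldedDiacriticRegex(input = "") {
  const folded = input
    .toLowerCase()
    .normalize("NFD")                          // "á" -> "a" + U+0301
    .replace(/[\u0300-\u036f]/g, "")           // drop the combining accents
    .replace(/[.*+?^${}()|[\]\\]/g, "\\$&");   // match the input literally
  // Same vowel classes as Answer 3.
  return folded
    .replace(/a/g, "[aáàä]")
    .replace(/e/g, "[eéë]")
    .replace(/i/g, "[iíï]")
    .replace(/o/g, "[oóöò]")
    .replace(/u/g, "[uüúù]");
}

for (const term of ["pala", "palá", "Pala", "PalÁ"]) {
  console.log(term, "->", foldedDiacriticRegex(term)); // ... -> p[aáàä]l[aáàä]
}
// MongoDB filter: { name: { $regex: foldedDiacriticRegex("PalÁ"), $options: "i" } }
```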
Source: https://stackoverflow.com/questions/36647244/mongodb-how-to-find-documents-ignoring-case-sensitive-accents-and-percent-like