MySQL REGEXP query - accent insensitive search

I'm looking to query a database of wine names, many of which contain accents (but not in a uniform way, and so similar wines may be entered with or without accents).

7 Answers

甜味超标

2020-12-19 00:58

I had the same problem trying to find every record matching one of the following patterns: 'copropriété', 'copropriete', 'COPROPRIÉTÉ', 'Copropri?t?'

REGEXP 'copropri.{1,2}t.{1,2}' worked for me. Basically, .{1,2} should work in every case, whether the character is encoded as 1 or 2 bytes.

Explanation: https://dev.mysql.com/doc/refman/5.7/en/regexp.html

Warning
The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
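
The byte-wise behaviour described in the warning is what makes .{1,2} necessary. As a rough illustration (assuming the client connection sends the literal 'é' correctly; the column aliases are made up for this example), comparing the byte length of the same character in two encodings shows the difference:

```sql
-- LENGTH() counts bytes, CHAR_LENGTH() counts characters.
-- CONVERT(... USING ...) re-encodes the literal, so the result does not
-- depend on the connection character set.
SELECT
  LENGTH(CONVERT('é' USING utf8mb4))      AS utf8mb4_bytes, -- two bytes in UTF-8
  LENGTH(CONVERT('é' USING latin1))       AS latin1_bytes,  -- one byte in latin1
  CHAR_LENGTH(CONVERT('é' USING utf8mb4)) AS characters;    -- one character either way
```

A byte-wise regex therefore needs to allow one or two "characters" at each accented position to cover both encodings.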

旧巷少年郎

2020-12-19 01:00
Because REGEXP and RLIKE are byte oriented, have you tried:
```
SELECT 'Faugères' REGEXP 'Faug(e|è|ê|é|ë)r(e|è|ê|é|ë)s';
```
This says that exactly one of these alternatives has to appear at that position. Notice that I haven't used the plus (+), because that means ONE OR MORE. Since you only want one, you should not use the plus.
感情败类

2020-12-19 01:02
To solve this problem, I tried different things, including the BINARY keyword and the latin1 character set, but to no avail.
Finally, considering that it is a MySQL bug, I ended up replacing the é and è characters, like this:
```
SELECT * 
FROM `table` 
WHERE replace(replace(wine_name, 'é', 'e'), 'è', 'e') REGEXP '[[:<:]]Faugeres[[:>:]]'
```
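
Note that the [[:<:]] and [[:>:]] word-boundary markers belong to the regex library MySQL used before version 8.0.4. From 8.0.4 on, MySQL uses ICU, which does not support them and uses \b instead (written \\b inside a string literal). A sketch of the same query for those versions, with two extra nested REPLACE calls for ê and ë added purely as an illustration:

```sql
-- Same idea on MySQL 8.0.4+: strip the accented "e" variants,
-- then match the unaccented name on word boundaries.
-- \\b is the ICU word-boundary escape; the backslash is doubled
-- because MySQL treats it as an escape character inside strings.
SELECT *
FROM `table`
WHERE REPLACE(REPLACE(REPLACE(REPLACE(wine_name, 'é', 'e'), 'è', 'e'), 'ê', 'e'), 'ë', 'e')
      REGEXP '\\bFaugeres\\b';
```

The chain only covers the characters listed; other accented letters would need their own REPLACE calls.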
无人共我

2020-12-19 01:04
Ok I just stumbled on this question while searching for something else.

This returns true.
```
SELECT 'Faugères' REGEXP 'Faug[eèêéë]+r[eèêéë]+s';
```
Hope it helps.

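Adding the '+' tells the regexp to look for one or more occurrences of the characters. Since REGEXP compares bytes, this is presumably why it works: a two-byte character such as è needs two matches from the bracket expression.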
遥遥无期

2020-12-19 01:05
I had this problem and went for Álvaro's suggestion above. But in my case, it missed the instances where the search term is a word in the middle of the string. I went for the equivalent of:
```
SELECT *
FROM `table`
WHERE wine_name = 'Faugères'
   OR wine_name LIKE 'Faugères %'
   OR wine_name LIKE '% Faugères'
   OR wine_name LIKE '% Faugères %'
```
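
For what it's worth, the four conditions can probably be folded into one by padding the column with a space on each side. This is only a sketch, assuming words are separated by single spaces as in the query above:

```sql
-- Pad the name with a space on each side so that one pattern covers
-- "whole value", "first word", "last word" and "middle word".
SELECT *
FROM `table`
WHERE CONCAT(' ', wine_name, ' ') LIKE '% Faugères %';
```

Because the column is wrapped in CONCAT, this form cannot use an index on wine_name, so it mainly buys readability.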
佛祖请我去吃肉

2020-12-19 01:18

utf8_general_ci sees no difference between accented and unaccented characters when sorting. Maybe this is true for searches as well. Also, change REGEXP to LIKE: REGEXP makes a binary comparison.
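
A quick way to try this out is a throwaway table. The sketch below is an assumption-laden example: the table name and rows are invented, and it uses utf8mb4_general_ci, the utf8mb4 counterpart of the collation named above.

```sql
-- Hypothetical demo table; the name and sample rows are made up for illustration.
CREATE TABLE wines_demo (
  wine_name VARCHAR(100)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

INSERT INTO wines_demo (wine_name)
VALUES ('Faugères'), ('Faugeres'), ('Château Faugères rouge');

-- LIKE compares using the column collation, so an unaccented,
-- lower-case term can match the accented rows as well.
SELECT wine_name
FROM wines_demo
WHERE wine_name LIKE '%faugeres%';
```

On an existing table, SHOW FULL COLUMNS FROM `table` reports the collation each column currently uses.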
