MySQL WHERE `character` = 'a' is matching a, A, Ã, etc. Why?

Submitted by 落爺英雄遲暮 on 2019-12-12 09:10:41

Question


I have the following query in MySQL:

SELECT id FROM unicode WHERE `character` = 'a'

The table unicode contains each Unicode character along with an ID (its integer encoding value). Since the collation of the table is set to utf8_unicode_ci, I would have expected the above query to only return 97 (the letter 'a'). Instead, it returns 119 rows containing the IDs of many 'a'-like letters:

a A Ã ...

It seems to be ignoring both case and the multi-byte nature of the characters.

Any ideas?


Answer 1:


As documented under Unicode Character Sets:

MySQL implements the xxx_unicode_ci collations according to the Unicode Collation Algorithm (UCA) described at http://www.unicode.org/reports/tr10/. The collation uses the version-4.0.0 UCA weight keys: http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt.

The full collation chart makes clear that, in this collation, most variations of a base letter are equivalent irrespective of their lettercase or accent/decoration.
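One way to observe this equivalence on a live server is to compare the letters directly and to inspect the weight strings the collation assigns to them. The following is a minimal sketch, not part of the original answer: it assumes MySQL 5.6 or later (where WEIGHT_STRING() is available), a client that sends UTF-8, and a server that still accepts the utf8 character set name; the column aliases are illustrative only.

```sql
-- Assumption: the client encodes its statements as UTF-8.
SET NAMES utf8;

-- Each expression evaluates to 1 when the collation treats the two letters as equal.
SELECT
    'a' = 'A' COLLATE utf8_unicode_ci AS a_eq_upper_a,
    'a' = 'Ã' COLLATE utf8_unicode_ci AS a_eq_a_tilde;

-- Letters that compare equal under a collation share the same weight string.
SELECT
    HEX(WEIGHT_STRING('a' COLLATE utf8_unicode_ci)) AS weight_a,
    HEX(WEIGHT_STRING('A' COLLATE utf8_unicode_ci)) AS weight_upper_a,
    HEX(WEIGHT_STRING('Ã' COLLATE utf8_unicode_ci)) AS weight_a_tilde;
```

If the three weight strings come back identical, the query in the question cannot tell those letters apart under this collation.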

If you want to only match exact letters, you should use a binary collation such as utf8_bin.
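As a sketch of that suggestion, the binary collation can be applied either to a single comparison or to the column itself. This assumes the column uses the utf8 character set, as in the question (with utf8mb4 the collation would be utf8mb4_bin). The CHAR(1) column type in the ALTER TABLE statement is a guess, since the question does not show the table definition.

```sql
-- Per-query: override the column's collation for this comparison only.
SELECT id FROM unicode WHERE `character` COLLATE utf8_bin = 'a';

-- Permanent: change the column's collation so every comparison is exact.
-- CHAR(1) is an assumed column type; restate the real definition in full
-- (including NOT NULL or DEFAULT, if any), because MODIFY replaces it.
ALTER TABLE unicode
    MODIFY `character` CHAR(1) CHARACTER SET utf8 COLLATE utf8_bin;
```

The per-query form leaves sorting and other comparisons on the column unchanged, while the ALTER TABLE form also affects ORDER BY and any unique index on the column.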




Answer 2:


The collation of the table is part of the issue; MySQL with a _ci collation is treating all of those 'a's as variants of the same character.

Switching to a _cs collation will force the engine to distinguish 'a' from 'A', and 'á' from 'Á', but it may still treat 'a' and 'á' as the same character.
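Whether a _cs collation exists depends on the character set and the server version, so it may be worth checking what the server actually offers before switching. A minimal check, offered as an illustration rather than as part of the original answer:

```sql
-- List every collation whose name starts with "utf8"
-- (this also matches utf8mb4 collations on servers that have them).
SHOW COLLATION LIKE 'utf8%';

-- Narrow the list to case-sensitive collations across all character sets;
-- the backslash makes the underscore literal instead of a wildcard.
SHOW COLLATION WHERE `Collation` LIKE '%\_cs';
```

If no _cs collation is listed for the column's character set, the _bin collation or the BINARY operator remains the way to get case-sensitive matching.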

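If you need exact comparison semantics, completely disregarding the equivalency of similar characters, you can use the BINARY operator, which compares the strings byte by byte. Note that character is a reserved word in MySQL, so the column name must be quoted with backticks:

SELECT id FROM unicode WHERE BINARY `character` = 'a'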



Answer 3:


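The ci suffix in the collation name means case-insensitive. Switch to a case-sensitive (cs) collation to distinguish 'a' from 'A'; to match only the exact character, accents included, a binary (bin) collation is the stricter option.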



Source: https://stackoverflow.com/questions/12431887/mysql-where-character-a-is-matching-a-a-%c3%83-etc-why
