Question
I have the following query in MySQL:
SELECT id FROM unicode WHERE `character` = 'a'
The table unicode contains each Unicode character along with an ID (its integer code point value). Since the collation of the table is set to utf8_unicode_ci, I would have expected the above query to return only 97 (the letter 'a'). Instead, it returns 119 rows containing the IDs of many 'a'-like letters:
a A Ã ...
It seems to be ignoring both case and accents, even where the accented characters are multi-byte.
Any ideas?
Answer 1:
As documented under Unicode Character Sets:
MySQL implements the xxx_unicode_ci collations according to the Unicode Collation Algorithm (UCA) described at http://www.unicode.org/reports/tr10/. The collation uses the version-4.0.0 UCA weight keys: http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt.
The full collation chart makes clear that, in this collation, most variations of a base letter are equivalent irrespective of their lettercase or accent/decoration.
If you want to match only exact letters, you should use a binary collation such as utf8_bin.
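As a rough illustration of the difference, the comparisons below can be run directly in a MySQL client. This is only a sketch: it assumes a server where the utf8 character set and its utf8_unicode_ci and utf8_bin collations are available, and it reuses the unicode table and `character` column from the question. The `_utf8` introducers are there so the literals do not depend on the connection character set.

```sql
-- Under utf8_unicode_ci, lettercase variants compare as equal.
SELECT _utf8'a' = _utf8'A' COLLATE utf8_unicode_ci;  -- returns 1

-- Under utf8_bin, strings are compared by code point, so they differ.
SELECT _utf8'a' = _utf8'A' COLLATE utf8_bin;         -- returns 0

-- Per-query override: compare the column under utf8_bin
-- without changing the table definition.
SELECT id FROM unicode WHERE `character` COLLATE utf8_bin = 'a';
```

Applying COLLATE in the query leaves the stored collation untouched, which can be convenient when other queries still rely on the case-insensitive behaviour. Changing the column's collation permanently would require an ALTER TABLE with the column's full definition, which the question does not show.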
Answer 2:
The collation of the table is part of the issue; MySQL with a _ci collation is treating all of those 'a's as variants of the same character.
Switching to a _cs collation will force the engine to distinguish 'a' from 'A', and 'á' from 'Á', but it may still treat 'a' and 'á' as the same character.
If you need exact comparison semantics, completely disregarding the equivalence of similar characters, you can use the BINARY operator:
SELECT id FROM unicode WHERE BINARY `character` = 'a'
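For reference, BINARY expr is shorthand for CAST(expr AS BINARY), so the same byte-wise comparison can be spelled out explicitly. A minimal sketch, again assuming the unicode table and `character` column from the question:

```sql
-- Byte-wise comparison of two literals: case now matters.
SELECT BINARY 'a' = 'A';  -- returns 0

-- Equivalent to the BINARY query above, written with an explicit cast.
SELECT id FROM unicode WHERE CAST(`character` AS BINARY) = 'a';
```

Note that `character` is quoted with backticks in both forms, since CHARACTER is a reserved word in MySQL.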
Answer 3:
The ci in the collation name means case-insensitive. Switch to a case-sensitive collation (cs) to get the results you're looking for.
Source: https://stackoverflow.com/questions/12431887/mysql-where-character-a-is-matching-a-a-%c3%83-etc-why