问题
I have a table like this
CREATE TABLE `mb1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `u_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And I insert tow rows
insert into mb1(name) values('K'),('K');
Note, the second K is unicode character
+------+-----------+
| name | hex(name) |
+------+-----------+
| K | 4B |
| K | EFBCAB |
+------+-----------+
Why do they cause unique key collision? Aren't they different character in utf8mb4?
After remove COLLATE utf8mb4_unicode_ci, the problem disappeared.
回答1:
Why do they cause unique key collision? Aren't they different character in utf8mb4?
You are missing the point about CHARACTER SET
and COLLATION
. A CHARACTER SET
is a collection of different characters. A COLLATION
says whether to treat the characters as equal -- think A
and a
-- different characters, but treated for ORDER BY
and WHERE =
, etc as being the same.
mysql> SELECT 'K'='K' COLLATE utf8_unicode_ci;
+-----------------------------------+
| 'K'='K' COLLATE utf8_unicode_ci |
+-----------------------------------+
| 1 |
+-----------------------------------+
So in utf8_unicode_ci (or utf8mb4_unicode_ci), those two characters are considered to be "equal".
"Equal" is the test for UNIQUE
keys.
Set the COLLATION
for the column to whatever makes sense for you.
- utf8mb4_unicode_ci for good 'real life' comparisons, apparently including this one. K=k=Ķ=ķ
- utf8mb4_unicode_ci for more simple-minded comparisons. In particular no 2-character combinations match 1-character encodings. Case folding and accent stripping does occur. K=k=Ķ=ķ
- utf8mb4_bin blindly checks the bits. No case folding, etc. K k Ķ ķ are all unequal.
utf8mb4_latvian_ci is a little different: K=k but not equal to Ķ=ķ . There are other specialized collations for other languages (mostly Western European).
Your K
is called "FULLWIDTH LATIN CAPITAL LETTER K", so it is quite reasonable that it compare equal to Latin K
.
来源:https://stackoverflow.com/questions/31867895/mysql-utf8mb4-unicode-ci-cause-unique-key-collision