I have a user table in MySQL 5.7.27 with utf8mb4_unicode_ci
collation.
Unfortunately, ı is not threaded as i for example, the below query won\'t find
Referring to http://mysql.rjweb.org/utf8_collations.html , I see that ı=i
in 3 collations: utf8_general_ci, utf8_general_mysql500_ci, utf8_turkish_ci. However, for the turkish collation, I=ı
sorts before other accented I's. In all other collations ı
sorts after all I's, as if it is treated as a separate letter.
Meanwhile İ=I
in all collations except utf8_turkish_ci.
The plot thickens with MySQL 8.0. utf8mb4_tr_0900_ai_ci (only) has this ordering:
I=Ì=Í=Î=Ï=Ĩ=Ī=Ĭ=Į=ı sort before i=ì=í=î=ï=ĩ=ī=ĭ=į=İ
Meanwhile ä=Ä
and they match most other accented A's for most collations (including the Turkish ones).
Bottom line: It seems that utf8[mb4]_general_ci is the only collation in 5.7 or 8.0 that will always treat a dotless-i (or dotted-I) equal to a 'regular i/I and at the same time ignore umlauts.
Caveat: The "general" collations do not test more than one character at a time. That is, a "non-spacing umlaut" plus a vowel will not be treated as equal to the combination.
In that link... The one character æ
is sorted the same as the two letters ae
for some collations. That's indicated by: Aa ae=æ az
. In about half of the other collations, the character æ
is treated as a separate letter; this is indicated by it being after az
and before b
. Or even after zz
for Scandinavian collations. This separate letter concept sometimes applies to letter pairs, for example cs
(Hungarian) and ch
(traditional Spanish).