MySQL does not treat ı as i?

轻奢々 2021-01-28 02:43

I have a user table in MySQL 5.7.27 with utf8mb4_unicode_ci collation.

Unfortunately, ı is not treated as i. For example, the query below won't find

1 answer
  • 2021-01-28 03:33

    Referring to http://mysql.rjweb.org/utf8_collations.html, I see that ı=i in 3 collations: utf8_general_ci, utf8_general_mysql500_ci, and utf8_turkish_ci. However, in the Turkish collation, I=ı sorts before the other accented I's. In all other collations, ı sorts after all the I's, as if it were a separate letter.

    Meanwhile, İ=I in all collations except utf8_turkish_ci.

    The plot thickens with MySQL 8.0. utf8mb4_tr_0900_ai_ci (only) has this ordering:

    I=Ì=Í=Î=Ï=Ĩ=Ī=Ĭ=Į=ı sort before  i=ì=í=î=ï=ĩ=ī=ĭ=į=İ
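
    As a rough illustration only, the ordering above can be modelled in Python with hand-made primary weights. DOTLESS_GROUP, DOTTED_GROUP, toy_tr_key and toy_tr_equal are hypothetical names, and the numeric weights are invented, not MySQL's real tables. Outside the two I groups the toy only lower-cases characters, so it is meaningful only for plain ASCII letters plus those groups.

```python
# Toy model (assumption: invented weights, not MySQL's) of the I/ı ordering
# reported for utf8mb4_tr_0900_ai_ci.
DOTLESS_GROUP = set("IÌÍÎÏĨĪĬĮı")  # one letter, sorting just before ...
DOTTED_GROUP = set("iìíîïĩīĭįİ")   # ... this separate letter

def toy_tr_key(s: str) -> list[int]:
    """Return per-character primary weights under the toy model."""
    weights = []
    for ch in s:
        if ch in DOTLESS_GROUP:
            weights.append(ord("i") * 10 - 5)  # after h, before the dotted-i group
        elif ch in DOTTED_GROUP:
            weights.append(ord("i") * 10)
        else:
            weights.append(ord(ch.lower()) * 10)  # plain case folding otherwise
    return weights

def toy_tr_equal(a: str, b: str) -> bool:
    return toy_tr_key(a) == toy_tr_key(b)

print(toy_tr_equal("ılık", "Ilık"))          # True: ı and I are the same letter
print(toy_tr_equal("ılık", "ilik"))          # False: ı is not i
print(toy_tr_equal("İstanbul", "istanbul"))  # True: İ and i are the same letter
print(sorted(["ilk", "ılık", "hız"], key=toy_tr_key))  # ['hız', 'ılık', 'ilk']
```

    In this model ı is not merely "an accented i": it is a different letter, which is why an equality search for i cannot match it.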
    

    Meanwhile, ä=Ä, and both match most other accented A's in most collations (including the Turkish ones).

    Bottom line: It seems that utf8[mb4]_general_ci is the only collation in 5.7 or 8.0 that always treats a dotless i (or a dotted I) as equal to a regular i/I while also ignoring umlauts.
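
    A minimal sketch of that behaviour, assuming a general_ci-style collation simply maps each character to a single weight. GENERAL_CI_FOLD is a hypothetical, hand-picked subset (dotless/dotted i and a few umlauted vowels), not the real MySQL table, and general_ci_equal is an illustrative helper, not a MySQL API.

```python
# Hypothetical subset of a one-weight-per-character fold table, modelling the
# behaviour described above for utf8[mb4]_general_ci. Not MySQL's real table.
GENERAL_CI_FOLD = {
    "ı": "I", "İ": "I",
    "ä": "A", "Ä": "A",
    "ö": "O", "Ö": "O",
    "ü": "U", "Ü": "U",
}

def general_ci_weight(ch: str) -> str:
    # Table entries win; any other character is just upper-cased.
    return GENERAL_CI_FOLD.get(ch, ch.upper())

def general_ci_equal(a: str, b: str) -> bool:
    # Compare the strings weight by weight, one character at a time.
    return [general_ci_weight(c) for c in a] == [general_ci_weight(c) for c in b]

print(general_ci_equal("Yılmaz", "yilmaz"))  # True: ı counts as i
print(general_ci_equal("İzmir", "izmir"))    # True: İ counts as I
print(general_ci_equal("Gül", "gul"))        # True: umlaut ignored
```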

    Caveat: The "general" collations compare only one character at a time. That is, a vowel followed by a "non-spacing umlaut" (a combining diaeresis, U+0308) will not be treated as equal to the precomposed character, such as ä.
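
    The caveat shows up in a toy Python model that, like the "general" collations, weighs each character on its own (FOLD and char_by_char_key are hypothetical, not MySQL internals). A precomposed ä (U+00E4) is one character, whereas a followed by U+0308 COMBINING DIAERESIS is two, so they never compare equal. Normalising input to NFC with Python's unicodedata.normalize before storing or querying is one possible workaround; that is an assumption, not something verified against MySQL here.

```python
import unicodedata

# Hypothetical one-weight-per-character fold, mimicking the "general" collations'
# character-at-a-time comparison. Not MySQL's real table.
FOLD = {"ä": "a", "Ä": "a"}

def char_by_char_key(s: str) -> list[str]:
    return [FOLD.get(ch, ch.lower()) for ch in s]

precomposed = "\u00e4"  # ä as a single code point
decomposed = "a\u0308"  # a + COMBINING DIAERESIS: looks the same, two code points

print(len(precomposed), len(decomposed))                              # 1 2
print(char_by_char_key(precomposed) == char_by_char_key(decomposed))  # False

# Possible workaround (assumption): NFC composes a + U+0308 into the single ä.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)                                      # True
print(char_by_char_key(nfc) == char_by_char_key(precomposed))  # True
```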

    On that page, note that in some collations the single character æ sorts the same as the two letters ae; this is shown as: Aa ae=æ az. In about half of the other collations, æ is treated as a separate letter, shown by its position after az and before b (or even after zz, for the Scandinavian collations). This separate-letter concept sometimes applies to letter pairs too, for example cs (Hungarian) and ch (traditional Spanish).
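
    The two behaviours for æ can be sketched with two hypothetical sort-key functions in Python. expansion_key expands æ to "ae" (the "Aa ae=æ az" case); separate_letter_key gives æ its own invented weight just after a chosen letter: "a" for the after-az-before-b case, or "z" for the Scandinavian after-zz case. The weights are made up for illustration and are not MySQL's, and multi-letter units such as cs or ch are not modelled.

```python
def expansion_key(s: str) -> str:
    # "Aa ae=æ az": æ is expanded to the two letters "ae" before comparing.
    return s.lower().replace("æ", "ae")

def separate_letter_key(s: str, after: str) -> list[int]:
    # æ is its own letter, sorting right after the letter `after` (invented weights).
    weights = []
    for ch in s.lower():
        if ch == "æ":
            weights.append(ord(after) * 10 + 5)  # between `after` and the next letter
        else:
            weights.append(ord(ch) * 10)
    return weights

words = ["b", "æ", "az", "aa", "zz", "ae"]
print(expansion_key("æ") == expansion_key("ae"))                 # True
print(sorted(words, key=lambda w: separate_letter_key(w, "a")))  # ['aa', 'ae', 'az', 'æ', 'b', 'zz']
print(sorted(words, key=lambda w: separate_letter_key(w, "z")))  # ['aa', 'ae', 'az', 'b', 'zz', 'æ']
```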
