What's the difference between utf8_unicode_ci and utf8mb4_0900_ai_ci

微笑、不失礼 提交于 2021-02-15 11:36:51

问题


What is the difference between utf8mb4_0900_ai_ci and utf8_unicode_ci database text coding in mysql (especially in terms of performance) ?


回答1:


  • The encoding is the same. That is, the bytes look the same.
  • The character set is different. utf8mb4 has more characters.
  • The collation (how comparisions are done) is different.
  • The perfomance is different, but it rarely matters.

utf8_unicode_ci implies the CHARACTER SET utf8, which includes only the 1-, 2-, and 3-byte UTF-8 characters. Hence it excludes most Emoji and some Chinese characters.

utf8mb4_unicode_ci implies the CHARACTER SET utf8mb4 is the corresponding COLLATION for the 4-byte CHARACTER SET utf8mb4.

The Unicode organization has been evolving the specification over the years. Here are the mappings from its "versions" to MySQL Collations:

4.0   _unicode_
5.20  _unicode_520_
9.0   _0900_

Most of the differences will be in areas that most people never encounter. One example: At some point, a change allowed Emoji to be distinguished and ordered in some manner.

The suffix (MySQL doc):

_bin      -- just compare the bits; don't consider case folding, accents, etc
_ci       -- explicitly case insensitive (A=a) and implicitly accent insensitive (a=á)
_ai_ci    -- explicitly case insensitive and accent insensitive
_as (etc) -- accent-sensitive (etc)

Performance:

_bin         -- simple, fast
_general_ci  -- fails to compare multiple letters; eg ss=ß, so somewhat fast
...          -- slower
_900_        -- (8.0) much faster because of a rewrite

However: The speed of collation is usually the least of the performance issues in queries. INDEXes, JOINs, subqueries, table scans, etc are much more critical to performance.



来源:https://stackoverflow.com/questions/54885178/whats-the-difference-between-utf8-unicode-ci-and-utf8mb4-0900-ai-ci

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!