What's the point of UTF-16?

太阳男子 2021-01-31 01:29

I've never understood the point of UTF-16 encoding. If you need to be able to treat strings as random access (i.e. each code point is exactly one code unit), then you need UTF-32.
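
To make the random-access point concrete, here is a minimal Java sketch. It assumes the string is held as UTF-16, which is what java.lang.String exposes, and compares finding the n-th code point by walking the string with indexing into an int[] of decoded code points, which is effectively UTF-32. The helper names codePointAtIndexUtf16 and toUtf32Units are hypothetical; offsetByCodePoints, codePointAt and codePoints are standard String methods.

```java
class CodePointIndexing {
    // Find the code point at code-point index n by walking the UTF-16
    // string from the start. Surrogate pairs mean we cannot jump straight
    // to char index n, so this is O(n).
    static int codePointAtIndexUtf16(String s, int n) {
        int charIndex = s.offsetByCodePoints(0, n);
        return s.codePointAt(charIndex);
    }

    // Decode once into one int per code point. This is effectively UTF-32,
    // and indexing the resulting array is O(1).
    static int[] toUtf32Units(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // 'a', U+1D122 MUSICAL SYMBOL F CLEF (a surrogate pair), 'b'
        String s = "a\uD834\uDD22b";
        int[] units = toUtf32Units(s);

        System.out.println(s.length());     // 4 UTF-16 code units
        System.out.println(units.length);   // 3 code points
        // char index 2 lands on the second half of the surrogate pair...
        System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true
        // ...while code point index 2 is 'b', found by scanning or by O(1) lookup.
        System.out.println(codePointAtIndexUtf16(s, 2) == 'b');     // true
        System.out.println(units[2] == 'b');                        // true
    }
}
```

The trade-off is memory: the int[] spends 4 bytes on every code point, including plain ASCII.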

  •  [愿得一人]
    2021-01-31 01:52

    UTF-16 allows every character in the Basic Multilingual Plane (BMP, U+0000 to U+FFFF) to be represented as a single 16-bit code unit. Unicode code points beyond U+FFFF are represented by surrogate pairs, i.e. two code units each.
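
    As an illustration of how a code point above U+FFFF becomes a surrogate pair, here is a sketch of the standard UTF-16 arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The method names toSurrogatePair and fromSurrogatePair are hypothetical; the result is cross-checked against the JDK's Character.toChars.

```java
import java.util.Arrays;

class SurrogatePairs {
    // Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16
    // surrogate pair: subtract 0x10000, then store the top 10 bits of the
    // remaining 20-bit value in a high surrogate (0xD800..0xDBFF) and the
    // bottom 10 bits in a low surrogate (0xDC00..0xDFFF).
    static char[] toSurrogatePair(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int v = codePoint - 0x10000;
        char high = (char) (0xD800 + (v >>> 10));
        char low = (char) (0xDC00 + (v & 0x3FF));
        return new char[] {high, low};
    }

    // Inverse: recombine a high/low surrogate pair into a code point.
    static int fromSurrogatePair(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        // A BMP code point fits in one code unit; U+1D122 needs two.
        System.out.println(Character.charCount(0x20AC));   // 1
        System.out.println(Character.charCount(0x1D122));  // 2

        char[] pair = toSurrogatePair(0x1D122);  // MUSICAL SYMBOL F CLEF
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);  // D834 DD22
        // Cross-check against the JDK's own encoder and our decoder.
        System.out.println(Arrays.equals(pair, Character.toChars(0x1D122)));  // true
        System.out.println(fromSurrogatePair(pair[0], pair[1]) == 0x1D122);   // true
    }
}
```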

    The interesting thing is that Java and Windows (and other systems that use UTF-16) all operate at the code unit level, not the Unicode code point level. So the string consisting of the single character U+1D122 (MUSICAL SYMBOL F CLEF) gets encoded in Java as "\uD834\uDD22", and "\uD834\uDD22".length() == 2 (not 1). So it's kind of a hack, but it means that, at the level these APIs work at, characters are not variable length: every Java char is exactly one 16-bit code unit, and a supplementary character simply counts as two of them.
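
    A small runnable check of the length() behaviour described above, assuming any JDK from Java 8 on. It builds the F clef both from the escaped surrogate pair and from the code point, then compares length() (code units) with codePointCount (code points). The class name and the codePointLength helper are made up for the example.

```java
class CodeUnitLength {
    // Number of Unicode code points, as opposed to length(), which counts
    // UTF-16 code units (Java chars).
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D122 MUSICAL SYMBOL F CLEF, written as its surrogate pair...
        String fClef = "\uD834\uDD22";
        // ...and built directly from the code point.
        String fromCodePoint = new String(Character.toChars(0x1D122));

        System.out.println(fClef.equals(fromCodePoint));                 // true
        System.out.println(fClef.length());                              // 2 code units
        System.out.println(codePointLength(fClef));                      // 1 code point
        // charAt works per code unit, so it returns half of the character.
        System.out.println(Character.isHighSurrogate(fClef.charAt(0)));  // true
        System.out.println(Integer.toHexString(fClef.codePointAt(0)));   // 1d122
    }
}
```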

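    The advantage of UTF-16 over UTF-8 is that one would give up too much if the same hack were used with UTF-8: there, treating one code unit (one byte) as one character only works for ASCII, because every code point above U+007F takes two to four bytes, whereas in UTF-16 it works for the whole BMP.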
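
    A sketch of why the same hack does not carry over to UTF-8: for characters inside the BMP, UTF-16 always needs one code unit, while UTF-8 needs one byte only for ASCII and two or three bytes otherwise. The sample characters and the helper names utf8Units and utf16Units are arbitrary choices for illustration; getBytes(StandardCharsets.UTF_8) is the standard JDK call.

```java
import java.nio.charset.StandardCharsets;

class Utf8VsUtf16Units {
    // Number of 8-bit code units (bytes) the string needs in UTF-8.
    static int utf8Units(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Java's String API counts UTF-16 code units, so this is simply length().
    static int utf16Units(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        // ASCII letter, Latin-1 letter, euro sign, CJK ideograph: all in the BMP.
        String[] samples = {"A", "\u00E9", "\u20AC", "\u4E2D"};
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d%n",
                    s.codePointAt(0), utf8Units(s), utf16Units(s));
        }
        // Prints 1, 2, 3 and 3 UTF-8 bytes, against 1 UTF-16 unit each.
    }
}
```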
