I've never understood the point of UTF-16 encoding. If you need to be able to treat strings as random access (i.e. a code point is the same as a code unit) then you need UTF-32.
UTF-16 allows all of the Basic Multilingual Plane (BMP) to be represented as single code units. Unicode code points beyond U+FFFF are represented by surrogate pairs.
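The surrogate-pair arithmetic is simple enough to show directly (a minimal sketch in Java; the 0xD800/0xDC00 bases and the 0x10000 offset come from the Unicode standard):

    // Split a supplementary code point into a UTF-16 surrogate pair.
    int cp = 0x1D122;                           // MUSICAL SYMBOL F CLEF
    int v = cp - 0x10000;                       // 20 bits remain after removing the BMP offset
    char high = (char) (0xD800 + (v >> 10));    // top 10 bits -> high surrogate
    char low  = (char) (0xDC00 + (v & 0x3FF));  // low 10 bits -> low surrogate
    System.out.printf("\\u%04X \\u%04X%n", (int) high, (int) low);  // prints \uD834 \uDD22

(Java's Character.toChars(cp) does the same job for you.)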
The interesting thing is that Java and Windows (and other systems that use UTF-16) all operate at the code unit level, not the Unicode code point level. So the string consisting of the single character U+1D122 (MUSICAL SYMBOL F CLEF) gets encoded in Java as "\ud834\udd22", and "\ud834\udd22".length() == 2 (not 1). So it's kind of a hack: these systems pretend that characters are not variable length, even though code points outside the BMP are.
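You can see the mismatch directly (a quick sketch; String.codePointCount is the code-point-aware counterpart to length()):

    String clef = "\uD834\uDD22";                               // U+1D122 as a surrogate pair
    System.out.println(clef.length());                          // 2 -- counts code units
    System.out.println(clef.codePointCount(0, clef.length()));  // 1 -- counts code points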
The advantage of UTF-16 over UTF-8 is that one would give up too much if the same hack were used with UTF-8: treating UTF-8 code units as characters covers only ASCII (U+0000 through U+007F), whereas UTF-16 code units cover the entire BMP.
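To make that concrete (a small sketch; any non-ASCII BMP character would do here):

    String e = "\u00E9";  // é, U+00E9: inside the BMP but outside ASCII
    System.out.println(e.getBytes(java.nio.charset.StandardCharsets.UTF_8).length);  // 2 UTF-8 code units
    System.out.println(e.length());                                                  // 1 UTF-16 code unit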