Difference between composite characters and surrogate pairs


忘了有多久 2021-02-06 12:08

In Unicode what is the difference between composite characters and surrogate pairs?

To me they sound like similar things - two characters to represent one character. Wh

2 answers

独厮守ぢ (original poster)

2021-02-06 12:39

An example of a composite (precomposed) character is Unicode U+00C9, É. It should display identically to the decomposed pair U+0045 E and U+0301 (the combining acute accent character). This is independent of any byte encoding used to actually store the character; it's just two different ways of representing the same graphical character using Unicode.
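To make the equivalence concrete, here is a small Python sketch using the standard `unicodedata` module. The variable names are just for illustration; the point is that the precomposed and decomposed forms are different code point sequences that normalization converts between.

```python
import unicodedata

precomposed = "\u00C9"   # É as a single code point
decomposed = "E\u0301"   # E followed by COMBINING ACUTE ACCENT

# The two strings are different code point sequences of different lengths.
same_code_points = (precomposed == decomposed)
lengths = (len(precomposed), len(decomposed))

# NFC composes where possible; NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

print(same_code_points, lengths)
print(nfc == precomposed, nfd == decomposed)
```

So if you need to compare text that may arrive in either form, normalizing both sides to the same form first is the usual approach.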

A surrogate pair is specific to UTF-16, which uses two 16-bit values to represent a single Unicode code point greater than U+FFFF (which obviously cannot fit in a single 16-bit value). For example (from the Wikipedia article), code point U+1D11E is serialized as the two 16-bit values 0xD834 and 0xDD1E. (The actual byte sequence used to represent them will depend on whether you use the big endian or little endian version of UTF-16.)
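The surrogate pair arithmetic can be sketched in a few lines of Python. The helper name `to_surrogates` is made up for this example; it subtracts 0x10000 from the code point, then splits the remaining 20 bits into a high and a low half.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = cp - 0x10000                # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

high, low = to_surrogates(0x1D11E)
print(hex(high), hex(low))

# The same two 16-bit values, serialized in each byte order.
be_bytes = "\U0001D11E".encode("utf-16-be").hex()
le_bytes = "\U0001D11E".encode("utf-16-le").hex()
print(be_bytes, le_bytes)
```

For U+1D11E this gives 0xD834 and 0xDD1E, matching the values above, and the two byte orders differ only in how each 16-bit unit is laid out.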
