Difference between composite characters and surrogate pairs

忘了有多久 2021-02-06 12:08

In Unicode what is the difference between composite characters and surrogate pairs?

To me they sound like similar things - two characters to represent one character. Wh

2 Answers
  •  独厮守ぢ
    2021-02-06 12:39

    An example of a composite (precomposed) character is Unicode U+00C9, É. It should display identically to the decomposed pair U+0045 E and U+0301 (the combining acute accent character). This is independent of any byte encoding used to actually store the character; it's just two different ways of representing the same graphical character using Unicode.
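
To make the equivalence concrete, here is a minimal Python sketch using the standard `unicodedata` module. The variable names are just illustrative; the point is that the two forms are different code point sequences that normalization (NFC to compose, NFD to decompose) maps onto each other.

```python
import unicodedata

precomposed = "\u00C9"   # É as a single code point (U+00C9)
decomposed = "E\u0301"   # U+0045 E followed by U+0301 COMBINING ACUTE ACCENT

# As raw code point sequences the two strings are different...
assert precomposed != decomposed
assert len(precomposed) == 1 and len(decomposed) == 2

# ...but Unicode normalization converts each form into the other.
nfc = unicodedata.normalize("NFC", decomposed)   # compose
nfd = unicodedata.normalize("NFD", precomposed)  # decompose

print(nfc == precomposed, nfd == decomposed)
```

This is why text is usually normalized to one form before being compared or searched.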

    A surrogate pair is specific to UTF-16, which uses two 16-bit values to represent a single Unicode code point greater than U+FFFF (which obviously cannot fit in a single 16-bit value). For example (from the Wikipedia article), code point U+1D11E is serialized as the two 16-bit values 0xD834 and 0xDD1E. (The actual byte sequence used to represent them will depend on whether you use the big-endian or little-endian version of UTF-16.)
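
The arithmetic behind that example can be sketched in a few lines of Python. The helper name `to_surrogate_pair` is hypothetical (it is not a standard library function); it follows the usual UTF-16 rule of subtracting 0x10000 and splitting the remaining 20 bits into two 10-bit halves.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low (trail) surrogate
    return high, low


high, low = to_surrogate_pair(0x1D11E)
print(hex(high), hex(low))

# The same two 16-bit values, laid out as bytes in each byte order.
be_bytes = chr(0x1D11E).encode("utf-16-be")
le_bytes = chr(0x1D11E).encode("utf-16-le")
print(be_bytes.hex(), le_bytes.hex())
```

Note that the string itself still holds one code point; the pair only appears once it is encoded as UTF-16.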
