In Unicode what is the difference between composite characters and surrogate pairs?
To me they sound like similar things - two characters to represent one character. Wh
An example of a composite (precomposed) character is U+00C9, É. It should display identically to the decomposed pair U+0045 (E) and U+0301 (the combining acute accent). This is independent of any byte encoding used to actually store the character; it's just two different ways of representing the same graphical character using Unicode.
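As a rough illustration, here is a small Python sketch using the standard `unicodedata` module. It assumes the precomposed É is U+00C9; the variable names are only for illustration. It shows that the two forms are different code point sequences, and that normalization converts one into the other:

```python
import unicodedata

composed = "\u00C9"          # precomposed: LATIN CAPITAL LETTER E WITH ACUTE
decomposed = "\u0045\u0301"  # E followed by COMBINING ACUTE ACCENT

# Different code point sequences, so a plain comparison fails.
sequences_differ = composed != decomposed
lengths = (len(composed), len(decomposed))  # one code point vs. two

# NFC composes where possible; NFD decomposes.
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)
nfd_of_composed = unicodedata.normalize("NFD", composed)

print(sequences_differ, lengths)
print(nfc_of_decomposed == composed, nfd_of_composed == decomposed)
```

In practice this is why strings are usually normalized to one form (often NFC) before comparing them.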
A surrogate pair is specific to UTF-16, which uses two 16-bit values to represent a single Unicode code point greater than U+FFFF (which obviously cannot fit in a single 16-bit value). For example (from the Wikipedia article), code point U+1D11E is serialized as the two 16-bit values 0xD834 and 0xDD1E. (The actual byte sequence used to represent them will depend on whether you use the big-endian or little-endian version of UTF-16.)
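To make the arithmetic concrete, here is a minimal Python sketch of the UTF-16 surrogate calculation. The helper name `to_surrogate_pair` is made up for this example, not a library function. It subtracts 0x10000 from the code point, then puts the top 10 bits of the result into the high surrogate and the bottom 10 bits into the low surrogate:

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogates")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1D11E)
print(hex(high), hex(low))

# The same two 16-bit values, in the two possible byte orders.
big_endian_hex = "\U0001D11E".encode("utf-16-be").hex()
little_endian_hex = "\U0001D11E".encode("utf-16-le").hex()
print(big_endian_hex, little_endian_hex)
```

For U+1D11E this gives 0xD834 and 0xDD1E, stored as bytes D8 34 DD 1E in big-endian UTF-16 and 34 D8 1E DD in little-endian.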