Are UTF-16 characters (as used by, for example, the wide WinAPI functions) always 2 bytes long?

半阙折子戏 2021-02-09 06:23

Could you clarify how UTF-16 works? I am a little confused, given these points:

  • There is a static type in C++, WCHAR, which is 2 bytes long. (alway
  •  故里飘歌
    2021-02-09 06:54

    All characters in the Basic Multilingual Plane (U+0000 to U+FFFF) are 2 bytes long: each is encoded as a single 16-bit code unit.

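    Characters in the other planes (U+10000 to U+10FFFF) are encoded in 4 bytes each, as a surrogate pair: two 16-bit code units, a high surrogate (0xD800 to 0xDBFF) followed by a low surrogate (0xDC00 to 0xDFFF).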

    Consequently, a function that does not detect surrogate pairs and blindly treats each 2-byte unit as a character will misbehave on strings that contain such pairs; for example, it will report the wrong length or split a character in half.
