fatal error: high- and low-surrogate code points are not valid Unicode scalar values [duplicate]

Submitted by 五迷三道 on 2019-12-22 07:06:48

Question


Sometimes, initializing a UnicodeScalar with a value like 57292 yields the following error:

fatal error: high- and low-surrogate code points are not valid Unicode scalar values

What is this error, why does it occur and how can I prevent it in the future?


Answer 1:


Background: UTF-16 represents a sequence of Unicode characters ("code points") as a sequence of 16-bit "code units". For characters whose scalar values fit within 16 bits (i.e., those from U+0000 to U+FFFF), the code unit has the same value as the character; but for characters outside that range (those from U+10000 to U+10FFFF), UTF-16 has to use two code units. To make this work, Unicode reserves a range of code points (U+D800 to U+DFFF) as "surrogates", which cannot be used as characters; UTF-16 can then use two of these surrogates together to represent a code point outside the 16-bit range. (The "high" and "low" refer to surrogates that serve as the first and second code units in these pairs, respectively. Each surrogate is either a high surrogate or a low surrogate, but not both; experience with older character sets had shown that it's very useful to always be able to tell where one character ends and the next begins.)
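The pair arithmetic described above can be sketched in Swift. This is a minimal illustration, not a library API: `surrogatePair(for:)` is a hypothetical helper, and the character U+1F3CC (GOLFER) is chosen only because its low surrogate happens to be 0xDFCC, the value from the question. The result is cross-checked against Swift's own `String.utf16` view.

```swift
// Hypothetical helper: split a supplementary code point (U+10000...U+10FFFF)
// into a UTF-16 surrogate pair using the standard UTF-16 arithmetic.
func surrogatePair(for codePoint: UInt32) -> (high: UInt16, low: UInt16)? {
    // Only code points outside the 16-bit range need two code units.
    guard codePoint >= 0x10000 && codePoint <= 0x10FFFF else { return nil }
    let offset = codePoint - 0x10000             // 20-bit value
    let high = UInt16(0xD800 + (offset >> 10))   // top 10 bits -> high surrogate
    let low = UInt16(0xDC00 + (offset & 0x3FF))  // bottom 10 bits -> low surrogate
    return (high, low)
}

// Compare against Swift's own UTF-16 view of the same character.
let golfer = "\u{1F3CC}"
let units = Array(golfer.utf16)
if let pair = surrogatePair(for: 0x1F3CC) {
    print(String(pair.high, radix: 16), String(pair.low, radix: 16)) // d83c dfcc
    print(units == [pair.high, pair.low])                            // true
}
```

Code points up to U+FFFF return `nil` here because they need only one code unit and no pair.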

So the issue you're seeing is that you're trying to create a UnicodeScalar with the value 57292, which is 0xDFCC, i.e. U+DFCC. The Unicode standard reserves that value so that it is not a Unicode scalar value: U+DFCC lies in the low-surrogate range (U+DC00 to U+DFFF), so it never denotes a character on its own and only serves as the second half of a surrogate pair representing a scalar that does exist.
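To make the point about 57292 concrete, here is a small sketch that classifies the value and shows the supplementary scalar it would form when preceded by a high surrogate. The helper names are hypothetical and written by hand for clarity; the high surrogate 0xD83C is an arbitrary choice for illustration.

```swift
// Hypothetical helpers: classify a value by surrogate range.
func isHighSurrogate(_ v: UInt32) -> Bool {
    return v >= 0xD800 && v <= 0xDBFF
}
func isLowSurrogate(_ v: UInt32) -> Bool {
    return v >= 0xDC00 && v <= 0xDFFF
}

// Recombine a high/low pair into the scalar value it encodes.
func combine(high: UInt32, low: UInt32) -> UInt32? {
    guard isHighSurrogate(high), isLowSurrogate(low) else { return nil }
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
}

let value: UInt32 = 57292
print(String(value, radix: 16, uppercase: true))       // DFCC
print(isHighSurrogate(value), isLowSurrogate(value))   // false true
if let scalar = combine(high: 0xD83C, low: value) {
    print(String(scalar, radix: 16, uppercase: true))  // 1F3CC
}
```

On its own, 0xDFCC is meaningless as a character; only the pair (0xD83C, 0xDFCC) identifies a real scalar, here U+1F3CC.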

To prevent this issue, only construct scalars from values that are valid Unicode scalar values: U+0000 to U+D7FF and U+E000 to U+10FFFF.
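One way to apply that advice is to check the range before constructing the scalar. The sketch below uses a hypothetical `isValidScalarValue` helper and compares it with Swift's `Unicode.Scalar(_: UInt32)` initializer, which, assuming Swift 3 or later, is failable and returns `nil` for surrogates and values above 0x10FFFF instead of trapping. The trap in the question suggests an older Swift version where that initializer was not failable.

```swift
// Hypothetical helper: valid scalar values are
// U+0000...U+D7FF and U+E000...U+10FFFF.
func isValidScalarValue(_ v: UInt32) -> Bool {
    return v <= 0xD7FF || (v >= 0xE000 && v <= 0x10FFFF)
}

let candidates: [UInt32] = [0x41, 57292, 0xE000, 0x1F3CC, 0x110000]
for v in candidates {
    let manual = isValidScalarValue(v)
    // Failable initializer: nil for surrogates and out-of-range values.
    let viaInit = Unicode.Scalar(v) != nil
    print(String(v, radix: 16, uppercase: true), manual, viaInit)
}
// 41 true true
// DFCC false false
// E000 true true
// 1F3CC true true
// 110000 false false
```

In current Swift, unwrapping the optional from `Unicode.Scalar(v)` (for example with `if let` or `guard let`) is usually enough; the manual range check is shown mainly to spell out which values are excluded.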



Source: https://stackoverflow.com/questions/32158381/fatal-error-high-and-low-surrogate-code-points-are-not-valid-unicode-scalar-va
