Why does C99 have such an odd restriction on universal character names?

Question

6.4.3 Universal character names

A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.

Apart from the fact that the names are no longer "universal" under a restriction like this, I can't think of a good reason for it. Does anyone know the backstory?

Answer 1:

D800 through DFFF inclusive do not designate characters; they are the high and low surrogates, which appear only in pairs in the UTF-16 encoding, where they represent code points outside the Basic Multilingual Plane.
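To make the role of the surrogate range concrete, here is a minimal sketch of how UTF-16 splits a code point above FFFF into a high/low pair. The function names `is_surrogate` and `to_surrogate_pair` are made up for this illustration and are not part of any standard library.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if cp lies in the surrogate range D800..DFFF. */
bool is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Split a code point in 10000..10FFFF into a UTF-16 surrogate pair.
   Returns false (leaving *high and *low untouched) if cp is outside
   that range and so is not encoded as a pair. */
bool to_surrogate_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return false;

    uint32_t v = cp - 0x10000;                 /* 20-bit offset */
    *high = (uint16_t)(0xD800 + (v >> 10));    /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF));  /* bottom 10 bits */
    return true;
}
```

Because every value in D800 through DFFF is reserved for one half of such a pair, none of them can name a character on its own, which is why a universal character name may not specify one.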

The other restriction prevents a universal character name from colliding with a character that can already be written in the basic C character set, for the benefit of compilers that don't bother resolving universal character names into their Unicode equivalents. The compiler is therefore under no obligation to recognize a + written as \u002B, or to know that a and \u0061 spell the same name. ($, @ and ` are not valid in a C program outside of comments, character constants, and string literals, so they require no special attention from the lexer.)

The range of code points less than A0 also includes control characters and whitespace. (C does not consider \u00A0 to be whitespace.)
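The constraint quoted from 6.4.3 can be written as a small predicate. This is only a sketch of that one sentence: the name `ucn_allowed_c99` is hypothetical, and the function does not cover the additional rules C99 places on which characters may appear in identifiers, nor does it reject values above 10FFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the C99 6.4.3 constraint permits a universal
   character name to specify the code point cp. */
bool ucn_allowed_c99(uint32_t cp)
{
    /* Surrogates D800..DFFF are never allowed. */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;

    /* Below 00A0, only $ (0024), @ (0040) and ` (0060) are allowed. */
    if (cp < 0x00A0)
        return cp == 0x0024 || cp == 0x0040 || cp == 0x0060;

    return true;
}
```

Under this check, \u002B (+) and \u0061 (a) are rejected, while \u0024 ($), \u00A0, and \u00E9 are accepted.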

Source: https://stackoverflow.com/questions/20158472/why-c99-has-such-an-odd-restriction-for-universal-character-names

Tags

unicode
