Question
6.4.3 Universal character names
A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.
Apart from the fact that it is hardly "universal" with restrictions like this, I can't think of a good reason for such a restriction. Does anyone know the backstory?
Answer 1:
The values D800 through DFFF inclusive do not denote characters; they are reserved as high and low surrogates, which appear only in pairs in the UTF-16 encoding, where a pair represents one code point outside the Basic Multilingual Plane.
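To make the role of surrogates concrete, here is a minimal sketch of the standard UTF-16 arithmetic that splits a code point above FFFF into a pair. The function name `to_surrogates` is hypothetical and chosen only for illustration; it is not part of any C library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a supplementary-plane code point
 * (U+10000..U+10FFFF) into a UTF-16 surrogate pair. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    /* Only code points outside the Basic Multilingual Plane need a pair. */
    assert(cp >= 0x10000 && cp <= 0x10FFFF);

    uint32_t v = cp - 0x10000;               /* 20-bit offset */
    *hi = (uint16_t)(0xD800 + (v >> 10));    /* top 10 bits: D800..DBFF */
    *lo = (uint16_t)(0xDC00 + (v & 0x3FF));  /* low 10 bits: DC00..DFFF */
}
```

For example, U+1F600 becomes the pair D83D DE00. Every value the function can produce lies in D800 through DFFF, which is why a lone value from that range cannot name a character by itself.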
The other restriction avoids having a universal character name collide with a character that could be represented in the C character set, for the benefit of compilers which don't bother resolving universal character names into their Unicode equivalents. So the compiler is under no obligation to recognize a + written as \u002B or to know that a and \u0061 represent the same name. ($, @ and ` are not valid in a C program outside of comments and character strings, so they do not require any special attention from the lexer.)
The range of code points less than A0 also includes control characters and whitespace. (C does not consider \u00A0 to be whitespace.)
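The rule quoted in the question can be written down as a small predicate. This is only a sketch of the two restrictions in the quoted C99 text, not how any particular compiler implements them; the names `ucn_allowed_c99` and `ucn_examples` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the quoted C99 6.4.3 constraint permits a
 * universal character name with this short identifier. It encodes only
 * the two rules quoted in the question. */
static bool ucn_allowed_c99(uint32_t cp)
{
    /* Surrogates never name a character by themselves. */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;

    /* Below 00A0, only $, @ and ` are allowed. */
    if (cp < 0x00A0)
        return cp == 0x0024 || cp == 0x0040 || cp == 0x0060;

    return true;
}

/* A few cases taken from the discussion above. */
static void ucn_examples(void)
{
    assert(!ucn_allowed_c99(0x002B)); /* '+' must be written literally */
    assert(!ucn_allowed_c99(0x0061)); /* 'a' must be written literally */
    assert(ucn_allowed_c99(0x0024));  /* '$' is one of the exceptions  */
    assert(ucn_allowed_c99(0x00A0));  /* first value past the cutoff   */
    assert(!ucn_allowed_c99(0xD800)); /* surrogate                     */
}
```

Under this reading, a lexer that never translates universal character names can still tokenize correctly, because no permitted name can stand in for a character that has syntactic meaning in C.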
Source: https://stackoverflow.com/questions/20158472/why-c99-has-such-an-odd-restriction-for-universal-character-names