From C++2003 §2.13:
A wide string literal has type “array of n const wchar_t” and has static storage duration, where n is the size of the string as defined below, and is initialized with the given characters.
The standard requires that wchar_t be large enough to hold any character in the supported character set. Based on this, I think your premise is correct -- it is wrong for VC++ to represent the single character \U000E0005 using two wchar_t units.
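You can see this for yourself with a short test (just a sketch; the output naturally depends on the compiler you build it with):

    #include <cstdio>
    #include <cwchar>

    int main() {
        // One Unicode character, U+E0005, which lies outside the BMP.
        const wchar_t s[] = L"\U000E0005";

        // With a 32-bit wchar_t (e.g. GCC/Clang on Linux) the length is 1;
        // VC++'s 16-bit wchar_t has to store a surrogate pair, so it reports 2.
        std::printf("sizeof(wchar_t) = %zu, wcslen = %zu\n",
                    sizeof(wchar_t), std::wcslen(s));
    }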
Characters outside the BMP are rarely used, and Windows itself internally uses UTF-16 encoding, so it is simply convenient (even if incorrect) for VC++ to behave this way. However, rather than "banning" such characters, it is likely that the size of wchar_t will increase in the future while char16_t takes its place in the Windows API.
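For comparison, char16_t strings are defined to be UTF-16, so a character outside the BMP always occupies two code units there, while a char32_t string always uses one per character. A small C++11 sketch:

    #include <cstdio>

    int main() {
        const char16_t s16[] = u"\U000E0005";  // UTF-16: stored as a surrogate pair
        const char32_t s32[] = U"\U000E0005";  // UTF-32: one code unit per character

        // Subtract 1 to exclude the terminating null character.
        std::printf("char16_t units: %zu, char32_t units: %zu\n",
                    sizeof(s16) / sizeof(s16[0]) - 1,
                    sizeof(s32) / sizeof(s32[0]) - 1);   // prints 2 and 1
    }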
The answer you linked to is somewhat misleading as well:
On Linux, a wchar_t is 4-bytes, while on Windows, it's 2-bytes
The size of wchar_t depends solely on the compiler and has nothing to do with the operating system. It just happens that VC++ uses 2 bytes for wchar_t, but once again, this could very well change in the future.
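So if your code needs to know, test the type itself rather than the operating system. A minimal sketch using the standard WCHAR_MAX macro:

    #include <cwchar>

    // The width of wchar_t is a property of the compiler, not of the OS.
    #if WCHAR_MAX <= 0xFFFF
        // 16-bit wchar_t (current VC++): wide strings are effectively UTF-16.
        constexpr bool wchar_is_utf16 = true;
    #else
        // Wider wchar_t (e.g. GCC/Clang on Linux): one wchar_t holds any code point.
        constexpr bool wchar_is_utf16 = false;
    #endif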
Windows knows nothing about wchar_t, because wchar_t is a programming concept. Conversely, wchar_t is just storage, and it knows nothing about the semantic value of the data you store in it (that is, it knows nothing about Unicode or ASCII or whatever).
If a compiler or SDK that targets Windows defines wchar_t to be 16 bits, then that compiler may be in conflict with the C++0x standard. (I don't know whether there are some get-out clauses that allow wchar_t to be 16 bits.) But in any case the compiler could define wchar_t to be 32 bits (to comply with the standard) and provide runtime functions to convert to/from UTF-16 for when you need to pass your wchar_t* to Windows APIs.
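Such a conversion routine is straightforward to write. Here is a minimal sketch (a hypothetical helper, not any vendor's actual runtime function) that re-encodes a 32-bit wchar_t string as UTF-16, assuming the input holds valid Unicode code points:

    #include <cstdint>
    #include <string>

    // Hypothetical helper: re-encode a UTF-32 wide string as UTF-16 code units.
    std::u16string to_utf16(const std::wstring& in) {
        std::u16string out;
        for (wchar_t wc : in) {
            std::uint32_t cp = static_cast<std::uint32_t>(wc);
            if (cp < 0x10000) {
                // BMP character: a single UTF-16 code unit.
                out.push_back(static_cast<char16_t>(cp));
            } else {
                // Characters above U+FFFF become a surrogate pair.
                cp -= 0x10000;
                out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
                out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
            }
        }
        return out;
    }

The resulting char16_t data is what you would hand to the UTF-16 ("W") Windows APIs, cast to whatever 16-bit character type the SDK headers declare.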