Why is wchar_t needed? How is it superior to short (or __int16 or whatever)?
(If it matters: I live in Windows world. I don't
It is usually considered a good thing to give things such as data types meaningful names.
Which is better, char or int8? I think this:
char name[] = "Bob";
is much easier to understand than this:
int8 name[] = "Bob";
It's the same thing with wchar_t and int16.
The reason there's a wchar_t is pretty much the same reason there's a size_t or a time_t - it's an abstraction that indicates what a type is intended to represent and allows implementations to choose an underlying type that can represent it properly on a particular platform.
Note that wchar_t doesn't need to be a 16-bit type - there are platforms where it's a 32-bit type.
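To make the point about platform-dependent width concrete, here is a minimal sketch that derives sizes from sizeof(wchar_t) instead of hard-coding 2. The helper names wchar_bits and wide_buffer_bytes are made up for this illustration; they are not part of any standard library.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cwchar>    // std::wcslen

// Width of wchar_t in bits on the platform this is compiled for
// (commonly 16 with Visual C++ and 32 with GCC/Clang on Linux).
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Bytes needed to store `text` including its terminating L'\0'.
// Hypothetical helper: the point is that the size comes from
// sizeof(wchar_t), never from an assumed 2 bytes per character.
std::size_t wide_buffer_bytes(const wchar_t* text) {
    assert(text != nullptr);  // precondition: caller passes a valid string
    return (std::wcslen(text) + 1) * sizeof(wchar_t);
}
```

Code that instead assumes `2 * length` bytes would silently under-allocate on a platform where wchar_t is 32 bits wide.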
As I read the relevant standards, it seems like Microsoft fcked this one up badly.
My manpage for the POSIX <stddef.h> says that:
- wchar_t: Integer type whose range of values can represent distinct wide-character codes for all members of the largest character set specified among the locales supported by the compilation environment: the null character has the code value 0 and each member of the portable character set has a code value equal to its value when used as the lone character in an integer character constant.
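So a 16-bit wchar_t is not enough if your platform supports Unicode, which defines code points beyond U+FFFF. Each wchar_t is supposed to hold a distinct value for one character. On Windows, where wchar_t is 16 bits and holds UTF-16 code units, a character outside the Basic Multilingual Plane takes two of them (a surrogate pair). Therefore wchar_t goes from being a useful way to work at the character level of text (after decoding from the locale's multibyte encoding, of course) to being completely useless on Windows platforms.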
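As an illustration of why 16 bits cannot give one value per character, the following sketch encodes a single Unicode code point into UTF-16 code units. The function name to_utf16 is an assumption made up for this example; the arithmetic is the standard UTF-16 surrogate scheme.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes one Unicode code point as UTF-16 code units.
// Code points up to U+FFFF take one unit; everything above takes two.
std::vector<std::uint16_t> to_utf16(std::uint32_t cp) {
    // Precondition: a valid scalar value (in range, not itself a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        return { static_cast<std::uint16_t>(cp) };
    }
    cp -= 0x10000;  // 20 bits remain, split into two 10-bit halves
    return {
        static_cast<std::uint16_t>(0xD800 + (cp >> 10)),    // high surrogate
        static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))   // low surrogate
    };
}
```

For example, U+0041 ('A') yields the single unit 0x0041, while U+1F600 yields the pair 0xD83D, 0xDE00. With a 16-bit wchar_t that second character occupies two array elements, so indexing a wide string no longer means indexing characters.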
wchar_t is the primitive for storing and processing the platform's Unicode characters. Its size is not always 16 bits. On Unix systems wchar_t is typically 32 bits (maybe Unix users are more likely to use the Klingon characters that the extra bits are used for :-).
This can pose problems for porting projects, especially if you interchange wchar_t and short, or if you interchange wchar_t and xerces' XMLCh.
Therefore having wchar_t as a different type to short is very important for writing cross-platform code. Cleaning this up was one of the hardest parts of porting our application to Unix and then from VC6 to VC2005.
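A small sketch of what "a different type to short" buys you in C++: on a conforming compiler wchar_t is a distinct built-in type, so overloads can tell text from numbers. The function name kind is hypothetical. Note the assumption stated in the comments: older or non-default Visual C++ settings that treat wchar_t as a typedef for unsigned short would fail the second static_assert.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Two overloads that differ only in short vs. wchar_t.
std::string kind(short)   { return "number"; }
std::string kind(wchar_t) { return "wide character"; }

// Compile-time checks that wchar_t is its own type.
// Assumption: the compiler treats wchar_t as a native type; with it
// configured as a typedef for unsigned short, the second check fails.
static_assert(!std::is_same<wchar_t, short>::value,
              "wchar_t must not be short");
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t must not be unsigned short");

// Usage: overload resolution picks the overload matching the argument type.
void check_overloads() {
    assert(kind(L'A') == "wide character");
    assert(kind(static_cast<short>(65)) == "number");
}
```

If wchar_t were just another name for a 16-bit integer, the two overloads could not tell text from numbers, and code written that way stops working as intended once it is built where wchar_t is 32 bits.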
See Wikipedia.
Basically, it's a portable type for "text" in the current locale (with umlauts). It predates Unicode and doesn't solve many problems, so today, it mostly exists for backward compatibility. Don't use it unless you have to.
To add to Aaron's comment - in C++0x (since standardized as C++11) we are finally getting real Unicode char types: char16_t and char32_t, and also Unicode string literals.
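A minimal sketch of those types and literals, assuming a C++11 compiler. The function names are made up for the example, and the character is written as a universal-character-name so the source file's own encoding does not matter.

```cpp
#include <cassert>
#include <string>

// U+1F600 as a UTF-16 string literal (u"...") and a UTF-32 one (U"...").
std::u16string smiley_utf16() { return u"\U0001F600"; }
std::u32string smiley_utf32() { return U"\U0001F600"; }

// Usage: the same character takes two char16_t units but one char32_t.
void check_lengths() {
    assert(smiley_utf16().size() == 2);  // surrogate pair
    assert(smiley_utf32().size() == 1);  // one code point, one element
    assert(smiley_utf32()[0] == U'\U0001F600');
}
```

Unlike wchar_t, the width of these types does not change between Windows and Unix, which is what makes them usable in portable code.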