The following may not qualify as a SO question; if it is out of bounds, please feel free to tell me to go away. The question here is basically, "Do I understand the C stand
The problem with wchar_t is that encoding-agnostic text processing is too difficult and should be avoided. If you stick with "pure C" as you say, you can use all of the w* functions like wcscat and friends, but if you want to do anything more sophisticated then you have to dive into the abyss.
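For the "pure C" case, here is a minimal sketch of what staying inside the w* functions looks like. The helper name join_wide and its return convention are made up for illustration; only wcslen, wcscpy and wcscat come from <wchar.h>.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: writes a followed by b into dst.
 * cap is the capacity of dst in wchar_t units, terminator included.
 * Returns 0 on success, -1 if the result would not fit. */
int join_wide(wchar_t *dst, size_t cap, const wchar_t *a, const wchar_t *b)
{
    /* wcslen counts wchar_t units, which is not necessarily
     * the number of characters the user sees. */
    size_t need = wcslen(a) + wcslen(b) + 1;
    if (need > cap)
        return -1;
    wcscpy(dst, a);
    wcscat(dst, b);
    return 0;
}
```

This works without knowing how wchar_t is encoded, which is exactly why it stops being enough once you need to look inside the string.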
Here are some things that are much harder with wchar_t than they are if you just pick one of the UTF encodings:
Parsing JavaScript: Identifiers can contain certain characters outside the BMP (and let's assume that you care about this kind of correctness).
HTML: How do you turn 𐀀 (U+10000, which sits outside the BMP) into a string of wchar_t?
Text editor: How do you find grapheme cluster boundaries in a wchar_t string?
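To make the HTML item concrete, here is a rough sketch of storing one code point in a wchar_t buffer. It rests on an assumption the C standard does not guarantee: that a wchar_t too narrow for U+10FFFF holds UTF-16 code units, and that a wider one holds the code point itself. The function name put_code_point is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: stores code point cp in out, which must have
 * room for 3 wchar_t, and adds a terminator.
 * Returns the number of units written (terminator excluded),
 * or 0 if cp is not a valid Unicode scalar value.
 * ASSUMPTION: narrow wchar_t means UTF-16, wide wchar_t means UTF-32. */
size_t put_code_point(wchar_t *out, uint32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
#if WCHAR_MAX >= 0x10FFFF
    /* wchar_t is wide enough to hold any code point directly. */
    out[0] = (wchar_t)cp;
    out[1] = L'\0';
    return 1;
#else
    if (cp < 0x10000) {
        out[0] = (wchar_t)cp;
        out[1] = L'\0';
        return 1;
    }
    /* Outside the BMP: split into a UTF-16 surrogate pair. */
    cp -= 0x10000;
    out[0] = (wchar_t)(0xD800 + (cp >> 10));
    out[1] = (wchar_t)(0xDC00 + (cp & 0x3FF));
    out[2] = L'\0';
    return 2;
#endif
}
```

The same call yields one unit on some platforms and two on others, so any code that indexes or counts the result has to know which case it is in. That is the encoding knowledge wchar_t was supposed to let you avoid.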
If I know the encoding of a string, I can examine the characters directly. If I don't know the encoding, I have to hope that whatever I want to do with a string is implemented by a library function somewhere. So the portability of wchar_t is somewhat irrelevant, as I don't consider it an especially useful data type.
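As an illustration of "examining the characters directly", here is a small sketch of decoding one code point when the string is known to be UTF-8. The function name utf8_decode and its return convention are my own choices for the example, not something from a standard library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one code point from the first bytes of s
 * (len bytes available) and stores it in *cp.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated, overlong, a surrogate, or otherwise malformed. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n;
    uint32_t c, min;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                    /* single-byte (ASCII) */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 2-byte sequence */
        n = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 3-byte sequence */
        n = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 4-byte sequence */
        n = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                         /* stray continuation or invalid lead byte */
    }
    if (len < n)
        return 0;
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)        /* must be a continuation byte */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return n;
}
```

With something like this, the bytes F0 90 80 80 decode to U+10000 on every platform, and the result can be checked against whatever table you need (identifier characters, grapheme break properties) without caring how wide wchar_t happens to be.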
Your program requirements may differ and wchar_t may work fine for you.