The following may not qualify as a SO question; if it is out of bounds, please feel free to tell me to go away. The question here is basically, \"Do I understand the C stand
Is this the right way to write an idiomatic, portable, universal, encoding-agnostic program core using only pure standard C/C++
No, and there is no way at all to fulfill all these properties, at least if you want your program to run on Windows. On Windows, you have to ignore the C and C++ standards almost everywhere and work exclusively with wchar_t
(not necessarily internally, but at all interfaces to the system). For example, if you start with
int main(int argc, char** argv)
you have already lost Unicode support for command line arguments. You have to write
int wmain(int argc, wchar_t** argv)
instead, or use the GetCommandLineW
function, none of which is specified in the C standard.
More specifically,
#ifdef
s.wchar_t
is a UTF-16 code unit on Windows and that char
is often (bot not always) a UTF-8 code unit on Linux. Encoding-awareness is often the more desirable goal: make sure that you always know with which encoding you work, or use a wrapper library that abstracts them away.I think I have to conclude that it's completely impossible to build a portable Unicode-capable application in C or C++ unless you are willing to use additional libraries and system-specific extensions, and to put lots of effort in it. Unfortunately, most applications already fail at comparatively simple tasks such as "writing Greek characters to the console" or "supporting any filename allowed by the system in a correct manner", and such tasks are only the first tiny steps towards true Unicode support.