So I've finally gotten back to my main task - porting a rather large C++ project from Windows to the Mac.
Straight away I've been hit by the problem where wchar_t is 4 bytes wide under gcc on the Mac but only 2 bytes wide on Windows. What is the best practice for handling Unicode strings portably across the two platforms?
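For example, this little check prints 2 when built with MSVC on Windows and 4 with gcc on the Mac (sizeof(wchar_t) is implementation-defined, so neither value is guaranteed):

    #include <cstdio>

    int main() {
        // Implementation-defined: typically 2 bytes (UTF-16 code units)
        // with MSVC, 4 bytes (UTF-32) with gcc/clang on Mac and Linux.
        std::printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
        return 0;
    }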
Always use a protocol defined to the byte when a file or network connection is involved. Do not rely on how a C++ compiler stores anything in memory. For Unicode text, this means choosing both an encoding and a byte order (okay, UTF-8 doesn't care about byte order). Even if the platforms you currently want to support have similar architectures, another popular platform with different behavior or even a new OS for one of your existing platforms will likely come along, and you'll be glad you wrote portable code.
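As a minimal sketch of what "defined to the byte" means (the helper name appendString and the little-endian length prefix are just illustrative choices, not any particular protocol): serialize a length followed by UTF-8 bytes, never raw wchar_t memory.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Append a 32-bit length in explicit little-endian byte order,
    // then the UTF-8 payload. The layout is defined byte by byte,
    // so it does not depend on the compiler's wchar_t size or on
    // the host's endianness.
    void appendString(std::vector<uint8_t>& out, const std::string& utf8) {
        uint32_t len = static_cast<uint32_t>(utf8.size());
        out.push_back(static_cast<uint8_t>(len));
        out.push_back(static_cast<uint8_t>(len >> 8));
        out.push_back(static_cast<uint8_t>(len >> 16));
        out.push_back(static_cast<uint8_t>(len >> 24));
        out.insert(out.end(), utf8.begin(), utf8.end());
    }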
As a rule of thumb: UTF-16 for processing, UTF-8 for communication & storage.
Sure, any rule can be broken and this one is not carved in stone. But you have to know when it is ok to break it.
For instance, it might be a good idea to use something else if the environment you are working in wants something else. But Mac OS X APIs use UTF-16, same as Windows, so on those platforms UTF-16 makes more sense. It is more straightforward to convert at the points where you put/get things on the net (because you probably do that in 2-3 routines, like the sketch below) than to do a conversion around every OS API call.
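One of those boundary routines might look like this (a hand-rolled sketch, assuming the internal representation is UTF-16 in a std::u16string; ICU or the platform APIs do the same job):

    #include <cstdint>
    #include <string>

    // Convert an internal UTF-16 string to UTF-8 just before it goes
    // out on the wire. Unpaired surrogates become U+FFFD.
    std::string utf16ToUtf8(const std::u16string& in) {
        std::string out;
        for (size_t i = 0; i < in.size(); ++i) {
            uint32_t cp = in[i];
            if (cp >= 0xD800 && cp <= 0xDBFF) {          // high surrogate
                if (i + 1 < in.size() &&
                    in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (in[++i] - 0xDC00);
                } else {
                    cp = 0xFFFD;                          // unpaired
                }
            } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                cp = 0xFFFD;                              // stray low surrogate
            }
            // Standard UTF-8 encoding by code point range.
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }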
The type of application you are developing also matters. If it does very little text processing and makes very few calls into the system (something like an email server that mostly moves data around without changing it), then UTF-8 might be a good choice.
So, as much as you might hate this answer, "it depends".
ICU has a C++ string class, UnicodeString, which uses UTF-16 as its internal representation.
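A short sketch of using it at the boundaries (fromUTF8, toUTF8String, and toUpper are real ICU APIs; the sample string is just a placeholder):

    #include <unicode/unistr.h>
    #include <string>

    int main() {
        // Decode UTF-8 from the wire into ICU's UTF-16 representation.
        icu::UnicodeString s = icu::UnicodeString::fromUTF8("héllo wörld");

        // Process in UTF-16 (length() counts UTF-16 code units).
        s.toUpper();

        // Encode back to UTF-8 for storage or the network.
        std::string utf8;
        s.toUTF8String(utf8);
        return 0;
    }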
I tend to use UTF-8 as the internal representation. You only lose constant-time string length (in characters rather than bytes), which isn't really useful anyway. For Windows API calls, I use my own Win32 conversion functions, which I devised here. Mac and Linux are (for the most part) UTF-8-aware, so there is no need to convert anything there. A free bonus you get: plain std::string works as your string type throughout.
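The answer's own helpers aren't shown, but a sketch along the same lines would wrap the Win32 MultiByteToWideChar/WideCharToMultiByte calls (those are the real APIs; the function names and the minimal error handling here are illustrative):

    #include <windows.h>
    #include <string>

    // UTF-8 -> UTF-16, for calling W-suffixed Win32 APIs.
    // Sketch: errors are reduced to returning an empty string.
    std::wstring utf8ToWide(const std::string& utf8) {
        if (utf8.empty()) return std::wstring();
        int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                    static_cast<int>(utf8.size()), nullptr, 0);
        if (n <= 0) return std::wstring();
        std::wstring wide(n, L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                            static_cast<int>(utf8.size()), &wide[0], n);
        return wide;
    }

    // UTF-16 -> UTF-8, for strings coming back out of the API.
    std::string wideToUtf8(const std::wstring& wide) {
        if (wide.empty()) return std::string();
        int n = WideCharToMultiByte(CP_UTF8, 0, wide.data(),
                                    static_cast<int>(wide.size()),
                                    nullptr, 0, nullptr, nullptr);
        if (n <= 0) return std::string();
        std::string utf8(n, '\0');
        WideCharToMultiByte(CP_UTF8, 0, wide.data(),
                            static_cast<int>(wide.size()),
                            &utf8[0], n, nullptr, nullptr);
        return utf8;
    }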