> I see that Visual Studio 2008 and later now start off a new solution with the Character Set set to Unicode. My old C++ code deals with only English ASCII text and is full of: […]
Your question involves two different but related concepts. One of them is the encoding of the string (Unicode or ASCII, for example). The other is the data type used to represent the characters.
Technically, you can have a Unicode application using plain `char` and `std::string`. You can use hexadecimal (`"\xC3\xA9"`) or octal (`"\303\251"`) escape sequences to specify the byte sequence of the string. Note that with this approach your existing string literals that contain ASCII characters remain valid, since UTF-8 preserves the byte values of ASCII characters.
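For instance, here is a minimal sketch (assuming UTF-8 encoding and a terminal that renders UTF-8) of storing non-ASCII text in a plain `std::string`:

```cpp
#include <iostream>
#include <string>

int main() {
    // "caf\xC3\xA9" is the UTF-8 encoding of "café": 'c', 'a', 'f'
    // keep their ASCII byte values, while 'é' (U+00E9) becomes the
    // two-byte sequence 0xC3 0xA9.
    std::string utf8 = "caf\xC3\xA9";
    std::cout << utf8 << '\n'; // prints "café" on a UTF-8 terminal
}
```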
One important point to observe is that many string-related functions need to be used carefully, because they operate on bytes rather than characters. For example, `std::string::operator[]` might give you a particular byte that is only part of a Unicode character.
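A short illustration of the pitfall (again assuming the `std::string` holds UTF-8 data):

```cpp
#include <iostream>
#include <string>

int main() {
    std::string s = "\xC3\xA9";    // UTF-8 for 'é' (U+00E9): two bytes
    std::cout << s.size() << '\n'; // prints 2, not 1 -- size() counts bytes

    // s[0] is 0xC3, only the first byte of the character; taken alone
    // it is an invalid, truncated UTF-8 sequence.
    std::cout << static_cast<int>(static_cast<unsigned char>(s[0])) << '\n'; // 195
}
```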
In Visual Studio, `wchar_t` was chosen as the underlying character type. So if you're working with Microsoft-based libraries, things should be easier for you if you follow much of the advice posted by others here: replacing `char` with `wchar_t`, using the "T" macros (if you want to preserve transparency between Unicode and non-Unicode builds), etc.
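To make the "T" macro approach concrete, here is a minimal sketch; `TCHAR` and `_T()` come from `<tchar.h>`, and `MessageBox` resolves to `MessageBoxW` or `MessageBoxA` depending on whether `UNICODE` is defined:

```cpp
#include <windows.h>
#include <tchar.h>

int main() {
    // TCHAR expands to wchar_t when _UNICODE is defined and to char
    // otherwise; the _T() macro adjusts the literal to match.
    const TCHAR *msg = _T("Hello, world");

    // API names without the A/W suffix follow the same convention.
    MessageBox(nullptr, msg, _T("Greeting"), MB_OK);
    return 0;
}
```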
However, I don't think there is a de facto standard for working with Unicode across libraries, since each may have its own strategy for handling it.