I'm trying to write Unicode strings to the screen in C++ on Windows. I changed my console font to Lucida Console and I set the output code page to CP_UTF8.
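Here's roughly what I'm doing (a minimal reconstruction; the literal is just an example, and the source file is saved as UTF-8):

    #include <cstdio>
    #include <windows.h>

    int main() {
        // Switch the console output to the UTF-8 code page (65001).
        SetConsoleOutputCP(CP_UTF8);
        // The source file is saved as UTF-8, so this literal is UTF-8 bytes.
        std::printf("Россия\n");
    }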
If your file is encoded as UTF-8, you'll find the string length is 12. Run strlen (from string.h) on it to see what I mean. Setting the output code page will print the bytes exactly as you see them.
What the compiler sees is equivalent to the following:
    const char text[] = "\xd0\xa0\xd0\xbe\xd1\x81\xd1\x81\xd0\xb8\xd1\x8f";
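A minimal sketch putting both observations together (it reuses the escaped literal above; everything else is illustrative):

    #include <cstdio>
    #include <cstring>
    #include <windows.h>

    int main() {
        const char text[] = "\xd0\xa0\xd0\xbe\xd1\x81\xd1\x81\xd0\xb8\xd1\x8f";
        // Six Cyrillic characters at two UTF-8 bytes each: strlen says 12.
        std::printf("%u\n", (unsigned)std::strlen(text));
        // With the output code page set to UTF-8, the bytes print as written.
        SetConsoleOutputCP(CP_UTF8);
        std::printf("%s\n", text);
    }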
Wrap it in a wide string (wchar_t in particular), and things aren't so nice.
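For illustration, here's roughly what the wide version looks like (a sketch; the escapes are the UTF-16 code units for the same text):

    #include <iostream>

    int main() {
        // Wide literal: wchar_t is UTF-16 on Windows, so these are code
        // units for "Россия", not UTF-8 bytes.
        const wchar_t text[] = L"\u0420\u043e\u0441\u0441\u0438\u044f";
        // Without extra translation-mode setup, this typically prints
        // garbage or stops at the first non-ASCII character on the console.
        std::wcout << text << L'\n';
    }

A commonly cited workaround is _setmode(_fileno(stdout), _O_U16TEXT) from io.h and fcntl.h, which only underlines the point: the wide path needs special handling.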
Why does C++ handle it differently? I haven't the slightest clue, except perhaps that the mechanism underlying the C++ version is somewhat ignorant (e.g. std::cout happily outputs whatever you give it, blindly). Whatever the cause, apparently sticking to C is safest... which is actually unexpected to me, considering that Microsoft's own C compiler can't even compile C99 code.
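To make the contrast concrete, here's the kind of side-by-side I mean (a sketch; the C++ stream's exact behavior has varied across CRT versions, so read the comments as the tendency described above rather than a guarantee):

    #include <cstdio>
    #include <iostream>
    #include <windows.h>

    int main() {
        SetConsoleOutputCP(CP_UTF8);
        const char text[] = "\xd0\xa0\xd0\xbe\xd1\x81\xd1\x81\xd0\xb8\xd1\x8f";
        // C path: printf hands the bytes to the console as-is.
        std::printf("%s\n", text);
        // C++ path: the same bytes go through the stream machinery; whether
        // they arrive intact has varied across CRT versions.
        std::cout << text << '\n';
    }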
In any case, I'd advise against outputting to the Windows console if possible, Unicode or not. Files are so much more reliable, not to mention less of a hassle.
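If you do go the file route, something like this is all it takes (a sketch; out.txt is just an example name):

    #include <fstream>

    int main() {
        const char text[] = "\xd0\xa0\xd0\xbe\xd1\x81\xd1\x81\xd0\xb8\xd1\x8f";
        // Write the raw UTF-8 bytes; binary mode sidesteps newline translation.
        std::ofstream out("out.txt", std::ios::binary);
        out << text << '\n';
        // Any UTF-8-aware editor shows the text correctly, no console code
        // pages or fonts involved.
    }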