Question
Here is what I tried in order to print Unicode:
_setmode(_fileno(stdout), _O_U8TEXT);
string str = u8"unicode 한글 hangul";
cout << str << endl;
I used _setmode so that Unicode would be displayed and read correctly, but the program crashed with a "Debug Assertion Failed" error.
However,
_setmode(_fileno(stdout), _O_U16TEXT);
wstring str = L"unicode 한글 hangul";
wcout << str << endl;
With _O_U16TEXT, the code compiles and prints correctly.
What should I do to use UTF-8? Do I have to find another trick?
Answer 1:
The _setmode documentation mentions _O_U8TEXT and _O_U16TEXT (finally), but doesn't go into detail about what they do. It does state that these are translation modes.
The documentation for _wsopen lists (emphasis mine):
_O_U16TEXT
Opens a file in Unicode UTF-16 mode.
_O_U8TEXT
Opens a file in Unicode UTF-8 mode.
What this means is: when you use the Unicode I/O facilities (wprintf, std::wcout, etc.), which take Unicode (UTF-16) strings, the output is translated to either UTF-16 or UTF-8 when it is written to the file.
Try this:
_setmode(_fileno(stdout), _O_U8TEXT);
std::wcout << L"unicode 한글 hangul\n";
You shouldn't see a difference on a console, but if you redirect the output:
> u8out | hexdump -C
00000000 75 6e 69 63 6f 64 65 20 ed 95 9c ea b8 80 20 68 |unicode ...... h|
00000010 61 6e 67 75 6c 0d 0a |angul..|
00000017
> u16out | hexdump -C
00000000 75 00 6e 00 69 00 63 00 6f 00 64 00 65 00 20 00 |u.n.i.c.o.d.e. .|
00000010 5c d5 00 ae 20 00 68 00 61 00 6e 00 67 00 75 00 |\... .h.a.n.g.u.|
00000020 6c 00 0d 00 0a 00 |l.....|
00000026
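To make the two dumps above easier to read, here is a minimal sketch of what the translation step amounts to for the Korean characters. The helpers to_utf8 and to_utf16le are hypothetical names of mine, not CRT functions, and they assume well-formed UTF-16 input. They only illustrate the encodings; they are not how the CRT implements its translation modes, and they do not model the text-mode newline translation (the 0d 0a at the end of each dump).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode UTF-16 code units as UTF-8 bytes.
// Assumes well-formed input (a high surrogate is followed by a low one).
std::string to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {                 // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes (covers Hangul)
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes.
std::string to_utf16le(const std::u16string& in) {
    std::string out;
    for (char16_t unit : in) {
        out += static_cast<char>(unit & 0xFF);  // low byte first
        out += static_cast<char>(unit >> 8);    // then high byte
    }
    return out;
}
```

For the two characters U+D55C and U+AE00, to_utf8 yields the bytes ed 95 9c ea b8 80 and to_utf16le yields 5c d5 00 ae, which are the sequences visible in the u8out and u16out dumps.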
In theory this should mean that you can also use _O_U8TEXT on stdin to read UTF-8 input, but in practice that doesn't always work:
> u8in < u8.txt
unicode 한글 hangul €µöäüß
> u8in
unicode 한글 hangul €µöäüß
unicode ?? hangul ?攄��
_O_U16TEXT appears to work with console input (on my machine), but then you can't use UTF-8 encoded redirected input/output:
> u16in
unicode 한글 hangul €µöäüß
unicode 한글 hangul €µöäüß
You can read more about this here: "Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?"
PS: What the assertion is telling you is that you can't use the ANSI (narrow) output facilities, such as cout, on a stream that is in one of the Unicode translation modes. Curiously, that is not enforced if you don't set one of the Unicode modes, though...
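If the data already lives in a UTF-8 encoded std::string, as in the question, one way around the assertion is to widen it before writing it to std::wcout. The following is only a sketch under assumptions: widen_utf8 and print_utf8_line are hypothetical names of mine, the decoder assumes well-formed UTF-8 and does no validation, and the Windows-specific part is guarded so that the helper also compiles elsewhere.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

#ifdef _WIN32
#include <cstdio>    // stdout, _fileno
#include <fcntl.h>   // _O_U8TEXT
#include <io.h>      // _setmode
#include <iostream>  // std::wcout
#endif

// Hypothetical helper: decode UTF-8 bytes into a wide string.
// Assumes well-formed UTF-8; malformed input is not detected.
std::wstring widen_utf8(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Fold in the continuation bytes (6 payload bits each).
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += len;
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // 16-bit wchar_t (as on Windows): emit a surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

#ifdef _WIN32
// Windows-only: switch stdout to UTF-8 translation mode and write through
// the wide stream, because narrow output asserts in this mode.
// The mode is set on every call only to keep the sketch short.
void print_utf8_line(const std::string& utf8) {
    _setmode(_fileno(stdout), _O_U8TEXT);
    std::wcout << widen_utf8(utf8) << L'\n';
}
#endif
```

On Windows you would then call print_utf8_line(str) instead of cout << str. In real code the Win32 function MultiByteToWideChar with CP_UTF8 is the more usual way to do the widening; the hand-written decoder is here only to keep the example self-contained.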
Source: https://stackoverflow.com/questions/45232484/c-crash-when-use-setmode-with-o-u8text-to-deal-with-unicode