Flow of raw bytes of string literal into/out of the Windows (non-wide) execution character set at compile/runtime, & ANSI code pages vs. UTF-8

花落未央 2021-01-17 04:56

I would like confirmation regarding my understanding of raw string literals and the (non-wide) execution character set on Windows.

Relevant para

1 Answer
  • 2021-01-17 05:21

    This is a very long story, and I have trouble finding a single clear question in it. However, I think I can resolve a number of the misunderstandings that led to it.

    First off, "ANSI" is a synonym for the (narrow) execution character set. UTF-16 is the wide execution character set.

    The compiler will NOT choose for you. If you use narrow char strings, they are ANSI as far as the compiler and runtime are aware.

    Yes, the particular "ANSI" character encoding can matter. If you compile an L"ä" literal on your PC and your source code is in CP1252, then that ä character is compiled to a UTF-16 ä. However, the same byte could be another non-ASCII character in other encodings, which would result in a different UTF-16 character.

    Note, however, that MSVC is perfectly capable of compiling both UTF-8 and UTF-16 source code, as long as the file starts with a U+FEFF byte order mark (BOM). This makes the whole theoretical problem pretty much a non-issue.

    [edit] "Specifically, with MSVC, the execution character set and its encoding depends..."

    No, MSVC really has nothing to do with the execution character set. The meaning of char(0xE4) is determined by the OS. To see this, consider the MinGW compiler: executables produced by MinGW behave the same as those produced by MSVC, because both target the same OS.
