Endianness — why do chars put in an Int16 print backwards?

一向 2021-01-25 05:17

The following C code, compiled and run in XCode:

UInt16 chars = 'ab';
printf("\nchars: %2.2s", (char*)&chars);

prints 'ba', rather t

4 answers
  • 2021-01-25 05:21

    It depends on the system you're compiling/running your program on.

    On your system, the 16-bit value of 'ab' is 0x6162, and it is stored in memory as the bytes 0x62 0x61 ('b' then 'a'), least significant byte first: the little-endian way.

    With %s, printf reads memory one byte at a time starting at the address you pass, so it sees 'b' first and then 'a'. Hence "ba".
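
    To see this for yourself, here is a minimal sketch, assuming ASCII and a two-byte uint16_t with 8-bit bytes. The helper names bytes_in_memory and host_is_little_endian are made up for illustration; they are not from the question.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 16-bit value into out[0..1], in address
   order. memcpy is used rather than a pointer cast to avoid aliasing
   questions. */
void bytes_in_memory(uint16_t value, unsigned char out[2])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 if the host stores the least significant byte first
   (little-endian), 0 otherwise. */
int host_is_little_endian(void)
{
    unsigned char b[2];
    bytes_in_memory(0x0001, b);
    return b[0] == 0x01;
}
```

    On a little-endian host, bytes_in_memory(0x6162, b) leaves b[0] == 0x62 ('b') and b[1] == 0x61 ('a'), which is the order printf walks through.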

  • 2021-01-25 05:27

    On that implementation the constant 'ab' has the value 0x6162: 'b' is the least significant byte (the little end) and 'a' is the most significant byte. A little-endian machine stores the least significant byte at the lowest address, so if you viewed chars as an array of bytes, it would be chars[0] = 'b' and chars[1] = 'a', which printf treats as "ba".

    Also, I'm not sure how accurate you consider Wikipedia, but regarding C syntax it has this section:

    Multi-character constants (e.g. 'xy') are valid, although rarely useful — they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into one int is not specified, portable use of multi-character constants is difficult.

    So it appears the 'ab' multi-character constant format should be avoided in general.
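
    If you still want two characters in one 16-bit value, one portable option is to do the packing yourself with shifts, so the layout is defined by your code rather than by the compiler or the CPU. This is only a sketch: pack2 and unpack2 are hypothetical names, and it assumes 8-bit chars in the 0..127 range.

```c
#include <stdint.h>

/* Build a 16-bit code from two characters with a defined layout:
   `first` always ends up in the most significant byte. */
uint16_t pack2(char first, char second)
{
    return (uint16_t)(((unsigned)(unsigned char)first << 8)
                      | (unsigned char)second);
}

/* Recover the characters in significance order (high byte first) into a
   NUL-terminated buffer. Shifts work on the value, not on memory, so the
   result does not depend on host byte order. */
void unpack2(uint16_t code, char out[3])
{
    out[0] = (char)(code >> 8);
    out[1] = (char)(code & 0xFF);
    out[2] = '\0';
}
```

    With these, unpack2(pack2('a', 'b'), s) leaves "ab" in s on both little-endian and big-endian hosts, and s can be handed to printf with a plain %s.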

  • 2021-01-25 05:32

    Multicharacter character literals are implementation-defined:

    C99 6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."

    gcc and icl print "ba" on Windows 7. tcc prints "a" and drops the second letter altogether...

  • 2021-01-25 05:34

    The answer to your question can be found in your tags: endianness. On a little-endian machine the least significant byte is stored first, that is, at the lowest address. This is a convention and does not affect efficiency at all.

    This means you cannot simply cast the integer to a character string and expect the characters in the order you wrote them. A string has no notion of more or less significant bytes; it is just a sequence of bytes read in address order.

    If you want to view the bytes within your variable, I suggest using a debugger that can read the actual bytes.
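
    If a debugger is not at hand, a small hex-dump helper gives the same view. The sketch below is an assumption-laden example, not a standard function: format_bytes is a made-up name, and it writes the bytes of any object as space-separated hex in address order.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the bytes of an object as space-separated hex, in address order,
   into out (for example "62 61"). Returns 0 on success, -1 if out is too
   small to hold the whole dump. */
int format_bytes(const void *obj, size_t n, char *out, size_t out_size)
{
    const unsigned char *p = obj;  /* unsigned char may inspect any object */
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        /* Prefix every byte except the first with a separating space. */
        int w = snprintf(out + used, out_size - used, "%s%02x",
                         i ? " " : "", (unsigned)p[i]);
        if (w < 0 || (size_t)w >= out_size - used)
            return -1;             /* output would have been truncated */
        used += (size_t)w;
    }
    return 0;
}
```

    Calling format_bytes(&chars, sizeof chars, buf, sizeof buf) with a large enough buf and then printing buf shows the bytes exactly as they sit in memory.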
