Why do C and C++ hate signed char so much?

旧巷少年郎 2021-01-11 10:47

Why does C allow accessing an object's stored value through any "character type":

6.5 Expressions (C)

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [...] a character type.

while C++ ([C++11: 3.10/10]) grants the same exemption only to char and unsigned char, leaving signed char out?
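For example, this is the kind of byte-level inspection the rule permits (a minimal sketch of my own, not from the standard):

    #include <cstdio>

    int main() {
        float f = 1.0f;
        // Reading the bytes of f through unsigned char lvalues is
        // exactly what the character-type exemption allows.
        const unsigned char *p = reinterpret_cast<const unsigned char *>(&f);
        for (unsigned i = 0; i < sizeof f; ++i)
            std::printf("%02x ", p[i]);
        std::printf("\n");
    }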

3 Answers
  • 2021-01-11 11:39

    I think what you're really asking is why signed char is disqualified from all the rules allowing type-punning to char* as a special case. To be honest, I don't know, especially since — as far as I can tell — signed char cannot have padding either:

    [C++11: 3.9.1/1]: [..] A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.11); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. [..]
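    A quick compile-time check (my own sketch, not from the standard) bears this out on any conforming implementation. Padding can only be asserted directly for unsigned char; for signed char it follows from the quoted passage itself:

        #include <climits>

        // Every character type occupies exactly one byte, and unsigned
        // char can have no padding bits, since its maximum value must
        // use all CHAR_BIT bits.
        static_assert(sizeof(char) == 1, "");
        static_assert(sizeof(signed char) == 1, "");
        static_assert(sizeof(unsigned char) == 1, "");
        static_assert(UCHAR_MAX == (1ULL << CHAR_BIT) - 1,
                      "unsigned char has no padding bits");

        int main() {}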

    Empirical evidence suggests that it's not much more than convention:

    • char is seen as a byte of ASCII;
    • unsigned char is seen as a byte with arbitrary "binary" content; and
    • signed char is left flapping in the wind.

    To me, it doesn't seem like enough of a reason to exclude it from these standard rules, but I honestly can't find any evidence to the contrary. I'm going to put it down to a mildly inexplicable oddity in the standard wording.

    (It may be that we have to ask the std-discussion list about this.)

  • 2021-01-11 11:49

    The use of a character type to inspect the representations of objects is a hack. However, it is historical, and some accommodation must be made to allow it.

    Mostly, in programming languages, we want strong typing. Something that is a float should be accessed as a float and not as an int. This has a number of benefits, including reducing human errors and enabling various optimizations.

    However, there are times when it is necessary to access or modify the bytes of an object. In C, this was done through character types. C++ continues that tradition, but it improves the situation slightly by eliminating the use of signed char for these purposes.
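    As an illustration of that tradition (a sketch of my own; byte_swap is a hypothetical helper, not a standard function), this is the kind of byte-level modification the exemption exists to allow:

        #include <cstdint>
        #include <cstdio>

        // Reverse the bytes of a 32-bit value in place through unsigned
        // char, which the aliasing rules of both C and C++ permit.
        static void byte_swap(std::uint32_t *v) {
            unsigned char *p = reinterpret_cast<unsigned char *>(v);
            for (unsigned i = 0; i < sizeof *v / 2; ++i) {
                unsigned char tmp = p[i];
                p[i] = p[sizeof *v - 1 - i];
                p[sizeof *v - 1 - i] = tmp;
            }
        }

        int main() {
            std::uint32_t x = 0x11223344;
            byte_swap(&x);
            std::printf("%08x\n", x);  // prints 44332211
        }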

    Ideally, it might have been better to create a new type, say byte, and to allow byte access to object representations only through that type, reserving the regular character types for use as ordinary integers/characters. Perhaps it was thought there was too much existing code using char and unsigned char to support such a change. However, I have never seen signed char used to access the representation of an object, so it was safe to exclude it.
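    For what it's worth, C++17 later introduced exactly such a type, std::byte in <cstddef>, and added it to the list of types permitted to alias any object's representation. A minimal sketch:

        #include <cstddef>
        #include <cstdio>

        int main() {
            float f = 1.0f;
            // Since C++17, std::byte joins char and unsigned char in the
            // set of types allowed to alias any object's representation.
            const std::byte *p = reinterpret_cast<const std::byte *>(&f);
            for (unsigned i = 0; i < sizeof f; ++i)
                std::printf("%02x ", std::to_integer<unsigned>(p[i]));
            std::printf("\n");
            // std::byte deliberately supports only bitwise operations, no
            // arithmetic, so it cannot be confused with an integer type.
        }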

  • 2021-01-11 11:50

    Here's my take on the motivation:

    On a non-two's-complement system, signed char will not be suitable for accessing the representation of an object. This is because either there are two possible signed char representations which have the same value (+0 and -0), or one representation that has no value (a trap representation). In either case, this prevents you from doing most meaningful things you might do with the representation of an object. For example, if you have a 16-bit unsigned integer 0x80ff, one or the other byte, read as a signed char, is going to either trap or compare equal to 0.
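    To make the 0x80ff example concrete, here's a sketch of my own (it assumes CHAR_BIT == 8; the signed char behaviour described in the comments is hypothetical on today's two's-complement hardware):

        #include <cstdint>
        #include <cstdio>

        int main() {
            std::uint16_t u = 0x80ff;
            const unsigned char *p = reinterpret_cast<const unsigned char *>(&u);
            // Through unsigned char both bytes have distinct, well-defined
            // values: 0x80 and 0xff (in an endianness-dependent order).
            std::printf("%02x %02x\n", p[0], p[1]);

            // Through signed char, on a sign-magnitude implementation the
            // byte 0x80 reads as -0 (comparing equal to 0), and on a ones'-
            // complement implementation 0xff reads as -0; either pattern
            // could instead be a trap representation. Only on two's
            // complement do both bytes map to distinct usable values
            // (-128 and -1).
        }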

    Note that on such an implementation (non-two's-complement), plain char needs to be defined as an unsigned type for accessing the representations of objects via char to work correctly. While there's no explicit requirement, I see this as a requirement derived from other requirements in the standard.
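    A portable way to check which choice a given implementation made (again, my own sketch):

        #include <climits>
        #include <cstdio>

        int main() {
            // CHAR_MIN is 0 exactly when plain char is an unsigned type;
            // the point above is that a sane non-two's-complement
            // implementation would have to pick the unsigned option.
            std::printf("plain char is %s\n",
                        CHAR_MIN == 0 ? "unsigned" : "signed");
        }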
