Why is unsigned integer overflow defined behavior but signed integer overflow isn't?

前端 未结 5 1725
无人共我
无人共我 2020-11-22 01:48

Unsigned integer overflow is well defined by both the C and C++ standards. For example, the C99 standard (§6.2.5/9) states

A computatio

相关标签:
5条回答
  • 2020-11-22 02:22

    In addition to the other issues mentioned, having unsigned math wrap makes the unsigned integer types behave as abstract algebraic groups (meaning that, among other things, for any pair of values X and Y, there will exist some other value Z such that X+Z will, if properly cast, equal Y and Y-Z will, if properly cast, equal X). If unsigned values were merely storage-location types and not intermediate-expression types (e.g. if there were no unsigned equivalent of the largest integer type, and arithmetic operations on unsigned types behaved as though they were first converted them to larger signed types, then there wouldn't be as much need for defined wrapping behavior, but it's difficult to do calculations in a type which doesn't have e.g. an additive inverse.

    This helps in situations where wrap-around behavior is actually useful - for example with TCP sequence numbers or certain algorithms, such as hash calculation. It may also help in situations where it's necessary to detect overflow, since performing calculations and checking whether they overflowed is often easier than checking in advance whether they would overflow, especially if the calculations involve the largest available integer type.

    0 讨论(0)
  • 2020-11-22 02:24

    The historical reason is that most C implementations (compilers) just used whatever overflow behaviour was easiest to implement with the integer representation it used. C implementations usually used the same representation used by the CPU - so the overflow behavior followed from the integer representation used by the CPU.

    In practice, it is only the representations for signed values that may differ according to the implementation: one's complement, two's complement, sign-magnitude. For an unsigned type there is no reason for the standard to allow variation because there is only one obvious binary representation (the standard only allows binary representation).

    Relevant quotes:

    C99 6.2.6.1:3:

    Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.

    C99 6.2.6.2:2:

    If the sign bit is one, the value shall be modified in one of the following ways:

    — the corresponding value with sign bit 0 is negated (sign and magnitude);

    — the sign bit has the value −(2N) (two’s complement);

    — the sign bit has the value −(2N − 1) (one’s complement).


    Nowadays, all processors use two's complement representation, but signed arithmetic overflow remains undefined and compiler makers want it to remain undefined because they use this undefinedness to help with optimization. See for instance this blog post by Ian Lance Taylor or this complaint by Agner Fog, and the answers to his bug report.

    0 讨论(0)
  • 2020-11-22 02:29

    First of all, please note that C11 3.4.3, like all examples and foot notes, is not normative text and therefore not relevant to cite!

    The relevant text that states that overflow of integers and floats is undefined behavior is this:

    C11 6.5/5

    If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

    A clarification regarding the behavior of unsigned integer types specifically can be found here:

    C11 6.2.5/9

    The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

    This makes unsigned integer types a special case.

    Also note that there is an exception if any type is converted to a signed type and the old value can no longer be represented. The behavior is then merely implementation-defined, although a signal may be raised.

    C11 6.3.1.3

    6.3.1.3 Signed and unsigned integers

    When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

    Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

    Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

    0 讨论(0)
  • Perhaps another reason for why unsigned arithmetic is defined is because unsigned numbers form integers modulo 2^n, where n is the width of the unsigned number. Unsigned numbers are simply integers represented using binary digits instead of decimal digits. Performing the standard operations in a modulus system is well understood.

    The OP's quote refers to this fact, but also highlights the fact that there is only one, unambiguous, logical way to represent unsigned integers in binary. By contrast, Signed numbers are most often represented using two's complement but other choices are possible as described in the standard (section 6.2.6.2).

    Two's complement representation allows certain operations to make more sense in binary format. E.g., incrementing negative numbers is the same that for positive numbers (expect under overflow conditions). Some operations at the machine level can be the same for signed and unsigned numbers. However, when interpreting the result of those operations, some cases don't make sense - positive and negative overflow. Furthermore, the overflow results differ depending on the underlying signed representation.

    0 讨论(0)
  • 2020-11-22 02:34

    Aside from Pascal's good answer (which I'm sure is the main motivation), it is also possible that some processors cause an exception on signed integer overflow, which of course would cause problems if the compiler had to "arrange for another behaviour" (e.g. use extra instructions to check for potential overflow and calculate differently in that case).

    It is also worth noting that "undefined behaviour" doesn't mean "doesn't work". It means that the implementation is allowed to do whatever it likes in that situation. This includes doing "the right thing" as well as "calling the police" or "crashing". Most compilers, when possible, will choose "do the right thing", assuming that is relatively easy to define (in this case, it is). However, if you are having overflows in the calculations, it is important to understand what that actually results in, and that the compiler MAY do something other than what you expect (and that this may very depending on compiler version, optimisation settings, etc).

    0 讨论(0)
提交回复
热议问题