Wrap around explanation for signed and unsigned variables in C?

清酒与你 2020-11-27 06:31

I read a bit in the C spec that unsigned variables (in particular unsigned short int) perform a so-called wrap around on integer overflow. Can someone explain how this wrap around works, and why signed variables behave differently?

4 answers
  • 2020-11-27 07:05

    In a signed 8-bit integer, the intuitive definition of wrap around might look like going from +127 to -128 -- in two's complement binary: 01111111 (127) and 10000000 (-128). As you can see, that is the natural progression of incrementing the binary data -- without considering whether it represents an integer, signed or unsigned. Counter-intuitively, the actual wrap-around in the unsigned sense takes place when moving from 11111111 (255, or -1 if read as signed) to 00000000 (0).

    This doesn't answer the deeper question of what the correct behavior is when a signed integer overflows because there is no "correct" behavior according to the standard.
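    The unsigned half of that picture can be shown in portable C, since unsigned wrap-around is well defined by the standard; the helper name below is invented for illustration. The signed transition from +127 to -128 cannot be demonstrated the same way, because overflowing a signed type is undefined behavior.

```c
#include <stdint.h>

/* Increment an 8-bit unsigned value. The wrap-around from 255 (11111111)
   back to 0 (00000000) is fully defined: unsigned arithmetic is modular. */
uint8_t inc8(uint8_t v) {
    /* v is promoted to int for the addition; converting the result back
       to uint8_t reduces it modulo 256, so 255 + 1 becomes 0. */
    return (uint8_t)(v + 1);
}
```

    Calling `inc8(255)` yields 0, and `inc8(127)` yields 128 -- the same bit pattern that a signed 8-bit type would read as -128.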

  • 2020-11-27 07:08

    Imagine you have a data type that's only 3 bits wide. This allows you to represent 8 distinct values, from 0 through 7. If you add 1 to 7, you will "wrap around" back to 0, because you don't have enough bits to represent the value 8 (1000).

    This behavior is well-defined for unsigned types. It is not well-defined for signed types, because there are multiple methods for representing signed values, and the result of an overflow will be interpreted differently based on that method.

    Sign-magnitude: the uppermost bit represents the sign; 0 for positive, 1 for negative. If my type is three bits wide again, then I can represent signed values as follows:

    000  =  0
    001  =  1
    010  =  2
    011  =  3
    100  = -0
    101  = -1
    110  = -2
    111  = -3
    

    Since one bit is taken up for the sign, I only have two bits to encode a value from 0 to 3. If I add 1 to 3, I'll overflow with -0 as the result. Yes, there are two representations for 0, one positive and one negative. You won't encounter sign-magnitude representation all that often.
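    As a sketch (the helper name is made up for illustration), the 3-bit sign-magnitude reading in the table above can be simulated with ordinary unsigned bit operations; note that the -0 row comes out as plain 0 in C:

```c
/* Interpret a 3-bit pattern as a sign-magnitude value:
   top bit = sign, low two bits = magnitude. */
int sm3_decode(unsigned bits) {
    int magnitude = (int)(bits & 0x3u);            /* low two bits */
    return (bits & 0x4u) ? -magnitude : magnitude; /* top bit is the sign */
}
```

    With this reading, incrementing 011 (3) gives 100, which decodes to -0, i.e. `sm3_decode(4)` returns 0.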

    One's-complement: the negative value is the bitwise-inverse of the positive value. Again, using the three-bit type:

    000  =  0
    001  =  1
    010  =  2
    011  =  3
    100  = -3
    101  = -2
    110  = -1 
    111  = -0
    

    I have three bits to encode my values, but the range is [-3, 3]. If I add 1 to 3, I'll overflow with -3 as the result. This is different from the sign-magnitude result above. Again, there are two encodings for 0 using this method.
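    The one's-complement table can be simulated the same way (again, the helper name is invented): a pattern with the top bit set decodes as the negation of its bitwise inverse within the three bits.

```c
/* Interpret a 3-bit pattern as a one's-complement value:
   negative values are the bitwise inverse of their positive counterpart. */
int oc3_decode(unsigned bits) {
    return (bits & 0x4u) ? -(int)(~bits & 0x7u) : (int)(bits & 0x7u);
}
```

    Here incrementing 011 (3) gives 100, and `oc3_decode(4)` returns -3, matching the overflow result described above.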

    Two's-complement: the negative value is the bitwise inverse of the positive value, plus 1. In the three-bit system:

    000  =  0
    001  =  1
    010  =  2
    011  =  3
    100  = -4
    101  = -3
    110  = -2
    111  = -1
    

    If I add 1 to 3, I'll overflow with -4 as a result, which is different from the previous two methods. Note that we have a slightly larger range of values [-4, 3] and only one representation for 0.
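    A matching sketch for the two's-complement table (helper name invented for illustration): a pattern with the top bit set decodes as its unsigned value minus 8, which is what gives the asymmetric range [-4, 3].

```c
/* Interpret a 3-bit pattern as a two's-complement value. */
int tc3_decode(unsigned bits) {
    bits &= 0x7u;                                  /* keep three bits */
    return (bits & 0x4u) ? (int)bits - 8 : (int)bits;
}
```

    Incrementing 011 (3) gives 100, and `tc3_decode(4)` returns -4, the wrap-around this answer describes for two's complement.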

    Two's complement is probably the most common method of representing signed values, but it's not the only one, so the C standard can't guarantee what will happen when you overflow a signed integer type. It leaves the behavior undefined so the compiler doesn't have to deal with interpreting multiple representations.

  • 2020-11-27 07:16

    The undefined behavior comes from early portability concerns, when signed integer types could be represented as sign-and-magnitude, one's complement, or two's complement.

    Nowadays, virtually all architectures represent signed integers in two's complement, which does wrap around in hardware. But be careful: since your compiler is entitled to assume you won't invoke undefined behavior, you might encounter weird bugs when optimisation is on.

  • 2020-11-27 07:30

    Signed integer variables do not have wrap-around behavior in C language. Signed integer overflow during arithmetic computations produces undefined behavior. Note BTW that the GCC compiler you mentioned is known for implementing strict overflow semantics in its optimizations, meaning that it takes advantage of the freedom provided by such undefined behavior situations: GCC assumes that signed integer values never wrap around. That means GCC actually happens to be one of the compilers on which you cannot rely on wrap-around behavior of signed integer types.

    For example, GCC compiler can assume that for variable int i the following condition

    if (i > 0 && i + 1 > 0)
    

    is equivalent to a mere

    if (i > 0)
    

    This is exactly what strict overflow semantics means.
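    A minimal, compilable sketch of that check (the function name is invented for illustration); under strict overflow semantics an optimizing compiler such as GCC may compile it exactly as if the body were `return i > 0;`, since the two can only differ when `i + 1` overflows, which the compiler assumes never happens:

```c
/* For any i that does not overflow, this is indistinguishable from (i > 0);
   a compiler applying strict overflow semantics may fold it to exactly that. */
int both_positive(int i) {
    return i > 0 && i + 1 > 0;
}
```

    Testing it with values near INT_MAX is deliberately avoided, since `INT_MAX + 1` is precisely the undefined case.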

    Unsigned integer types implement modulo arithmetic. The modulo is equal to 2^N, where N is the number of bits in the value representation of the type. For this reason unsigned integer types do indeed appear to wrap around on overflow.

    However, C language never performs arithmetic computations in domains smaller than that of int/unsigned int. Type unsigned short int that you mention in your question will typically be promoted to type int in expressions before any computations begin (assuming that the range of unsigned short fits into the range of int). Which means that 1) the computations with unsigned short int will be performed in the domain of int, with overflow happening when int overflows, 2) overflow during such computations will lead to undefined behavior, not to wrap-around behavior.

    For example, this code produces a wrap around

    unsigned i = USHRT_MAX;
    i *= INT_MAX; /* <- unsigned arithmetic, overflows, wraps around */
    

    while this code

    unsigned short i = USHRT_MAX;
    i *= INT_MAX; /* <- signed arithmetic, overflows, produces undefined behavior */
    

    leads to undefined behavior.

    If no int overflow happens and the result is converted back to an unsigned short int type, it is again reduced by modulo 2^N, which will appear as if the value has wrapped around.
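    That last case can be sketched as follows (the function name is invented for illustration). Both operands are promoted to int, so the multiplication itself is signed arithmetic; the chosen operands are small enough that the product fits in int, and only the final conversion back to unsigned short performs the modular reduction:

```c
/* Multiply two unsigned shorts. The operands are promoted to int, the
   product 300 * 400 = 120000 fits in int (no undefined behavior), and the
   conversion back to unsigned short reduces it modulo USHRT_MAX + 1. */
unsigned short mul_ushort(unsigned short a, unsigned short b) {
    return (unsigned short)(a * b);
}
```

    On a platform where unsigned short is 16 bits (USHRT_MAX == 65535), `mul_ushort(300, 400)` yields 120000 mod 65536 = 54464 -- the value appears to have wrapped, even though no overflow occurred in the computation itself.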
