Consider following example:
#include
int main(void)
{
unsigned char a = 15; /* one byte */
unsigned short b = 15; /* two bytes */
This is due to how the integer promotions applied to the operand and the requirement that the result of unary minus have the same type. This is covered in section 6.5.3.3
Unary arithmetic operators and says (emphasis mine going forward):
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
and integer promotion which is covered in the draft c99 standard section 6.3
Conversions and says:
if an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.48) All other types are unchanged by the integer promotions.
In the first two cases, the promotion will be to int and the result will be int. In the case of unsigned int no promotion is required but the result will require a conversion back to unsigned int.
The -15
is converted to unsigned int using the rules set out in section 6.3.1.3
Signed and unsigned integers which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.49)
So we end up with -15 + (UMAX + 1)
which results in UMAX - 14
which results in a large unsigned value. This is sometimes why you will see code use -1
converted to to an unsigned value to obtain the max unsigned value of a type since it will always end up being -1 + UMAX + 1
which is UMAX
.
int
is special. Everything smaller than int
gets promoted to int
in arithmetic operations.
Thus -a
and -b
are applications of unary minus to int
values of 15, which just work and produce -15. This value is then converted to long
.
-c
is different. c
is not promoted to an int
as it is not smaller than int
. The result of unary minus applied to an unsigned int
value of k
is again an unsigned int
, computed as 2N-k (N is the number of bits).
Now this unsigned int
value is converted to long
normally.
C's integer promotion rules are what they are because standards-writers wanted to allow a wide variety of existing implementations that did different things, in some cases because they were created before there were "standards", to keep on doing what they were doing, while defining rules for new implementations that were more specific than "do whatever you feel like". Unfortunately, the rules as written make it extremely difficult to write code which doesn't depend upon a compiler's integer size. Even if future processors would be able to perform 64-bit operations faster than 32-bit ones, the rules dictated by the standards would cause a lot of code to break if int
ever grew beyond 32 bits.
It would probably in retrospect have been better to have handled "weird" compilers by explicitly recognizing the existence of multiple dialects of C, and recommending that compilers implement a dialect that handles various things in consistent ways, but providing that they may also implement dialects which do them differently. Such an approach may end up ultimately being the only way that int
can grow beyond 32 bits, but I've not heard of anyone even considering such a thing.
I think the root of the problem with unsigned integer types stems from the fact that they are sometimes used to represent numerical quantities, and are sometimes used to represent members of a wrapping abstract algebraic ring. Unsigned types behave in a manner consistent with an abstract algebraic ring in circumstances which do not involve type promotion. Applying a unary minus to a member of a ring should (and does) yield a member of that same ring which, when added to the original, will yield zero [i.e. the additive inverse]. There is exactly one way to map integer quantities to ring elements, but multiple ways exist to map ring elements back to integer quantities. Thus, adding a ring element to an integer quantity should yield an element of the same ring regardless of the size of the integer, and conversion from rings to integer quantities should require that code specify how the conversion should be performed. Unfortunately, C implicitly converts rings to integers in cases where either the size of the ring is smaller than the default integer type, or when an operation uses a ring member with an integer of a larger type.
The proper solution to solve this problem would be to allow code to specify that certain variables, return values, etc. should be regarded as ring types rather than numbers; an expression like -(ring16_t)2
should yield 65534 regardless of the size of int
, rather than yielding 65534 on systems where int
is 16 bits, and -2 on systems where it's larger. Likewise, (ring32)0xC0000001 * (ring32)0xC0000001
should yield (ring32)0x80000001
even if int
happens to be 64 bits [note that if int
is 64 bits, the compiler could legally do anything it likes if code tries to multiply two unsigned 32-bit values which equal 0xC0000001, since the result would be too large to represent in a 64-bit signed integer.
Negatives are tricky. Especially when it comes to unsigned values. If you look at the c-documentation, you'll notice that (contrary to what you'd expect) unsigned chars and shorts are promoted to signed ints for computing, while an unsigned int will be computed as an unsigned int.
When you compute the -c, the c is treated as an int, it becomes -15, then is stored in x, (which still believes it is an UNSIGNED int) and is stored as such.
For clarification - No ACTUAL promotion is done when "negativeing" an unsigned. When you assign a negative to any type of int (or take a negative) the 2's compliment of the number is instead used. Since the only practical difference between unsigned and signed values is that the MSB acts as a sign flag, it is taken as a very large positive number instead of a negative one.
This behavior is correct. Quotes are from C 9899:TC2.
6.5.3.3/3:
The result of the unary
-
operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
6.2.5/9:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
6.3.1.1/2:
The following may be used in an expression wherever an
int
orunsigned int
may be used:
An object or expression with an integer type whose integer conversion rank is less than or equal to the rank of
int
andunsigned int
.A bit-field of type
_Bool
,int
,signed int
, orunsigned int
.If an
int
can represent all values of the original type, the value is converted to anint
; otherwise, it is converted to anunsigned int
. These are called the integer promotions. All other types are unchanged by the integer promotions.
So for long x = -a;
, since the operand a
, an unsigned char
, has conversion rank less than the rank of int
and unsigned int
, and all unsigned char
values can be represented as int
(on your platform), we first promote to type int
. The negative of that is simple: the int
with value -15
.
Same logic for unsigned short
(on your platform).
The unsigned int c
is not changed by promotion. So the value of -c
is calculated using modular arithmetic, giving the result UINT_MAX-14
.