The C and C++ standards stipulate that, in binary operations between a signed and an unsigned integer of the same rank, the signed integer is cast to unsigned. There are man
Casting from unsigned to signed results in implementation-defined behaviour if the value cannot be represented. Casting from signed to unsigned is always modulo two to the power of the unsigned's bitsize, so it is always well-defined.
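As a minimal sketch of the modulo rule (the helper name `to_unsigned` is made up for illustration, and C++17 is assumed for the message-less `static_assert`), the following checks the signed-to-unsigned direction at compile time:

```cpp
#include <limits>

// Hypothetical helper: performs the same conversion the language applies
// implicitly when an int meets an unsigned int.
constexpr unsigned to_unsigned(int v) { return static_cast<unsigned>(v); }

constexpr unsigned kUMax = std::numeric_limits<unsigned>::max();

// Non-negative values are unchanged.
static_assert(to_unsigned(5) == 5u);
// Negative values wrap modulo 2^N, where N is the width of unsigned:
// -1 becomes 2^N - 1 and -2 becomes 2^N - 2.
static_assert(to_unsigned(-1) == kUMax);
static_assert(to_unsigned(-2) == kUMax - 1u);
```

The checks hold regardless of how the platform represents negative numbers, because the rule is stated in terms of values, not bit patterns.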
The standard conversion is to the signed type if every possible unsigned value is representable in the signed type. Otherwise, the unsigned type is chosen. This guarantees that the conversion is always well-defined.
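The selection rule can be observed with `decltype`. This is only a sketch: the alias `Mixed` and the constant `long_holds_all_unsigned` are illustrative names, and which branch applies depends on the platform (for example, `long` is 64 bits on typical Linux targets and 32 bits on Windows).

```cpp
#include <limits>
#include <type_traits>

// Same rank: the signed operand is converted, so the result is unsigned.
static_assert(std::is_same_v<decltype(0u + 0), unsigned>);

// Different rank: the result type of unsigned int + long.
using Mixed = decltype(0u + 0L);

// True when long can represent every unsigned int value. Both sides are
// compared as unsigned long long to keep the comparison itself unmixed.
constexpr bool long_holds_all_unsigned =
    static_cast<unsigned long long>(std::numeric_limits<long>::max()) >=
    static_cast<unsigned long long>(std::numeric_limits<unsigned>::max());

// If long is wide enough the result is long; otherwise it is the unsigned
// counterpart of long.
static_assert(std::is_same_v<
    Mixed,
    std::conditional_t<long_holds_all_unsigned, long, unsigned long>>);
```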
As indicated in comments, the conversion algorithm for C++ was inherited from C to maintain compatibility, which is technically the reason it is so in C++.
It has been suggested that the decision in the standard to define signed-to-unsigned conversions and not unsigned-to-signed conversions is somehow arbitrary, and that the other possible decision would be symmetric. However, the possible conversions are not symmetric.
In both of the non-2's-complement representations contemplated by the standard, an n-bit signed representation can represent only 2^n - 1 values, whereas an n-bit unsigned representation can represent 2^n values. Consequently, a signed-to-unsigned conversion is lossless and can be reversed (although one unsigned value can never be produced). The unsigned-to-signed conversion, on the other hand, must collapse two different unsigned values onto the same signed result.
In a comment, the formula sint = uint > sint_max ? uint - uint_max : uint is proposed. This coalesces the values uint_max and 0; both are mapped to 0. That's a little weird even for non-2's-complement representations, but for 2's-complement it's unnecessary and, worse, it requires the compiler to emit code to laboriously compute this unnecessary conflation. By contrast, the standard's signed-to-unsigned conversion is lossless and in the common case (2's-complement architectures) it is a no-op.
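To make the collision concrete, here is a sketch of the commenter's formula. The function name `proposed_to_signed` is invented for this example, and the arithmetic is done in `long long` (assumed to be wider than `unsigned`) so that the subtraction itself cannot wrap.

```cpp
#include <climits>

// Assumption: long long is wider than unsigned int, so uint - uint_max can be
// computed without wrapping.
static_assert(sizeof(long long) > sizeof(unsigned));

// The formula from the comment: sint = uint > sint_max ? uint - uint_max : uint
constexpr long long proposed_to_signed(unsigned u) {
    return u > static_cast<unsigned>(INT_MAX)
               ? static_cast<long long>(u) - static_cast<long long>(UINT_MAX)
               : static_cast<long long>(u);
}

// Two different unsigned inputs land on the same signed result.
static_assert(proposed_to_signed(0u) == 0);
static_assert(proposed_to_signed(UINT_MAX) == 0);
// Neighbouring values behave as the comment intends.
static_assert(proposed_to_signed(UINT_MAX - 1u) == -1);

// The standard's direction keeps distinct signed inputs distinct.
static_assert(static_cast<unsigned>(-1) != static_cast<unsigned>(0));
```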
This is sort of a half-answer, because I don't really understand the committee's reasoning.
From the C90 committee's rationale document: https://www.lysator.liu.se/c/rat/c2.html#3-2-1-1
Since the publication of K&R, a serious divergence has occurred among implementations of C in the evolution of integral promotion rules. Implementations fall into two major camps, which may be characterized as unsigned preserving and value preserving. The difference between these approaches centers on the treatment of unsigned char and unsigned short, when widened by the integral promotions, but the decision has an impact on the typing of constants as well (see §3.1.3.2).
... and apparently also on the conversions done to match the two operands for any operator. It continues:
Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with twos-complement arithmetic and quiet wraparound on signed overflow --- that is, in most current implementations.
It then specifies a case where ambiguity of interpretation arises, and states:
The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. (Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation.) Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising --- or it may be exactly what is desired by a knowledgable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.
and:
The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.
Thus, they consider the case of int + unsigned an unwanted situation, and chose conversion rules for char and short that yield as few of those situations as possible, even though most compilers at the time followed a different approach. If I understand right, this choice then forced them to follow the current choice of int + unsigned yielding an unsigned operation.
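A small sketch of what the value preserving rules mean in practice, next to the int + unsigned case. The function names are made up for the example, and it assumes int is wider than unsigned char, which the first assertion verifies.

```cpp
#include <limits>
#include <type_traits>

// Assumption: int can hold every unsigned char value (true on mainstream
// platforms), so unsigned char promotes to int.
static_assert(std::numeric_limits<int>::max() >
              std::numeric_limits<unsigned char>::max());

constexpr auto narrow_minus_one(unsigned char c) { return c - 1; }
constexpr auto wide_minus_one(unsigned u) { return u - 1; }
constexpr auto mixed_sum(int s, unsigned u) { return s + u; }

// Value preserving: the narrow unsigned operand becomes int, so 0 - 1 is -1.
static_assert(std::is_same_v<decltype(narrow_minus_one(0)), int>);
static_assert(narrow_minus_one(0) == -1);

// unsigned int is not promoted, so the same expression wraps around.
static_assert(std::is_same_v<decltype(wide_minus_one(0u)), unsigned>);
static_assert(wide_minus_one(0u) == std::numeric_limits<unsigned>::max());

// int + unsigned: the int operand is converted and the result is unsigned.
static_assert(std::is_same_v<decltype(mixed_sum(0, 0u)), unsigned>);
static_assert(mixed_sum(-2, 1u) == std::numeric_limits<unsigned>::max());
```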
I still find all of this truly bizarre.
If conversion to signed had been chosen, then a simple a+1 would always result in a signed type (unless the constant was typed as 1U).
Assume a was unsigned int; then this seemingly innocent increment a+1 could lead to things like undefined overflow or "index out of bounds", in the case of arr[a+1].
Thus, "unsigned casting" seems like a safer approach because people probably don't even expect casting to be happening in the first place, when simply adding a constant.
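For comparison, this is what a+1 does under the actual rules, as a sketch with an invented helper name `next_index`: the result stays unsigned, and at the top of the range it wraps to 0 with well-defined behaviour instead of overflowing a signed type.

```cpp
#include <limits>
#include <type_traits>

// The literal 1 is an int, but it is converted to unsigned before the addition.
constexpr auto next_index(unsigned a) { return a + 1; }

static_assert(std::is_same_v<decltype(next_index(0u)), unsigned>);
static_assert(next_index(41u) == 42u);
// Unsigned arithmetic wraps modulo 2^N; no undefined behaviour is involved.
static_assert(next_index(std::numeric_limits<unsigned>::max()) == 0u);
```

A wrapped index of 0 can still be a logic error in arr[a+1], but it is a defined value that can be tested for.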