From the answers I got from this question, it appears that C++ inherited this requirement for conversion of short
into int
when performing arithmet
The linked question seems to cover it pretty well: the CPU just doesn't. A 32-bit CPU has its native arithmetic operations set up for 32-bit registers. The processor prefers to work in its favorite size, and for operations like this, copying a small value into a native-size register is cheap. (For the x86 architecture, the 32-bit registers are named as if they are extended versions of the 16-bit registers (eax
to ax
, ebx
to bx
, etc); see x86 integer instructions).
For some extremely common operations, particularly vector/float arithmetic, there may be specialized instructions that operate on a different register type or size. For something like a short, padding with (up to) 16 bits of zeroes has very little performance cost and adding specialized instructions is probably not worth the time or space on the die (if you want to get really physical about why; I'm not sure they would take actual space, but it does get way more complex).
short
and char
types are considered by the standard sort of "storage types" i.e. sub-ranges that you can use to save some space but that are not going to buy you any speed because their size is "unnatural" for the CPU.
On certain CPUs this is not true but good compilers are smart enough to notice that if you e.g. add a constant to an unsigned char and store the result back in an unsigned char then there's no need to go through the unsigned char -> int
conversion.
For example with g++ the code generated for the inner loop of
void incbuf(unsigned char *buf, int size) {
for (int i=0; i<size; i++) {
buf[i] = buf[i] + 1;
}
}
is just
.L3:
addb $1, (%rdi,%rax)
addq $1, %rax
cmpl %eax, %esi
jg .L3
.L1:
where you can see that an unsigned char addition instruction (addb
) is used.
The same happens if you're doing your computations between short ints and storing the result in short ints.
If we look at the Rationale for International Standard—Programming Languages—C in section 6.3.1.8
Usual arithmetic conversions it says (emphasis mine going forward):
The rules in the Standard for these conversions are slight modifications of those in K&R: the modifications accommodate the added types and the value preserving rules. Explicit license was added to perform calculations in a “wider” type than absolutely necessary, since this can sometimes produce smaller and faster code, not to mention the correct answer more often. Calculations can also be performed in a “narrower” type by the as if rule so long as the same end result is obtained. Explicit casting can always be used to obtain a value in a desired type
Section 6.3.1.8 from the draft C99 standard covers the Usual arithmetic conversions which is applied to operands of arithmetic expressions for example section 6.5.6 Additive operators says:
If both operands have arithmetic type, the usual arithmetic conversions are performed on them.
We find similar text in section 6.5.5 Multiplicative operators as well. In the case of a short operand, first the integer promotions are applied from section 6.3.1.1 Boolean, characters, and integers which says:
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.48) All other types are unchanged by the integer promotions.
The discussion from section 6.3.1.1
of the Rationale or International Standard—Programming Languages—C on integer promotions is actually more interesting, I am going to selectively quote b/c it is too long to fully quote:
Implementations fell into two major camps which may be characterized as unsigned preserving and value preserving.
[...]
The unsigned preserving approach calls for promoting the two smaller unsigned types to unsigned int. This is a simple rule, and yields a type which is independent of execution environment.
The value preserving approach calls for promoting those types to signed int if that type can properly represent all the values of the original type, and otherwise for promoting those types to unsigned int. Thus, if the execution environment represents short as something smaller than int, unsigned short becomes int; otherwise it becomes unsigned int.
This can have some rather unexpected results in some cases as Inconsistent behaviour of implicit conversion between unsigned and bigger signed types demonstrates, there are plenty more examples like that. Although in most cases this results in the operations working as expected.
It's not a feature of the language as much as it is a limitation of physical processor architectures on which the code runs. The int
typer in C is usually the size of your standard CPU register. More silicon takes up more space and more power, so in many cases arithmetic can only be done on the "natural size" data types. This is not universally true, but most architectures still have this limitation. In other words, when adding two 8-bit numbers, what actually goes on in the processor is some type of 32-bit arithmetic followed by either a simple bit mask or another appropriate type conversion.