Question
Clearly, fixed-width integral types should be used when the size is important.
However, I read (in the Insomniac Games style guide) that "int" should be preferred for loop counters / function args / return codes / etc. when the size isn't important - the rationale given was that fixed-width types can preclude certain compiler optimizations.
Now, I'd like to make a distinction between "compiler optimization" and "a more suitable typedef for the target architecture". The latter has global scope, and my guess is that it probably has very limited impact unless the compiler can somehow reason about the global performance of the program parameterized by this typedef. The former has local scope, where the compiler would have the freedom to optimize the number of bytes used, and the operations, based on local register pressure / usage, among other things.
Does the standard permit "compiler optimizations" (as we've defined) for non-fixed-width types? Any good examples of this?
If not, and assuming the CPU can operate on smaller types at least as fast as on larger types, then I see no harm, from a performance standpoint, in using fixed-width integers sized according to local context. At least that gives the possibility of relieving register pressure, and I'd argue it couldn't be worse.
Answer 1:
The reason the rule of thumb is to use an int is that the standard defines this integral type as the natural data type of the CPU (provided that it is sufficiently wide for the range INT_MIN to INT_MAX). That's where the best performance stems from.
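A minimal sketch to check what the implementation picked for int on your platform; on most mainstream 64-bit desktop ABIs this still reports 4 bytes and a 32-bit range:
#include <limits.h>
#include <stdio.h>

int main(void) {
    printf("sizeof(int) = %zu\n", sizeof(int));
    printf("INT_MIN = %d, INT_MAX = %d\n", INT_MIN, INT_MAX);
}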
Answer 2:
There are many things wrong with the int_fast types - most notably that they can be slower than int!
#include <stdio.h>
#include <inttypes.h>
int main(void) {
    printf("%zu\n", sizeof (int_fast32_t));
}
Run this on x86-64 and it prints 8... but that makes no sense: using 64-bit registers often requires extra instruction prefixes in x86-64 long mode, and since the behaviour on signed overflow is undefined, with a 32-bit int it doesn't matter whether the upper 32 bits of the 64-bit register are set after arithmetic - the behaviour is "still correct".
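A minimal sketch of the point, assuming a typical x86-64 Linux/glibc target where int32_t is int and int_fast32_t is long (the function names are just illustrative, and the exact codegen depends on the compiler and ABI):
#include <stdint.h>

/* add_fast tends to compile to a 64-bit add needing a REX prefix, while
   add_i32 gets a plain 32-bit add - compare both in a compiler explorer. */
int32_t add_i32(int32_t a, int32_t b) { return a + b; }
int_fast32_t add_fast(int_fast32_t a, int_fast32_t b) { return a + b; }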
Even worse than using the signed fast or least types, however, is using a small unsigned integer instead of size_t or a signed integer for a loop counter - now the compiler must generate extra code to "ensure the correct wraparound behaviour".
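A minimal sketch of the kind of loop being warned about (function names are just illustrative); with the narrow unsigned counter the compiler must preserve wraparound modulo 65536, so it typically keeps the index truncated and may be unable to widen or vectorize the loop, while the size_t version needs no such care:
#include <stddef.h>

/* Counter wraps modulo 65536, so if n > 65535 this loop never terminates -
   the compiler has to honour that and cannot simply use a full-width index. */
void copy_ushort(char *dst, const char *src, size_t n) {
    for (unsigned short i = 0; i < n; i++)
        dst[i] = src[i];
}

/* size_t counter: no truncation or wraparound bookkeeping needed. */
void copy_size_t(char *dst, const char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}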
Answer 3:
I'm not very familiar with the x86 instruction set, but unless you can guarantee that practically every arithmetic and move instruction also allows additional shifts and (sign) extensions, the assumption that smaller types are "at least as fast" as larger ones is not true.
The complexity of x86 makes it pretty hard to come up with simple examples, so let's consider an ARM microcontroller instead.
Let's define two addition functions which differ only in return type: "add32", which returns an integer of full register width, and "add8", which returns only a single byte.
#include <stdint.h>

int32_t add32(int32_t a, int32_t b) { return a + b; }
int8_t add8(int32_t a, int32_t b) { return a + b; }
Compiling those functions with -Os gives the following assembly:
add32(int, int):
        add  r0, r0, r1
        bx   lr
add8(int, int):
        add  r0, r0, r1
        sxtb r0, r0      // Sign-extend single byte
        bx   lr
Notice how the function which returns only a byte is one instruction longer: it has to truncate the result of the 32-bit addition to a single byte and sign-extend it.
Here is a link to the code @ compiler explorer: https://godbolt.org/z/ABFQKe
Answer 4:
However, I read (in the Insomniac Games style guide) that "int" should be preferred for loop counters
You should rather be using size_t whenever iterating over an array. int has problems other than performance, such as being signed, and it is also problematic when porting.
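A minimal sketch of the suggested idiom (the sum function is just illustrative):
#include <stddef.h>

/* size_t is unsigned, matches the result type of sizeof, and is wide
   enough for any object size, so it is the natural index type. */
long sum_array(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}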
From a standard point of view, in a scenario where "n" is the size of an int, there exists no case where int_fastn_t should perform worse than int - if it does, the compiler/standard lib/ABI/system has a fault.
Does the standard permit "compiler optimizations" (as we've defined) for non-fixed-width types? Any good examples of this?
Sure, the compiler might optimize the use of integer types quite wildly, as long as it doesn't affect the outcome of the result - no matter if they are int or int32_t.
For example, a compiler for an 8-bit CPU might optimize int a=1; int b=1; ... c = a + b; to be performed with 8-bit arithmetic, ignoring integer promotions and the actual size of int. It will, however, most likely have to allocate 16 bits of memory to store the result.
But if we give it some rotten code like char a = 0x80; int b = a >> 1;, it will have to do the optimization so that the side effects of integer promotion are taken into account. That is, the result could be 0xFFC0 rather than the 0x40 one might have expected (assuming a signed char, 2's complement, and an arithmetic shift). The a >> 1 part isn't possible to optimize to an 8-bit type because of this - it has to be carried out with 16-bit arithmetic.
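A runnable version of that rotten code, assuming (as above) a signed plain char, 2's complement and an arithmetic right shift:
#include <stdio.h>

int main(void) {
    char a = 0x80;      /* -128 after the implementation-defined conversion */
    int b = a >> 1;     /* a is promoted to int before the shift: -128 >> 1 == -64 */
    printf("%d\n", b);  /* prints -64 (0x...FFC0), not the 0x40 one might expect */
}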
Answer 5:
I think the question you are trying to ask is:
Is the compiler allowed to make additional optimizations for a non-fixed-width type such as int, beyond what it would be allowed to make for a fixed-width type like int32_t that happens to have the same length on the current platform?
That is, you are not interested in the part where the size of the non-fixed-width type is allowed to be chosen appropriately for the hardware - you are aware of that, and are asking whether, beyond that, additional optimizations are available.
The answer, as far as I am aware or have seen, is no. No both in the sense that compilers do not actually optimize int differently than int32_t (on platforms where int is 32 bits), and also no in the sense that there are no optimizations allowed by the standard for int which are not also allowed for int32_t [1] (this second part is wrong - see comments).
The easiest way to see this is that the various fixed-width integers are all typedefs for the various underlying primitive integer types - so on a platform with 32-bit integers, int32_t will probably be a typedef (perhaps indirect) of int. So from a behavioral and optimization point of view, the types are identical, and as soon as you are in the IR world of the compiler, the original type probably isn't even really available without jumping through hoops (i.e., int and int32_t will generate the same IR).
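One way to check the typedef relationship on a given platform is a C11 _Generic selection - a minimal sketch (the printed strings are just illustrative):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* _Generic dispatches on the underlying primitive type, so this tells
       you whether int32_t is a typedef of int or of some other type
       (e.g. long on some ABIs). */
    puts(_Generic((int32_t)0,
                  int:     "int32_t is a typedef of int here",
                  default: "int32_t is a typedef of some other type here"));
}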
So I think the advice you received was wrong, or at best misleading.
[1] Of course the answer to the question "Is it allowed for a compiler to optimize int better than int32_t?" is yes, since there are no particular requirements on optimization, so a compiler could do something weird like that, or the reverse, such as optimizing int32_t better than int. I think that's not very interesting, though.
Source: https://stackoverflow.com/questions/54825215/compiler-optimizations-allowed-via-int-least-and-fast-non-fixed-width-typ