In this case the Clang output is better because it does not branch; instead it loads the value of num % 2 == 1 into al, whereas the code generated by GCC uses jumps. If num is equally likely to be even or odd, with no repeating pattern, the code generated by GCC will be susceptible to branch misprediction.
However, you can make the code well-behaved on GCC as well by writing:
int foo(int num) {
    return num * num + (num % 2 != 1);
}
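To make the comparison concrete, here is a minimal sketch that puts a branching version next to the branch-free one. The original function is not shown here, so foo_branching is only an assumed reconstruction of its shape, and both function names are hypothetical. Whether a given compiler actually emits a jump for either form depends on the compiler version and flags.

```c
/* Assumed reconstruction of a branching version; the original
   function is not shown here, so this shape is a guess. */
int foo_branching(int num) {
    if (num % 2 == 1)
        return num * num;      /* positive odd num */
    else
        return num * num + 1;  /* even num, or negative odd num */
}

/* Branch-free form: the comparison evaluates to 0 or 1,
   and that value is added directly to the square. */
int foo_branchless(int num) {
    return num * num + (num % 2 != 1);
}
```

Both functions return the same value for every input whose square fits in an int; only the way the source expresses the choice differs.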
Moreover, since your algorithm seems to be defined for unsigned numbers only, you should use unsigned int (the signed and unsigned versions give different results for negative numbers). In fact, you get a major speedup by using unsigned int for the argument, because GCC/Clang can now optimize num % 2 to num & 1:
unsigned int foo(unsigned int num) {
    return num * num + (num % 2 != 1);
}
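The difference between the signed and unsigned versions can be checked with a small sketch. The names foo_signed, foo_unsigned and foo_mask are hypothetical, and foo_mask is just a hand-written illustration of the num & 1 form, not something you need to write yourself, since the compiler can do this rewrite for the unsigned version.

```c
/* Signed version: in C99 and later, -3 % 2 evaluates to -1,
   so a negative odd num fails the "== 1" test and gets the +1. */
int foo_signed(int num) {
    return num * num + (num % 2 != 1);
}

/* Unsigned version: num % 2 can only be 0 or 1. */
unsigned int foo_unsigned(unsigned int num) {
    return num * num + (num % 2 != 1);
}

/* Hand-written equivalent of the unsigned version:
   (num & 1u) is the low bit, and XOR with 1 flips it. */
unsigned int foo_mask(unsigned int num) {
    return num * num + ((num & 1u) ^ 1u);
}
```

For example, foo_signed(-3) yields 10, while foo_unsigned((unsigned int)-3) yields 9, because the converted value is odd and unsigned multiplication wraps around.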
The resulting code generated by gcc -O2
movl %edi, %edx
imull %edi, %edi
andl $1, %edx
xorl $1, %edx
leal (%rdi,%rdx), %eax
ret
is much better than the code for your original function generated by either compiler. Thus the choice of compiler matters less than having a programmer who knows what they're doing.