While answering another question I got curious about this. I\'m well aware that
if( __builtin_expect( !!a, 0 ) ) {
// not likely
} else {
// quite likely
Did you read the GCC documentation?
Built-in Function: long __builtin_expect (long exp, long c)
You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.
The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:
if (__builtin_expect (x, 0)) foo ();
indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as
if (__builtin_expect (ptr != NULL, 1)) foo (*ptr);
when testing pointer or floating-point values.
To explain this a bit... __builtin_expect is specifically useful for communicating which branch you think the program's likely to take. You ask how the compiler can use that insight - well, consider this code:
if (x == 0)
return 10 * y;
else
return 39;
In machine code, the CPU can typically be asked to "goto" another line (which takes time, and depending on the CPU may prevent other execution optimisations - i.e. beneath the level of machine code - for example, see the Branches heading under http://en.wikipedia.org/wiki/Instruction_pipeline), or to call some other code, but there's not really an if/else concept where both true and false code are equal... you have to branch away to find the code for one or the other. The way that's done is basically, in pseudo-code:
test whether x is 0
if it was goto else_return_39
return 10 * y
else_return_39:
return 39
Given most CPUs are slower following the goto
down to the else_return_39:
label than to just fall through to return 10 * y
, code for the "true" branch will be reached faster than for the false branch. Of course, the machine code could test whether x is not 0, put the "false" code (return 39
) first and thereby reverse the performance characteristics.
This is what __builtin_expect controls - you can tell the compiler to put the true or the false branch where less branching is needed to reach it, thereby getting a tiny performance boost.
But does this work for a) inline ifs, b) variables and c) values other than 0 and 1?
a) Whether the surrounding function is inlined or not doesn't change the need for branching where the if
statement appears (unless the optimiser sees the condition the if
statement tests is always true
or false
and only one branch could never run). So, it's equally applicable to inlined code.
[ Your comment shows you were interested in conditional expressions - a ? b : c
- I'm not sure - there's a disputed answer to that question at Can I use GCC's __builtin_expect() with ternary operator in C that might prove insightful one way or the other, or the basis for further exploration ]
b) variables - you postulated:
int x = __builtin_expect( t / 10, 7 );
if( x == 7 ) {
That won't work - the compiler's not obliged to associate such expectations with variables and remember them the next time an if
is seen. You can verify this (as I did for gcc 3.4.4) using gcc -S
to produce assembly language output: the assembly doesn't change regardless of the expected value.
c) values other than 0 and 1
It works for integral (long
) values, so yes. The last paragraph of the documentation quoted above address this, specifically:
you should use constructions such as
if (__builtin_expect (ptr != NULL, 1)) foo (*ptr);
when testing pointer or floating-point values.
Why? Well, if the pointer type is larger than long
, then calling __builtin_conversion(long, long)
would effectively chop off some of the less-significant bits and fail to incorporate the rest in the test. Similarly, floating point values might be larger than a long, and the conversion not produce the result you expect. By using a boolean expression such as ptr != NULL
(given true
converts to 1L and false
to 0) you're sure to get intended results.