For the Intel architectures, is there a way to instruct the GCC compiler to generate code that always forces branch prediction a particular way in my code? Does the Intel h
As of C++20 the likely and unlikely attributes should be standardized and are already supported in g++9. So as discussed here, you can write
if (a>b) {
/* code you expect to run often */
[[likely]] /* last statement */
}
e.g. in the following code the else block gets inlined thanks to the [[unlikely]]
in the if block
int oftendone( int a, int b );
int rarelydone( int a, int b );
int finaltrafo( int );
int divides( int number, int prime ) {
int almostreturnvalue;
if ( ( number % prime ) == 0 ) {
auto k = rarelydone( number, prime );
auto l = rarelydone( number, k );
[[unlikely]] almostreturnvalue = rarelydone( k, l );
} else {
auto a = oftendone( number, prime );
almostreturnvalue = oftendone( a, a );
}
return finaltrafo( almostreturnvalue );
}
godbolt link comparing the presence/absence of the attribute
The correct way to define likely/unlikely macros in C++11 is the following:
#define LIKELY(condition) __builtin_expect(static_cast<bool>(condition), 1)
#define UNLIKELY(condition) __builtin_expect(static_cast<bool>(condition), 0)
This method is compatible with all C++ versions, unlike [[likely]]
, but relies on non-standard extension __builtin_expect
.
When these macros defined this way:
#define LIKELY(condition) __builtin_expect(!!(condition), 1)
That may change the meaning of if
statements and break the code. Consider the following code:
#include <iostream>
struct A
{
explicit operator bool() const { return true; }
operator int() const { return 0; }
};
#define LIKELY(condition) __builtin_expect((condition), 1)
int main() {
A a;
if(a)
std::cout << "if(a) is true\n";
if(LIKELY(a))
std::cout << "if(LIKELY(a)) is true\n";
else
std::cout << "if(LIKELY(a)) is false\n";
}
And its output:
if(a) is true
if(LIKELY(a)) is false
As you can see, the definition of LIKELY using !!
as a cast to bool
breaks the semantics of if
.
The point here is not that operator int()
and operator bool()
should be related. Which is good practice.
Rather that using !!(x)
instead of static_cast<bool>(x)
loses the context for C++11 contextual conversions.
As the other answers have all adequately suggested, you can use __builtin_expect
to give the compiler a hint about how to arrange the assembly code. As the official docs point out, in most cases, the assembler built into your brain will not be as good as the one crafted by the GCC team. It's always best to use actual profile data to optimize your code, rather than guessing.
Along similar lines, but not yet mentioned, is a GCC-specific way to force the compiler to generate code on a "cold" path. This involves the use of the noinline
and cold
attributes, which do exactly what they sound like they do. These attributes can only be applied to functions, but with C++11, you can declare inline lambda functions and these two attributes can also be applied to lambda functions.
Although this still falls into the general category of a micro-optimization, and thus the standard advice applies—test don't guess—I feel like it is more generally useful than __builtin_expect
. Hardly any generations of the x86 processor use branch prediction hints (reference), so the only thing you're going to be able to affect anyway is the order of the assembly code. Since you know what is error-handling or "edge case" code, you can use this annotation to ensure that the compiler won't ever predict a branch to it and will link it away from the "hot" code when optimizing for size.
Sample usage:
void FooTheBar(void* pFoo)
{
if (pFoo == nullptr)
{
// Oh no! A null pointer is an error, but maybe this is a public-facing
// function, so we have to be prepared for anything. Yet, we don't want
// the error-handling code to fill up the instruction cache, so we will
// force it out-of-line and onto a "cold" path.
[&]() __attribute__((noinline,cold)) {
HandleError(...);
}();
}
// Do normal stuff
⋮
}
Even better, GCC will automatically ignore this in favor of profile feedback when it is available (e.g., when compiling with -fprofile-use
).
See the official documentation here: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
GCC supports the function __builtin_expect(long exp, long c)
to provide this kind of feature. You can check the documentation here.
Where exp
is the condition used and c
is the expected value. For example in you case you would want
if (__builtin_expect(normal, 1))
Because of the awkward syntax this is usually used by defining two custom macros like
#define likely(x) __builtin_expect (!!(x), 1)
#define unlikely(x) __builtin_expect (!!(x), 0)
just to ease the task.
Mind that:
gcc has long __builtin_expect (long exp, long c) (emphasis mine):
You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.
The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:
if (__builtin_expect (x, 0)) foo ();
indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as
if (__builtin_expect (ptr != NULL, 1)) foo (*ptr);
when testing pointer or floating-point values.
As the documentation notes you should prefer to use actual profile feedback and this article shows a practical example of this and how it in their case at least ends up being an improvement over using __builtin_expect
. Also see How to use profile guided optimizations in g++?.
We can also find a Linux kernel newbies article on the kernal macros likely() and unlikely() which use this feature:
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
Note the !!
used in the macro we can find the explanation for this in Why use !!(condition) instead of (condition)?.
Just because this technique is used in the Linux kernel does not mean it always makes sense to use it. We can see from this question I recently answered difference between the function performance when passing parameter as compile time constant or variable that many hand rolled optimizations techniques don't work in the general case. We need to profile code carefully to understand whether a technique is effective. Many old techniques may not even be relevant with modern compiler optimizations.
Note, although builtins are not portable clang also supports __builtin_expect.
Also on some architectures it may not make a difference.
With regards to the OP, no, there is no way in GCC to tell the processor to always assume the branch is or isn't taken. What you have is __builtin_expect, which does what others say it does. Furthermore, I think you don't want to tell the processor whether the branch is taken or not always. Today's processors, such as the Intel architecture can recognize fairly complex patterns and adapt effectively.
However, there are times you want to assume control of whether by default a branch is predicted taken or not: When you know the code will be called "cold" with respect of branching statistics.
One concrete example: Exception management code. By definition the management code will happen exceptionally, but perhaps when it occurs maximum performance is desired (there may be a critical error to take care off as soon as possible), hence you may want to control the default prediction.
Another example: You may classify your input and jump into the code that handles the result of your classification. If there are many classifications, the processor may collect statistics but lose them because the same classification does not happen soon enough and the prediction resources are devoted to recently called code. I wish there would be a primitive to tell the processor "please do not devote prediction resources to this code" the way you sometimes can say "do not cache this".