I read about function pointers in C. And everyone said that will make my program run slow. Is it true?
I made a program to check it. And I got the same results on bo
Possibly.
The answer depends on what the function pointer is being used for and hence what the alternatives are. Comparing function pointer calls to direct function calls is misleading if a function pointer is being used to implement a choice that's part of our program logic and which can't simply be removed. I'll go ahead and nonetheless show that comparison and come back to this thought afterwards.
Function pointer calls have the most opportunity to degrade performance compared to direct function calls when they inhibit inlining. Because inlining is a gateway optimization, we can craft wildly pathological cases where function pointers are made arbitrarily slower than the equivalent direct function call:
void foo(int* x) {
*x = 0;
}
void (*foo_ptr)(int*) = foo;
int call_foo(int *p, int size) {
int r = 0;
for (int i = 0; i != size; ++i)
r += p[i];
foo(&r);
return r;
}
int call_foo_ptr(int *p, int size) {
int r = 0;
for (int i = 0; i != size; ++i)
r += p[i];
foo_ptr(&r);
return r;
}
Code generated for call_foo()
:
call_foo(int*, int):
xor eax, eax
ret
Nice. foo()
has not only been inlined, but doing so has allowed the compiler to eliminate the entire preceding loop! The generated code simply zeroes out the return register by XORing the register with itself and then returns. On the other hand, compilers will have to generate code for the loop in call_foo_ptr()
(100+ lines with gcc 7.3) and most of that code effectively does nothing (so long as foo_ptr
still points to foo()
). (In more typical scenarios, you can expect that inlining a small function into a hot inner loop might reduce execution time by up to about an order of magnitude.)
So in a worst case scenario, a function pointer call is arbitrarily slower than a direct function call, but this is misleading. It turns out that if foo_ptr
had been const
, then call_foo()
and call_foo_ptr()
would have generated the same code. However, this would require us to give up the opportunity for indirection provided by foo_ptr
. Is it "fair" for foo_ptr
to be const
? If we're interested in the indirection provided by foo_ptr
, then no, but if that's the case, then a direct function call is not a valid option either.
If a function pointer is being used to provide useful indirection, then we can move the indirection around or in some cases swap out function pointers for conditionals or even macros, but we can't simply remove it. If we've decided that function pointers are a good approach but performance is a concern, then we typically want to pull indirection up the call stack so that we pay the cost of indirection in an outer loop. For example, in the common case where a function takes a callback and calls it in a loop, we might try moving the innermost loop into the callback (and changing the responsibility of each callback invocation accordingly).
You see, in situations that actually matter from the performance point of view, like calling the function repeatedly many times in a cycle, the performance might not be different at all.
This might sound strange to people, who are used to thinking about C code as something executed by an abstract C machine whose "machine language" closely mirrors the C language itself. In such context, "by default" an indirect call to a function is indeed slower than a direct one, because it formally involves an extra memory access in order to determine the target of the call.
However, in real life the code is executed by a real machine and compiled by an optimizing compiler that has a pretty good knowledge of the underlying machine architecture, which helps it to generate the most optimal code for that specific machine. And on many platforms it might turn out that the most efficient way to perform a function call from a cycle actually results in identical code for both direct and indirect call, leading to the identical performance of the two.
Consider, for example, the x86 platform. If we "literally" translate a direct and indirect call into machine code, we might end up with something like this
// Direct call
do-it-many-times
call 0x12345678
// Indirect call
do-it-many-times
call dword ptr [0x67890ABC]
The former uses an immediate operand in the machine instruction and is indeed normally faster than the latter, which has to read the data from some independent memory location.
At this point let's remember that x86 architecture actually has one more way to supply an operand to the call
instruction. It is supplying the target address in a register. And a very important thing about this format is that it is normally faster than both of the above. What does this mean for us? This means that a good optimizing compiler must and will take advantage of that fact. In order to implement the above cycle, the compiler will try to use a call through a register in both cases. If it succeeds, the final code might look as follows
// Direct call
mov eax, 0x12345678
do-it-many-times
call eax
// Indirect call
mov eax, dword ptr [0x67890ABC]
do-it-many-times
call eax
Note, that now the part that matters - the actual call in the cycle body - is exactly and precisely the same in both cases. Needless to say, the performance is going to be virtually identical.
One might even say, however strange it might sound, that on this platform a direct call (a call with an immediate operand in call
) is slower than an indirect call as long as the operand of the indirect call is supplied in a register (as opposed to being stored in memory).
Of course, the whole thing is not as easy in general case. The compiler has to deal with limited availability of registers, aliasing issues etc. But is such simplistic cases as the one in your example (and even in much more complicated ones) the above optimization will be carried out by a good compiler and will completely eliminate any difference in performance between a cyclic direct call and a cyclic indirect call. This optimization works especially well in C++, when calling a virtual function, since in a typical implementation the pointers involved are fully controlled by the compiler, giving it full knowledge of the aliasing picture and other relevant stuff.
Of course, there's always a question of whether your compiler is smart enough to optimize things like that...