How much overhead is there in calling a function in C++?

予麋鹿 2020-12-02 07:29

A lot of literature talks about using inline functions to "avoid the overhead of a function call". However, I haven't seen quantifiable data. What is the actual overhead of a function call?

16 answers
  • 2020-12-02 08:09

    Depending on how you structure your code, in particular how it is divided into units such as modules and libraries, it can in some cases matter profoundly.

    1. Using a dynamic-library function with external linkage will most of the time impose full stack-frame processing.
      That is why using qsort from the C standard library can be an order of magnitude (10 times) slower than the STL's std::sort when the comparison operation is as simple as an integer comparison.
    2. Passing function pointers between modules will also be affected.
    3. The same penalty will most likely affect the usage of C++'s virtual functions, as well as other functions whose code is defined in separate modules.

    4. The good news is that whole-program optimization might resolve the issue for dependencies between static libraries and modules.
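    The qsort-versus-STL point above can be illustrated with a sketch (the wrapper and comparator names are mine; the 10x figure is the answerer's claim, not something this code measures):

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// qsort must invoke the comparator through a function pointer on every
// comparison, crossing the library boundary each time; std::sort can
// inline the lambda comparator directly into the sorting loop.
int cmp_int(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

std::vector<int> sort_with_qsort(std::vector<int> v) {
    std::qsort(v.data(), v.size(), sizeof(int), cmp_int);
    return v;
}

std::vector<int> sort_with_std_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int x, int y) { return x < y; });
    return v;
}
```

    Both produce the same ordering; the difference is purely the per-comparison call overhead.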

  • 2020-12-02 08:12

    There is a useful concept sometimes called 'register shadowing', which allows values (up to 6 on common ABIs) to be passed through registers (on the CPU) instead of the stack (memory). Also, depending on the function and the variables used within it, the compiler may simply decide that frame-management code is not required!

    Also, even a C++ compiler may do 'tail-call optimization': if A() calls B(), and after calling B() A() just returns, the compiler will reuse the stack frame!

    Of course, all of this can be done only if the program sticks to the semantics of the standard (see pointer aliasing and its effect on optimizations).
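    A minimal sketch of a call in tail position (the function names are mine; whether the frame is actually reused depends on the compiler and optimization level):

```cpp
// Sum 1..n with the recursive call in tail position: after calling itself,
// step() just returns the result, so an optimizing compiler may reuse the
// current stack frame instead of pushing a new one (tail-call optimization).
// On common ABIs both arguments travel in registers, not on the stack
// (e.g. the first six integer arguments on x86-64 System V).
unsigned long step(unsigned long acc, unsigned long n) {
    if (n == 0) return acc;
    return step(acc + n, n - 1);  // tail call: nothing left to do afterwards
}

unsigned long sum_to(unsigned long n) { return step(0, n); }
```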

  • 2020-12-02 08:14

    Your question is one of those questions that has no answer one could call the "absolute truth". The overhead of a normal function call depends on three factors:

    1. The CPU. The overhead of x86, PPC, and ARM CPUs varies a lot and even if you just stay with one architecture, the overhead also varies quite a bit between an Intel Pentium 4, Intel Core 2 Duo and an Intel Core i7. The overhead might even vary noticeably between an Intel and an AMD CPU, even if both run at the same clock speed, since factors like cache sizes, caching algorithms, memory access patterns and the actual hardware implementation of the call opcode itself can have a huge influence on the overhead.

    2. The ABI (Application Binary Interface). Even with the same CPU, there often exist different ABIs that specify how function calls pass parameters (via registers, via stack, or via a combination of both) and where and how stack frame initialization and clean-up takes place. All this has an influence on the overhead. Different operating systems may use different ABIs for the same CPU; e.g. Linux, Windows and Solaris may all three use a different ABI for the same CPU.

    3. The Compiler. Strictly following the ABI is only important if functions are called between independent code units, e.g. if an application calls a function of a system library or a user library calls a function of another user library. As long as functions are "private", not visible outside a certain library or binary, the compiler may "cheat". It may not strictly follow the ABI but instead use shortcuts that lead to faster function calls. E.g. it may pass parameters in registers instead of using the stack, or it may skip stack frame setup and clean-up completely if not really necessary.

    If you want to know the overhead for a specific combination of the three factors above, e.g. for an Intel Core i5 on Linux using GCC, your only way to get this information is to benchmark the difference between two implementations, one using function calls and one where you copy the code directly into the caller; this way you force inlining for sure, since the inline keyword is only a hint and does not always lead to inlining.
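    Such a micro-benchmark might be sketched like this (a sketch only: the noinline attribute is a GCC/Clang extension, the function names are mine, and the results will vary with all three factors above):

```cpp
#include <chrono>

// Force one version to stay a real call; the other is trivially inlinable.
__attribute__((noinline)) unsigned long inc_call(unsigned long x) { return x + 1; }
static inline unsigned long inc_inline(unsigned long x) { return x + 1; }

template <typename F>
double time_loop(F f, unsigned long n, unsigned long* out) {
    auto t0 = std::chrono::steady_clock::now();
    unsigned long sum = 0;
    for (unsigned long i = 0; i < n; ++i) sum += f(i);
    auto t1 = std::chrono::steady_clock::now();
    *out = sum;  // keep the result live so the loop is not optimized away
    return std::chrono::duration<double>(t1 - t0).count();
}

// Returns true when both variants computed the same sum; the interesting
// output is the two timings, which you would print or log yourself.
bool compare_variants(unsigned long n, double* t_call, double* t_inline) {
    unsigned long s1 = 0, s2 = 0;
    *t_call = time_loop(inc_call, n, &s1);
    *t_inline = time_loop([](unsigned long x) { return inc_inline(x); }, n, &s2);
    return s1 == s2;
}
```

    The per-call overhead is roughly (t_call - t_inline) / n, keeping in mind that such micro-benchmarks are noisy and compiler-dependent.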

    However, the real question here is: does the exact overhead really matter? One thing is for sure: a function call always has an overhead. It may be small, it may be big, but it certainly exists. And no matter how small it is, if a function is called often enough in a performance-critical section, the overhead will matter to some degree. Inlining rarely makes your code slower, unless you terribly overdo it; it will make the code bigger though. Today's compilers are pretty good at deciding themselves when to inline and when not, so you hardly ever have to rack your brain about it.

    Personally, I ignore inlining completely during development, until I have a more or less usable product that I can profile. Only if profiling tells me that a certain function is called really often, and within a performance-critical section of the application, will I consider "force-inlining" that function.

    So far my answer is very generic; it applies to C as much as it applies to C++ and Objective-C. As a closing word, let me say something about C++ in particular: virtual methods are doubly indirect function calls, which means they have a higher call overhead than normal function calls, and they also cannot be inlined. Non-virtual methods may or may not be inlined by the compiler, but even if they are not inlined, they are still significantly faster than virtual ones, so you should not make methods virtual unless you really plan to override them or have them overridden.
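    The double indirection of a virtual call can be seen in a sketch (the class names are mine):

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;  // dispatched through the vtable
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

// A call through a base reference loads the object's vptr, loads the slot
// for area() from the vtable, then performs an indirect call. The compiler
// cannot inline it here unless it can prove the dynamic type (devirtualize).
double area_of(const Shape& s) { return s.area(); }
```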

  • 2020-12-02 08:16

    I made a simple benchmark against a simple increment function:

    inc.c:

    typedef unsigned long ulong;
    ulong inc(ulong x){
        return x+1;
    }
    

    main.c:

    #include <stdio.h>
    #include <stdlib.h>
    
    typedef unsigned long ulong;
    
    #ifdef EXTERN 
    ulong inc(ulong);
    #else
    static inline ulong inc(ulong x){
        return x+1;
    }
    #endif
    
    int main(int argc, char** argv){
        if (argc < 1+1)
            return 1;
        ulong i, sum = 0, cnt;
        cnt = atoi(argv[1]);
        for(i=0;i<cnt;i++){
            sum+=inc(i);
        }
        printf("%lu\n", sum);
        return 0;
    }
    

    Running it with a billion iterations on my Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz gave me:

    • 1.4 seconds for the inlined version
    • 4.4 seconds for the regularly linked version

    (It appears to fluctuate by up to 0.2 s, but I'm too lazy to calculate proper standard deviations, nor do I care about them.)

    This suggests that the overhead of a function call on this computer is about 3 nanoseconds.

    The fastest operation I measured took about 0.3 ns, so that would suggest a function call costs about 9 primitive ops, to put it very simplistically.

    This overhead increases by about another 2 ns per call (total call time about 6 ns) for functions called through the PLT (functions in a shared library).

  • 2020-12-02 08:16

    Modern CPUs are very fast (obviously!). Almost every operation involved in calls and argument passing is a full-speed instruction (indirect calls might be slightly more expensive, mostly the first time through a loop).

    Function call overhead is so small that only loops that call functions can make the overhead relevant.

    Therefore, when we talk about (and measure) function call overhead today, we are usually really talking about the overhead of not being able to hoist common subexpressions out of loops. If a function has to do a bunch of (identical) work every time it is called, the compiler can "hoist" that work out of the loop and do it once, provided the function is inlined. When it is not inlined, the code will probably just go ahead and repeat the work, as you told it to!

    Inlined functions seem impossibly faster not because of call and argument overhead, but because of common subexpressions that can be hoisted out of the loop.

    Example:

    int SomethingUnpredictable();  // e.g. reads input; its result cannot be predicted
    Foo CheckOverhead(int i);      // declared before first use
    
    Foo::result_type MakeMeFaster()
    {
      Foo t;
      for (auto i = 0; i < 1000; ++i)
        t += CheckOverhead(SomethingUnpredictable());
      return t.result();
    }
    
    Foo CheckOverhead(int i)
    {
      auto n = CalculatePi_1000_digits();
      return i * n;
    }
    

    An optimizer can see through this foolishness and do:

    Foo::result_type MakeMeFaster()
    {
      Foo t;
      auto _hidden_optimizer_tmp = CalculatePi_1000_digits();
      for (auto i = 0; i < 1000; ++i)
        t += SomethingUnpredictable() * _hidden_optimizer_tmp;
      return t.result();
    }
    

    It seems like the call overhead is impossibly reduced because a big chunk of the function (the CalculatePi_1000_digits call) has really been hoisted out of the loop. The compiler needs to be able to prove that CalculatePi_1000_digits always returns the same result, but good optimizers can do that.
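    The transformation can also be written out by hand as a sketch (the names are mine; expensive() stands in for CalculatePi_1000_digits and must always return the same value for the hoist to be legal):

```cpp
// A pure, expensive computation: same inputs, same result, no side effects.
long expensive() {
    long s = 0;
    for (int i = 0; i < 1000; ++i) s += i;
    return s;
}

// Naive version: redoes the expensive work on every iteration.
long sum_naive(int n) {
    long t = 0;
    for (int i = 0; i < n; ++i) t += i * expensive();
    return t;
}

// Hoisted version: the optimizer's transformation applied manually.
long sum_hoisted(int n) {
    const long k = expensive();  // computed once, outside the loop
    long t = 0;
    for (int i = 0; i < n; ++i) t += i * k;
    return t;
}
```

    Both return the same value; the hoisted form just avoids the repeated work, which is exactly what the optimizer does once the function is inlined and proven pure.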

  • 2020-12-02 08:17

    As others have said, you really don't have to worry too much about overhead unless you're going for ultimate performance or something similar. When you call a function, the compiler has to emit code to:

    • Save the function parameters to the stack (or place them in registers)
    • Save the return address to the stack
    • Jump to the starting address of the function
    • Allocate space for the function's local variables (on the stack)
    • Run the body of the function
    • Save the return value (in a register or on the stack)
    • Free the space for the local variables (a stack-pointer adjustment, not garbage collection)
    • Jump back to the saved return address
    • Free the space used for the parameters, etc.
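    For a trivial function, most of the steps above collapse or disappear at any reasonable optimization level; a sketch (the function names are mine):

```cpp
// A trivial function: at -O2 a compiler typically passes both arguments in
// registers, emits no frame setup at all, and often inlines the call away,
// leaving almost none of the listed steps as actual instructions.
int add(int a, int b) {
    return a + b;
}

int call_site() {
    // Conceptually: place arguments, save return address, jump, run body,
    // store result, tear down, return. In optimized code this whole call
    // can reduce to a single add instruction, or even a constant.
    return add(2, 3);
}
```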

    However, you have to weigh this against lowering the readability of your code, as well as the impact on your testing strategies, maintenance plans, and the overall size of your source files.
