AI Applications in C++: How costly are virtual functions? What are the possible optimizations?

慢半拍i 2020-12-23 12:43

In an AI application I am writing in C++,

  1. there is not much numerical computation
  2. there are a lot of structures for which run-time polymorphism is needed
15 answers
  • 2020-12-23 13:05

    Have you actually profiled your code and found where optimization is needed, and what needs optimizing?

    Work on optimizing virtual function calls only once you have found that they actually are the bottleneck.

  • 2020-12-23 13:09

    I'm reinforcing all the answers that say, in effect:

    • If you don't actually know it's a problem, any concern about fixing it is probably misplaced.

    What you want to know is:

    • What fraction of execution time (when it's actually running) is spent in the process of invoking methods, and in particular, which methods are the most costly (by this measure).

    Some profilers can give you this information indirectly. They need to summarize at the statement level (the call sites), but exclusive of the time spent inside the called method itself.

    My favorite technique is to just pause it a number of times under a debugger.

    If the time spent in the process of invoking virtual functions is significant, say 20%, then on average 1 out of 5 samples will show, at the bottom of the call stack in the disassembly window, the instructions for following the virtual function pointer.

    If you don't actually see that, it is not a problem.

    In the process, you will probably also see other calls higher up the call stack that are not actually needed, and removing them could save you a lot of time.

  • 2020-12-23 13:10

    The cost is more or less the same as for normal functions on recent CPUs, but virtual functions generally can't be inlined. If you call the function millions of times, the impact can be significant. Try calling the same function millions of times, once inlined and once not: if the function itself does something simple, the non-inlined version can be twice as slow. This is not a theoretical case; it is quite common in a lot of numerical computation.
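    To try the experiment suggested above, here is a minimal C++ sketch (all names are hypothetical). It runs the same trivial loop once through a virtual call on a base-class reference and once through an inline free function. The dynamic type is chosen at run time in make_op so the compiler cannot simply devirtualize the call. No particular speed ratio is claimed here, since the result depends on the compiler, its flags, and the CPU.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <memory>

// Hypothetical interface with a deliberately tiny method body: the case where
// call overhead and the lost chance to inline matter most.
struct Op {
    virtual ~Op() = default;
    virtual std::int64_t apply(std::int64_t x) const = 0;
};

struct AddOne : Op {
    std::int64_t apply(std::int64_t x) const override { return x + 1; }
};

struct AddTwo : Op {
    std::int64_t apply(std::int64_t x) const override { return x + 2; }
};

// Chooses the dynamic type at run time (e.g. from argc) so the compiler
// cannot resolve the call statically and must keep the indirect call.
std::unique_ptr<Op> make_op(int kind) {
    if (kind == 2) return std::make_unique<AddTwo>();
    return std::make_unique<AddOne>();
}

// One call through the vtable per iteration (unless the compiler manages to
// devirtualize it).
std::int64_t run_virtual(const Op& op, std::int64_t n) {
    assert(n >= 0);
    std::int64_t acc = 0;
    for (std::int64_t i = 0; i < n; ++i) acc = op.apply(acc);
    return acc;
}

// The same work through an ordinary function the compiler is free to inline.
inline std::int64_t add_one(std::int64_t x) { return x + 1; }

std::int64_t run_direct(std::int64_t n) {
    assert(n >= 0);
    std::int64_t acc = 0;
    for (std::int64_t i = 0; i < n; ++i) acc = add_one(acc);
    return acc;
}

// Wall-clock time of a single call to f(), in milliseconds.
template <class F>
double time_ms(F f) {
    const auto start = std::chrono::steady_clock::now();
    volatile std::int64_t sink = f();  // keep the result observable
    (void)sink;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

    In an optimized build you would compare time_ms on run_virtual(*make_op(argc), N) against run_direct(N) for a large N. Keep in mind that once add_one is inlined, the compiler may simplify the whole direct loop (possibly down to a closed form); that is exactly the kind of follow-on optimization an indirect call prevents, and it is part of why the measured gap varies so much.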

  • 2020-12-23 13:11

    Section 5.3.3 of the draft Technical Report on C++ Performance is entirely devoted to the overhead of virtual functions.

  • 2020-12-23 13:11

    If an AI application does not require a great deal of number crunching, I wouldn't worry about the performance disadvantage of virtual functions. The performance hit will be marginal, and it matters only if virtual calls appear in complex computations that are evaluated repeatedly. I don't think you can force the virtual table to stay in the L2 cache either.

    There are a couple of optimizations available for virtual functions:

    1. People have written compilers that rely on code analysis and transformation of the program to optimize away virtual calls. But these aren't production-grade compilers.
    2. You could replace all virtual functions with equivalent "switch...case" blocks that call the appropriate function based on the type in the hierarchy. This way you get rid of the compiler-managed virtual table and have your own virtual table in the form of a switch...case block. Your own "virtual table" then has a good chance of being in the L2 cache, since it sits in the code path. Remember, you'll need RTTI or your own "typeof" function to achieve this.
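    A minimal sketch of option 2, with hypothetical names: instead of RTTI, each object carries an explicit type tag (playing the role of the "typeof" function mentioned above), and a switch on that tag replaces the virtual call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closed set of node types. The explicit tag plays the role of
// the "typeof" function: it is what the switch dispatches on.
enum class Kind : std::uint8_t { Add, Mul, Neg };

struct Node {
    Kind kind;
    double a;
    double b;  // ignored by Neg
};

// Hand-written dispatch replacing a virtual eval(): each case is an ordinary,
// inlinable expression, and there is no vtable pointer to follow.
double eval(const Node& n) {
    switch (n.kind) {
        case Kind::Add: return n.a + n.b;
        case Kind::Mul: return n.a * n.b;
        case Kind::Neg: return -n.a;
    }
    assert(false && "unknown Kind");  // unreachable for valid tags
    return 0.0;
}
```

    The trade-off is that the set of types is closed: adding a new kind means editing every such switch. Also, compilers may lower a large, dense switch to a jump table, which is itself an indirect jump, so whether this beats the vtable in practice is something to measure rather than assume.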
  • 2020-12-23 13:13

    Virtual functions are very efficient. Assuming 32-bit pointers, the memory layout is approximately:

    classptr -> [vtable:4][classdata:x]
    vtable -> [first:4][second:4][third:4][fourth:4][...]
    first -> [code:x]
    second -> [code:x]
    ...
    

    The classptr points to memory that is typically on the heap, occasionally on the stack, and starts with a four-byte pointer to the vtable for that class. But the important thing to remember is that the vtable itself is not allocated per object. It's a static resource, and all objects of the same class type point to exactly the same memory location for their vtable array. Calling virtual functions on different instances of the same class won't pull different vtable locations into the L2 cache.
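    The sharing of one vtable by all objects of a class can be observed, in an implementation-specific way, with the sketch below. It assumes (this is not guaranteed by the C++ standard) that the hidden vtable pointer occupies the first pointer-sized bytes of the object, which is the layout used by the Itanium C++ ABI (GCC, Clang) and by MSVC for simple single-inheritance classes like these. A and B are hypothetical classes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes: B overrides a virtual function of A.
struct A {
    int data = 0;
    virtual ~A() = default;
    virtual int func1() const { return 1; }
};

struct B : A {
    int func1() const override { return 2; }
};

// Compares the first pointer-sized bytes of two objects.
// ASSUMPTION: on mainstream ABIs (Itanium C++ ABI used by GCC and Clang, and
// MSVC's), the hidden vtable pointer of a class like A sits at offset 0.
// The C++ standard does not specify this layout; illustration only.
bool same_vptr(const A& x, const A& y) {
    const auto* px = reinterpret_cast<const unsigned char*>(&x);
    const auto* py = reinterpret_cast<const unsigned char*>(&y);
    for (std::size_t i = 0; i < sizeof(void*); ++i) {
        if (px[i] != py[i]) return false;
    }
    return true;
}

// Two A objects with different data share one vtable; a B uses another.
void demo() {
    A a1;
    a1.data = 1;
    A a2;
    a2.data = 42;
    B b;
    assert(same_vptr(a1, a2));
    assert(!same_vptr(a1, b));
}
```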

    This example from MSDN shows the vtable for class A with virtual functions func1, func2, and func3. It takes up only 12 bytes. There is a good chance that the vtables of different classes will also be physically adjacent in the compiled library (you'll want to verify this if you're especially concerned), which could increase cache efficiency microscopically.

    CONST SEGMENT
    ??_7A@@6B@
       DD  FLAT:?func1@A@@UAEXXZ
       DD  FLAT:?func2@A@@UAEXXZ
       DD  FLAT:?func3@A@@UAEXXZ
    CONST ENDS
    

    The other performance concern is the instruction overhead of calling a function through the vtable. This is also very efficient, nearly identical to calling a non-virtual function. Again, from the MSDN example:

    ; A* pa;
    ; pa->func3();
    mov eax, DWORD PTR _pa$[ebp]
    mov edx, DWORD PTR [eax]
    mov ecx, DWORD PTR _pa$[ebp]
    call  DWORD PTR [edx+8]
    

    In this example, ebp is the stack frame base pointer, and _pa$ is the offset of the local variable A* pa within that frame. The register eax is loaded with the value at _pa$[ebp], so it holds the A*, and edx is loaded with the value at [eax], so it holds the address of class A's vtable. Then ecx is loaded with _pa$[ebp] again; because ecx carries "this" in this calling convention, it now holds the A*. Finally, the call is made to the address stored at [edx+8], which is the third function address in the vtable (entries at offsets 0, 4, and 8).
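    The same sequence can be modeled by hand in C++ to make each step explicit. This sketch uses hypothetical names and is not what any compiler literally emits: it builds one table of three function pointers shared by all objects, stores a pointer to it as the object's first member, and performs pa->func3() as "load the table pointer, load the third slot, call it with the object as this".

```cpp
#include <cassert>

// A hand-built model of what a compiler generates for a class A with three
// virtual functions. All names are hypothetical; real layouts are ABI-specific.
struct Obj;

// The table: one function pointer per virtual function, in declaration order.
struct VTable {
    int (*func1)(const Obj*);
    int (*func2)(const Obj*);
    int (*func3)(const Obj*);
};

// The object: a pointer to the shared table first, then the class data.
struct Obj {
    const VTable* vptr;
    int data;
};

int a_func1(const Obj* self) { return self->data + 1; }
int a_func2(const Obj* self) { return self->data + 2; }
int a_func3(const Obj* self) { return self->data + 3; }

// A single static table shared by every "A" object (compare ??_7A@@6B@ above).
const VTable a_vtable = {a_func1, a_func2, a_func3};

// What pa->func3() boils down to: load the table pointer from the object,
// load the third slot, and call it with the object as the implicit "this".
int call_func3(const Obj* pa) {
    assert(pa != nullptr && pa->vptr != nullptr);
    const VTable* vt = pa->vptr;  // like: mov edx, DWORD PTR [eax]
    return vt->func3(pa);         // like: call DWORD PTR [edx+8]
}
```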

    If this function call were not virtual, the mov eax and mov edx instructions would not be needed, but the difference in performance would be immeasurably small.
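    One way to get the non-virtual call sequence described above without abandoning the class hierarchy is the C++11 final specifier. In this sketch with hypothetical Shape and Square classes, Square is final, so when the static type of the object is Square the compiler knows exactly which area() will run and is allowed to call it directly or inline it. Optimizing compilers such as GCC and Clang typically do this, but the standard does not require it; the generated assembly is the place to check.

```cpp
#include <cassert>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// 'final' guarantees no class derives from Square, so when the static type of
// an expression is Square, its area() is known at compile time.
struct Square final : Shape {
    double side;
    explicit Square(double s) : side(s) { assert(s >= 0.0); }
    double area() const override { return side * side; }
};

// Static type Square: the compiler may emit a direct (or inlined) call here,
// skipping the vtable loads. Whether it does is up to the optimizer.
double area_of_square(const Square& s) { return s.area(); }

// Static type Shape: in general this still needs the lookup through the vtable.
double area_of_shape(const Shape& s) { return s.area(); }
```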
