Cost of a virtual function in a tight loop

别跟我提以往 2021-01-18 20:09

I am in a situation where I have game objects that have a virtual function Update(). There are a lot of game objects (currently a little over 7000) and the loop calls update

4 answers
  • 2021-01-18 20:27

    A virtual function call adds little more than a single indirection and a hard-to-predict jump. In the usual worst case that costs you one pipeline flush, or about 20 cycles per virtual call. 7000 of them comes to about 140000 cycles, which should be negligible compared to your average update function. If it isn't (say, because most of your update functions are empty), consider keeping the updatable objects in a separate list and looping over only those.
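
    A minimal sketch of the separate-list idea, under stated assumptions: only Update() comes from the question; GameObject, Rock, Monster, World and the NeedsUpdate() opt-in flag are hypothetical names made up for illustration. Objects are still owned in one container, but the per-frame loop only visits the ones that asked to be updated.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class; only Update() is taken from the question.
struct GameObject {
    virtual ~GameObject() = default;
    virtual void Update() {}                            // default: nothing to do
    virtual bool NeedsUpdate() const { return false; }  // hypothetical opt-in flag
};

struct Rock : GameObject {};  // static scenery, never needs an update

struct Monster : GameObject {
    int ticks = 0;
    void Update() override { ++ticks; }
    bool NeedsUpdate() const override { return true; }
};

class World {
public:
    // Takes ownership; registers the object for updates only if it opts in.
    GameObject* Add(std::unique_ptr<GameObject> obj) {
        GameObject* raw = obj.get();
        if (raw->NeedsUpdate()) updatable_.push_back(raw);
        objects_.push_back(std::move(obj));
        return raw;
    }

    // The per-frame loop touches only objects with real work to do.
    void UpdateAll() {
        for (GameObject* o : updatable_) o->Update();
    }

    std::size_t TotalCount() const { return objects_.size(); }
    std::size_t UpdatableCount() const { return updatable_.size(); }

private:
    std::vector<std::unique_ptr<GameObject>> objects_;  // owns everything
    std::vector<GameObject*> updatable_;                // non-owning subset
};
```

    The call is still virtual; the saving comes purely from skipping objects whose Update() would do nothing. This sketch does not handle removing objects, which a real version would need.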

    Removing the virtual function is just going to lead to you replacing it with an identical but self-implemented system. This is exactly the kind of place where a virtual function makes sense.

    For reference, 140000 cycles is about 50 microseconds. That assumes a P4 with a huge pipeline and a full pipeline flush on every call (which you don't usually get).
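
    The estimate can be written out as compile-time arithmetic. The object count and the 20 cycles per call are the rough figures from this answer; the 2.8 GHz clock and the 60 frames-per-second budget are assumptions added here only to make the numbers concrete.

```cpp
// Back-of-the-envelope cost of the virtual calls alone (not the update work).
constexpr double kObjects        = 7000;   // from the question
constexpr double kCyclesPerCall  = 20;     // rough worst case from the answer
constexpr double kClockHz        = 2.8e9;  // ASSUMED clock rate (P4-era CPU)
constexpr double kFramesPerSec   = 60;     // ASSUMED frame rate

constexpr double kTotalCycles    = kObjects * kCyclesPerCall;       // 140000 cycles
constexpr double kOverheadUs     = kTotalCycles / kClockHz * 1e6;   // about 50 microseconds
constexpr double kFrameBudgetUs  = 1e6 / kFramesPerSec;             // about 16667 microseconds
constexpr double kShareOfFrame   = kOverheadUs / kFrameBudgetUs;    // about 0.3% of a frame
```

    Under these assumptions the dispatch overhead is well under one percent of a frame, which is why the answer calls it negligible.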

  • 2021-01-18 20:29

    Although it's not the same code and may not be the same compiler as you're using, here's a bit of reference data from a rather old benchmark (bench++ by Joe Orost):

    Test Name:   F000005                         Class Name:  Style
    CPU Time:        7.70  nanoseconds           plus or minus      0.385
    Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
    Test Description:
     Time to test a global using a 10-way if/else if statement
     compare this test with F000006
    
    
    Test Name:   F000006                         Class Name:  Style
    CPU Time:        2.00  nanoseconds           plus or minus     0.0999
    Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
    Test Description:
     Time to test a global using a 10-way switch statement
     compare this test with F000005
    
    
    Test Name:   F000007                         Class Name:  Style
    CPU Time:        3.41  nanoseconds           plus or minus      0.171
    Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
    Test Description:
     Time to test a global using a 10-way sparse switch statement
     compare this test with F000005 and F000006
    
    
    Test Name:   F000008                         Class Name:  Style
    CPU Time:        2.20  nanoseconds           plus or minus      0.110
    Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
    Test Description:
     Time to test a global using a 10-way virtual function class
     compare this test with F000006
    

    This particular result is from compiling with the 64-bit edition of VC++ 9.0 (VS 2008), but it's reasonably similar to what I've seen from other recent compilers. The bottom line is that the virtual function is faster than most of the obvious alternatives, and very close to the same speed as the only one that beats it, the dense switch (in fact, the two being equal is within the measured margin of error). That, however, depends on the values involved being dense: as you can see in F000007, if the values are sparse, the switch statement produces code that's slower than the virtual function call.

    Bottom line: The virtual function call is probably the wrong place to look. Refactored code might easily work out slower, and even at best it probably won't gain enough to notice or care about.

  • 2021-01-18 20:41

    If you can't profile, have a look at the generated assembly code to get an idea of how expensive the lookup really is. It might be a simple indirect jump, which costs almost nothing.

    If you need to refactor, here is a suggestion: Create lots of "UpdateXxx" classes which know how to call the new non-virtual update() method. Collect those in an array and then call update() on them.
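
    One possible reading of this suggestion, as a sketch rather than the answerer's actual design: each concrete type gets its own updater that stores the objects by value and calls their non-virtual update() directly. Monster, Bullet, Updater and UpdaterFor are hypothetical names. There is still one virtual call per type per frame, but none per object.

```cpp
#include <vector>

// Hypothetical concrete types with plain, non-virtual update() methods.
struct Monster {
    int ticks = 0;
    void update() { ++ticks; }
};

struct Bullet {
    float x = 0.0f;
    float vx = 1.0f;
    void update() { x += vx; }
};

// One virtual call per *type* instead of one per object.
struct Updater {
    virtual ~Updater() = default;
    virtual void updateAll() = 0;
};

// Plays the role of an "UpdateXxx" class for a given type T.
template <typename T>
struct UpdaterFor final : Updater {
    std::vector<T> items;  // stored contiguously, by value

    void updateAll() override {
        // T is known statically here, so this call is direct and inlinable.
        for (T& item : items) item.update();
    }
};
```

    A frame would then loop over a small array such as std::vector<Updater*> and call updateAll() on each entry. Any gain is likely to come as much from the contiguous storage as from the removed indirection.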

    But my guess is that you won't save much, especially not with only 7K objects.

    Note on profiling: If you can't use a profiler (makes me wonder why not), time the calls to update() and log calls which take longer than, say, 100ms. The timing isn't expensive and it allows you to quickly figure out which calls are most expensive.
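
    A minimal sketch of that timing approach using std::chrono::steady_clock. TimeCall and UpdateAndLogIfSlow are hypothetical helper names, and the object is only assumed to expose an Update() method as in the question; the 100 ms default mirrors the threshold mentioned above.

```cpp
#include <chrono>
#include <cstdio>

using Duration = std::chrono::steady_clock::duration;

// Runs fn() once and returns how long it took on a monotonic clock.
template <typename Fn>
Duration TimeCall(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    return std::chrono::steady_clock::now() - start;
}

// Calls obj.Update() and logs to stderr if it exceeded the threshold.
// Returns true when the call was logged as slow.
template <typename Obj>
bool UpdateAndLogIfSlow(Obj& obj, const char* label,
                        Duration threshold = std::chrono::milliseconds(100)) {
    const Duration elapsed = TimeCall([&] { obj.Update(); });
    if (elapsed <= threshold) return false;

    const auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    std::fprintf(stderr, "slow Update(): %s took %lld us\n", label,
                 static_cast<long long>(us));
    return true;
}
```

    In the update loop this replaces the plain obj.Update() call; the label could be a type name or object id, whatever helps identify the offender in the log.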

  • 2021-01-18 20:49

    You can find another test comparing virtual, inline, and direct calls in "Virtual functions and performance - C++".
