Avoiding virtual functions

丶灬走出姿态 提交于 2019-12-01 15:51:52
Guy Sirton

Virtual functions don't cost much. They are an indirect call, basically like a function pointer. What is the performance cost of having a virtual method in a C++ class?

If you're in a situation where every cycle per call counts, that is you're doing very little work in the function call and you're calling it from your inner loop in a performance critical application you probably need a different approach altogether.

If you need the speed, consider embedding a "type(-identifying) number" in the objects, and using a switch statement to select the type-specific code. This can avoid function call overhead completely - just doing a local jump. You won't get faster than that. A cost (in terms of maintainability, recompilation dependencies etc) is in forcing localisation (in the switch) of the type-specific functionality.


IMPLEMENTATION

#include <iostream>
#include <vector>

// virtual dispatch model...

struct Base
{
    virtual int f() const { return 1; }
};

struct Derived : Base
{
    virtual int f() const { return 2; }
};

// alternative: member variable encodes runtime type...

struct Type
{
    Type(int type) : type_(type) { }
    int type_;
};

struct A : Type
{
    A() : Type(1) { }
    int f() const { return 1; }
};

struct B : Type
{
    B() : Type(2) { }
    int f() const { return 2; }
};

struct Timer
{
    Timer() { clock_gettime(CLOCK_MONOTONIC, &from); }
    struct timespec from;
    double elapsed() const
    {
        struct timespec to;
        clock_gettime(CLOCK_MONOTONIC, &to);
        return to.tv_sec - from.tv_sec + 1E-9 * (to.tv_nsec - from.tv_nsec);
    }
};

int main(int argc)
{
  for (int j = 0; j < 3; ++j)
  {
    typedef std::vector<Base*> V;
    V v;

    for (int i = 0; i < 1000; ++i)
        v.push_back(i % 2 ? new Base : (Base*)new Derived);

    int total = 0;

    Timer tv;

    for (int i = 0; i < 100000; ++i)
        for (V::const_iterator i = v.begin(); i != v.end(); ++i)
            total += (*i)->f();

    double tve = tv.elapsed();

    std::cout << "virtual dispatch: " << total << ' ' << tve << '\n';

    // ----------------------------

    typedef std::vector<Type*> W;
    W w;

    for (int i = 0; i < 1000; ++i)
        w.push_back(i % 2 ? (Type*)new A : (Type*)new B);

    total = 0;

    Timer tw;

    for (int i = 0; i < 100000; ++i)
        for (W::const_iterator i = w.begin(); i != w.end(); ++i)
        {
            if ((*i)->type_ == 1)
                total += ((A*)(*i))->f();
            else
                total += ((B*)(*i))->f();
        }

    double twe = tw.elapsed();

    std::cout << "switched: " << total << ' ' << twe << '\n';

    // ----------------------------

    total = 0;

    Timer tw2;

    for (int i = 0; i < 100000; ++i)
        for (W::const_iterator i = w.begin(); i != w.end(); ++i)
            total += (*i)->type_;

    double tw2e = tw2.elapsed();

    std::cout << "overheads: " << total << ' ' << tw2e << '\n';
  }
}

PERFORMANCE RESULTS

On my Linux system:

~/dev  g++ -O2 -o vdt vdt.cc -lrt
~/dev  ./vdt                     
virtual dispatch: 150000000 1.28025
switched: 150000000 0.344314
overhead: 150000000 0.229018
virtual dispatch: 150000000 1.285
switched: 150000000 0.345367
overhead: 150000000 0.231051
virtual dispatch: 150000000 1.28969
switched: 150000000 0.345876
overhead: 150000000 0.230726

This suggests an inline type-number-switched approach is about (1.28 - 0.23) / (0.344 - 0.23) = 9.2 times as fast. Of course, that's specific to the exact system tested / compiler flags & version etc., but generally indicative.


COMMENTS RE VIRTUAL DISPATCH

It must be said though that virtual function call overheads are something that's rarely significant, and then only for oft-called trivial functions (like getters and setters). Even then, you might be able to provide a single function to get and set a whole lot of things at once, minimising the cost. People worry about virtual dispatch way too much - so do do the profiling before finding awkward alternatives. The main issue with them is that they perform an out-of-line function call, though they also delocalise the code executed which changes the cache utilisation patterns (for better or (more often) worse).

I'm afraid that a series of dynamic_cast checks in a loop would slime up performance worse than a virtual function. If you're going to throw them all in one container, they need to have some type in common, so you may as well make it a pure-virtual base class with that method in it.

There's not all that much to the virtual function dispatch in that context: a vtable lookup, an adjustment of the supplied this pointer, and an indirect call.

If performance is that critical, you might be able to use a separate container for each subtype and process each container independently. If order matters, you'd be doing so many backflips that the virtual dispatch is probably faster.

If you're going to store all of these objects in the same container, then either you're going to have to write a heterogeneous container type (slow and expensive), you're going to have to store a container of void *s (yuck!), or the classes are going to have to be related to each other via inheritance. If you opt to go with either of the first two options, you'll have to have some logic in place to look at each element in the container, figure out what type it is, and then call the appropriate doYourJob() implementation, which essentially boils down to inheritance.

I strongly suggest trying out the simple, straightforward approach of using inheritance first. If this is fast enough, that's great! You're done. If it isn't, then try using some other scheme. Never avoid a useful language feature because of the cost unless you have some good hard proof to suggest that the cost is too great.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!