C++ Techniques: Type-Erasure vs. Pure Polymorphism

What are the advantages/disadvantages of the two techniques in comparison ? And more importantly: Why and when should one be used over the other ? Is it just a matter of personal taste/preference ?

To the best of my abilities, I haven't found another post that explicitly addresses my question. Among many questions regarding the actual use of polymorphism and/or type-erasure, the following seems to be closest, or so it seemed, but it doesn't really address my question either:

C++ -& CRTP . Type erasure vs polymorphism

Please, note that I very well understand both techniques. To this end, I provide a simple, self-contained, working example below, which I'm happy to remove, if it is felt unnecessary. However, the example should clarify what the two techniques mean with respect to my question. I'm not interested in discussing nomenclatures. Also, I know the difference between compile- and run-time polymorphism, though I wouldn't consider this to be relevant to the question. Note that my interest is less in performance-differences, if there are any. However, if there was a striking argument for one or the other based on performance, I'd be curious to read it. In particular, I would like to hear about concrete examples (no code) that would really only work with one of the two approaches.

Looking at the example below, one primary difference is the memory-management, which for polymorphism remains on the user-side, and for type-erasure is neatly tucked away requiring some reference-counting (or boost). Having said that, depending on the usage scenarios, the situation might be improved for the polymorphism-example by using smart-pointers with the vector (?), though for arbitrary cases this may very well turn out to be impractical (?). Another aspect, potentially in favor of type-erasure, may be the independence of a common interface, but why exactly would that be an advantage (?).

The code as given below was tested (compiled & run) with MS VisualStudio 2008 by simply putting all of the following code-blocks into a single source-file. It should also compile with gcc on Linux, or so I hope/assume, because I see no reason why not (?) :-) I have split/divided the code here for clarity.

These header-files should be sufficient, right (?).

#include <iostream>
#include <vector>
#include <string>

Simple reference-counting to avoid boost (or other) dependencies. This class is only used in the type-erasure-example below.

class RefCount
{
  RefCount( const RefCount& );
  RefCount& operator= ( const RefCount& );
  int m_refCount;

  public:
    RefCount() : m_refCount(1) {}
    void Increment() { ++m_refCount; }
    int Decrement() { return --m_refCount; }
};

This is the simple type-erasure example/illustration. It was copied and modified in part from the following article. Mainly I have tried to make it as clear and straightforward as possible. http://www.cplusplus.com/articles/oz18T05o/

class Object {
  struct ObjectInterface {
    virtual ~ObjectInterface() {}
    virtual std::string GetSomeText() const = 0;
  };

  template< typename T > struct ObjectModel : ObjectInterface {
    ObjectModel( const T& t ) : m_object( t ) {}
    virtual ~ObjectModel() {}
    virtual std::string GetSomeText() const { return m_object.GetSomeText(); }
    T m_object;
 };

  void DecrementRefCount() {
    if( mp_refCount->Decrement()==0 ) {
      delete mp_refCount; delete mp_objectInterface;
      mp_refCount = NULL; mp_objectInterface = NULL;
    }
  }

  Object& operator= ( const Object& );
  ObjectInterface *mp_objectInterface;
  RefCount *mp_refCount;

  public:
    template< typename T > Object( const T& obj )
      : mp_objectInterface( new ObjectModel<T>( obj ) ), mp_refCount( new RefCount ) {}
    ~Object() { DecrementRefCount(); }

    std::string GetSomeText() const { return mp_objectInterface->GetSomeText(); }

    Object( const Object &obj ) {
      obj.mp_refCount->Increment(); mp_refCount = obj.mp_refCount;
      mp_objectInterface = obj.mp_objectInterface;
    }
};

struct MyObject1 { std::string GetSomeText() const { return "MyObject1"; } };
struct MyObject2 { std::string GetSomeText() const { return "MyObject2"; } };

void UseTypeErasure() {
  typedef std::vector<Object> ObjVect;
  typedef ObjVect::const_iterator ObjVectIter;

  ObjVect objVect;
  objVect.push_back( Object( MyObject1() ) );
  objVect.push_back( Object( MyObject2() ) );

  for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
    std::cout << iter->GetSomeText();
}

As far as I'm concerned, this seems to achieve pretty much the same using polymorphism, or maybe not (?).

struct ObjectInterface {
  virtual ~ObjectInterface() {}
  virtual std::string GetSomeText() const = 0;
};

struct MyObject3 : public ObjectInterface {
  std::string GetSomeText() const { return "MyObject3"; } };

struct MyObject4 : public ObjectInterface {
  std::string GetSomeText() const { return "MyObject4"; } };

void UsePolymorphism() {
  typedef std::vector<ObjectInterface*> ObjVect;
  typedef ObjVect::const_iterator ObjVectIter;

  ObjVect objVect;
  objVect.push_back( new MyObject3 );
  objVect.push_back( new MyObject4 );

  for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
    std::cout << (*iter)->GetSomeText();

  for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
    delete *iter;
}

And finally for testing all of the above together.

int main() {
  UseTypeErasure();
  UsePolymorphism();
  return(0);
}

C++ style virtual method based polymorphism:

You have to use classes to hold your data.
Every class has to be built with your particular kind of polymorphism in mind.
Every class has a common binary-level dependency, which restricts how the compiler creates the instance of each class.
The data you are abstracting must explicitly describe an interface that describes your needs.

C++ style template based type erasure (with virtual method based polymorphism doing the erasure):

You have to use template to talk about your data.
Each chunk of data you are working on may be completely unrelated to other options.
The type erasure work is done within public header files, which bloats compile time.
Each type erased has its own template instantiated, which can bloat binary size.
The data you are abstracting need not be written as being directly dependent on your needs.

Now, which is better? Well, that depends if the above things are good or bad in your particular situation.

As an explicit example, std::function<...> uses type erasure which allows it to take function pointers, function references, output of a whole pile of template-based functions that generate types at compile time, myraids of functors which have an operator(), and lambdas. All of these types are unrelated to one another. And because they aren't tied to having a virtual operator(), when they are used outside of the std::function context the abstraction they represent can be compiled away. You couldn't do this without type erasure, and you probably wouldn't want to.

On the other hand, just because a class has a method called DoFoo, doesn't mean that they all do the same thing. With polymorphism, it isn't just any DoFoo you are calling, but the DoFoo from a particular interface.

As for your sample code... your GetSomeText should be virtual ... override in the polymorphism case.

There is no need to reference count just because you are using type erasure. There is no need not to use reference counting just because you are using polymorphsm.

Your Object could wrap T*s like how you stored vectors of raw pointers in the other case, with manual destruction of their contents (equivalent to having to call delete). Your Object could wrap a std::shared_ptr<T>, and in the other case you could have vector of std::shared_ptr<T>. Your Object could contain a std::unique_ptr<T>, equivalent to having a vector of std::unique_ptr<T> in the other case. Your Object's ObjectModel could extract copy constructors and assignment operators from the T and expose them to Object, allowing full-on value semantics for your Object, which corresponds to the a vector of T in your polymorphism case.

Here's one view: The question seems to ask how one should choose between late binding ("runtime polymorphism") and early binding ("compile-time polymorphism").

As KerrekSB points out in his comments, there are some things you can do with late binding that it just isn't realistic to do with early binding. Many uses of the Strategy pattern (decoding network I/O) or the Abstract Factory pattern (runtime-selected class factories) fall into this category.

If both approaches are viable, then choosing is a matter of the trade offs involved. In C++ applications, the main tradeoffs I see between early and late binding are implementation maintainability, binary size, and performance.

There are at least some people who feel that C++ templates in any shape or form are impossible to comprehend. Or possibly have some other, less dramatic reservation with templates. C++ templates have many little gotchas ("when do I need to use the 'typename' and 'template' keywords?"), and non-obvious tricks (SFINAE comes to mind).

Another tradeoff is optimization. When you bind early, you give the compiler more information about your program, and so it can (potentially) do a better job optimizing. When you bind late, the compiler (probably) doesn't know ahead of time as much information -- some of that information may be in other compilation units, and so the optimizer can't do as much.

Another tradeoff is program size. In C++ at least, using "compile-time polymorphism" sometimes balloons binary size, as the compiler creates, optimizes, and emits different code for each used specialization. In contrast, when binding late, there's only one code path.

It's interesting to compare the same tradeoff being made in a different context. Take web applications, where one uses (some type of) polymorphism to deal with differences between browsers, and possibly for internationalization (i18n)/localization. Now, a hand-written JavaScript web application would likely use what amounts to late binding here, by having methods which detect capabilities at runtime to figure out what to do. Libraries like jQuery take this tack.

Another approach is to write different code for each possible browser/i18n possibility. While this sounds absurd, it is far from unheard of. The Google Web Toolkit uses this approach. GWT has its "deferred binding" mechanism, used to specialize the compiler's output to different browsers and different localizations. GWT's "deferred binding" mechanism uses early binding: The GWT Java-to-JavaScript compiler figures out all possible ways the polymorphism might be needed, and spits out an entirely different "binary" for each.

The tradeoffs are similar. Wrapping your head around how you extend GWT using deferred binding can be a headache and a half; Having knowledge at compile time allows GWT's compiler to optimize each specialization separately, possibly yielding better performance, and smaller size for each specialization; The whole of a GWT application can end up being many times the size of a comparable jQuery application, due to all of the precompiled specializations.

One benefit to runtime generics that no-one here has mentioned (?) is the possibility for code that is generated and injected into a running application, to use the same List, Hashmap / Dictionary etc. that everything else in that application is already using. Why you'd want to do that, is another question.

来源：https://stackoverflow.com/questions/13309927/c-techniques-type-erasure-vs-pure-polymorphism

标签

c++

paradigms