Understanding serialization of polymorphic objects in C++

Question

EDIT: I realised that the code below is a good example of what you cannot do in C++ with anything that is not a POD.

There doesn't seem to be any way around storing a type ID in the classes and doing some sort of switch or table lookup (both of which must be carefully maintained) on the receiver side to rebuild the objects.

I have written some toy code to serialise objects, plus two separate mains to write them to and read them from a file.

common.h:

#include <iostream>
using namespace std;

template <typename T>
size_t serialize(std::ostream & o, const T & t) {
  const char * bytes = reinterpret_cast<const char*>(&t);
  for (size_t i = 0; i < t.size(); ++i) {
    o << bytes[i];
  }
  return t.size();
}

size_t deserialize(std::istream & i, char * buffer) {
  size_t len = 0;
  char c;
  while (i.get(c)) {
    buffer[len] = c;
    ++len;
  }
  return len;
}

// toy classes
struct A {
  int a[4];
  virtual ~A() {}
  virtual void print(){cout << "A\n";}
  virtual size_t size() const {return sizeof(*this);}
};
struct B: A {
  int b[16];
  virtual ~B() {}
  virtual void print(){cout << "B\n";}
  virtual size_t size() const {return sizeof(*this);}
};

out.cpp:

#include <fstream>
#include "common.h"

int main() {
  B b;
  A& a = *static_cast<A*>(&b);
  ofstream ofile("serial.bin");
  cout << "size = " << serialize(ofile, a) << endl;
  ofile.close();
  return 0;
}

in.cpp:

#include <fstream>
#include "common.h"

int main() {
  char buffer[1024];
  ifstream ifile("serial.bin");
  cout << "size = " << deserialize(ifile, buffer) << endl;
  ifile.close();
  A& a = *reinterpret_cast<A*>(buffer);
  a.print();
  return 0;
}

If my classes have no virtual functions, this appears to work fine, but in.cpp crashes when they do.

My understanding is that the vptr written by out.cpp is not valid when used by in.cpp.

Is there something that could be done, ideally without manually creating and maintaining a vtable?

Answer 1:

If you absolutely cannot use any library (though there may still be some options, even for embedded platforms), one way to serialize polymorphic classes is to provide virtual serialize/deserialize methods.

In this case for example:

struct A {
  int a[4];
  virtual ~A() {}
  virtual void print(){cout << "A\n";}
  virtual size_t size() const {return sizeof(*this);}
  virtual void serialize(std::ostream & o) const
  {
      // write a separator after each value so that >> can split them again
      for (int k = 0; k < 4; ++k) o << a[k] << ' ';
  }
  virtual void deserialize(std::istream & in)
  {
      for (int k = 0; k < 4; ++k) in >> a[k];
  }
};
struct B: A {
  int b[16];
  virtual ~B() {}
  virtual void print(){cout << "B\n";}
  virtual size_t size() const {return sizeof(*this);}
  virtual void serialize(std::ostream & o) const
  {
      A::serialize(o); // base members first, then our own
      for (int k = 0; k < 16; ++k) o << b[k] << ' ';
  }
  virtual void deserialize(std::istream & in)
  {
      A::deserialize(in); // read back in the same order as written
      for (int k = 0; k < 16; ++k) in >> b[k];
  }
};

// prg 1
B b;
b.serialize(ofile);

// prg 2
B b;
b.deserialize(ifile);

Basically, you write the individual members to the file one by one.

However, this only covers the simple case where you know which class to expect in the file. If there can be multiple classes, you also need to write some class identification (e.g. a serialization ID) so the reader knows which class to read. Also, if the classes might change, you may need some kind of versioning.
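As a rough illustration of the versioning idea, the sketch below writes a version number ahead of the members and lets the reader branch on it. The Settings struct, its fields, and the "height was added in version 2" history are all hypothetical, invented only for this example.

```cpp
#include <cassert>   // only needed when trying the example out
#include <istream>
#include <ostream>
#include <sstream>   // only needed for in-memory round trips
#include <string>

// Hypothetical class: version 1 stored only width, version 2 added height.
struct Settings {
    static constexpr int CURRENT_VERSION = 2;
    int width = 0;
    int height = 0;  // assumed to have been added in version 2

    // Always write the newest layout, prefixed by its version number.
    void serialize(std::ostream& out) const {
        out << CURRENT_VERSION << ' ' << width << ' ' << height << ' ';
    }

    // Accept any known version; fields missing in old data get a default.
    bool deserialize(std::istream& in) {
        int version = 0;
        if (!(in >> version)) return false;
        if (version < 1 || version > CURRENT_VERSION) return false;  // unknown layout
        in >> width;
        if (version >= 2) {
            in >> height;
        } else {
            height = 0;  // default for data written before the field existed
        }
        return static_cast<bool>(in);
    }
};
```

With this layout, a file written by the older program ("1 640 ") still loads, while data claiming a newer version than the reader knows is rejected instead of being misread.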

Pointers are also tricky, especially because they can be NULL: you could first write a bool (byte) indicating whether the pointer is NULL, then the contents, if any. You can serialize/deserialize e.g. std::string or std::vector in a similar way: first write the length, then the items. When reading, read the length, reserve or resize the string/vector, and then read the items.
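The length-prefix and NULL-flag ideas could look roughly like the sketch below. All helper names (write_string, read_ints, write_optional_int, and so on) are hypothetical, and the format is the same whitespace-separated text used in the classes above, not a binary layout.

```cpp
#include <cassert>   // only needed when trying the example out
#include <cstddef>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>   // only needed for in-memory round trips
#include <string>
#include <vector>

// Length first, then the raw characters, so embedded spaces survive.
void write_string(std::ostream& out, const std::string& s) {
    out << s.size() << ' ';
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
    out << ' ';
}

std::string read_string(std::istream& in) {
    std::size_t len = 0;
    in >> len;
    in.get();  // consume the single separator after the length
    std::string s(len, '\0');
    in.read(&s[0], static_cast<std::streamsize>(len));
    return s;
}

// Same idea for a vector: element count, then the elements.
void write_ints(std::ostream& out, const std::vector<int>& v) {
    out << v.size() << ' ';
    for (int x : v) out << x << ' ';
}

std::vector<int> read_ints(std::istream& in) {
    std::size_t len = 0;
    in >> len;
    std::vector<int> v(len);  // resize up front, then fill
    for (int& x : v) in >> x;
    return v;
}

// Presence flag first, then the pointee only if the pointer is non-null.
void write_optional_int(std::ostream& out, const int* p) {
    out << (p ? 1 : 0) << ' ';
    if (p) out << *p << ' ';
}

std::unique_ptr<int> read_optional_int(std::istream& in) {
    int present = 0;
    in >> present;
    if (!present) return nullptr;
    auto p = std::make_unique<int>();
    in >> *p;
    return p;
}
```

The reader has to call the read functions in exactly the order the writer used, since nothing in this format identifies which field comes next.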

Another issue arises if the file is transferred to a different machine, which might have a different byte order (endianness). So, as you can see, if some library is available, it is better to use it than to write everything from scratch.
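For binary formats, one common way to sidestep byte-order differences is to pick a fixed order for the file and assemble values byte by byte with shifts, rather than copying memory. The sketch below assumes little-endian storage of 32-bit unsigned values; the function names are hypothetical.

```cpp
#include <cassert>   // only needed when trying the example out
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // only needed for in-memory round trips
#include <string>

// Always store the least-significant byte first, whatever the host byte order is.
void write_u32_le(std::ostream& out, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.put(static_cast<char>((value >> shift) & 0xFFu));
    }
}

// Rebuild the value from four bytes; returns false if the stream ends early.
bool read_u32_le(std::istream& in, std::uint32_t& value) {
    value = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        char c;
        if (!in.get(c)) return false;
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << shift;
    }
    return true;
}
```

Because the shifts operate on the value rather than on its memory representation, the same code produces the same file on big-endian and little-endian hosts. When writing to a real file, the stream should be opened with std::ios::binary.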

To add polymorphic deserialization (I can see you are using just A on the reader side), you can have, for example:

struct A {
  ...
  virtual int get_serialization_id() const = 0;
};
struct B: A {
  ...
  static const int SERIALIZATION_ID = 1; // needs to be different in every polymorphic class
  virtual int get_serialization_id() const
  { return SERIALIZATION_ID; }
};

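void serialize(std::ostream & o, const A & a)
{
  o << a.get_serialization_id() << ' '; // separator so the id can be read back with >>
  a.serialize(o); // the virtual call writes the members of the dynamic type
}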

std::unique_ptr<A> deserialize(std::istream & i)
{
  std::unique_ptr<A> result;
  int id;
  i >> id;
  switch (id)
  {
  case B::SERIALIZATION_ID:
    result = std::make_unique<B>();
    break;
  case C::SERIALIZATION_ID:
    result = std::make_unique<C>();
    break;
  ...
  default:
    // leave NULL or throw exception
    return result;
  }
  result->deserialize(i);
  return result;
}

To avoid the switch, you could get fancier and provide some kind of factory registration (registering serialization IDs along with the class factories in a map, then using the registry to find the factory and create the object). You can go pretty fancy with deserialization :).
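A minimal sketch of such a registry follows, under the assumption that every class exposes a SERIALIZATION_ID constant as in the switch-based version. The names Base, Foo, Bar, registry, register_type, write_object and read_object are hypothetical and exist only for this example.

```cpp
#include <cassert>   // only needed when trying the example out
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>   // only needed for in-memory round trips

struct Base {
    virtual ~Base() = default;
    virtual int get_serialization_id() const = 0;
    virtual void serialize(std::ostream& out) const = 0;
    virtual void deserialize(std::istream& in) = 0;
};

struct Foo : Base {
    static constexpr int SERIALIZATION_ID = 1;
    int x = 0;
    int get_serialization_id() const override { return SERIALIZATION_ID; }
    void serialize(std::ostream& out) const override { out << x << ' '; }
    void deserialize(std::istream& in) override { in >> x; }
};

struct Bar : Base {
    static constexpr int SERIALIZATION_ID = 2;
    int y = 0;
    int z = 0;
    int get_serialization_id() const override { return SERIALIZATION_ID; }
    void serialize(std::ostream& out) const override { out << y << ' ' << z << ' '; }
    void deserialize(std::istream& in) override { in >> y >> z; }
};

using Factory = std::function<std::unique_ptr<Base>()>;

// Function-local static, so the map exists before the first registration.
std::map<int, Factory>& registry() {
    static std::map<int, Factory> instance;
    return instance;
}

// Returns false if the id was already taken by another registration.
template <typename T>
bool register_type() {
    const int id = T::SERIALIZATION_ID;
    return registry().emplace(id, [] { return std::unique_ptr<Base>(new T()); }).second;
}

void write_object(std::ostream& out, const Base& obj) {
    out << obj.get_serialization_id() << ' ';
    obj.serialize(out);
}

// Looks up the factory by id instead of switching; nullptr on unknown id or end of input.
std::unique_ptr<Base> read_object(std::istream& in) {
    int id = 0;
    if (!(in >> id)) return nullptr;
    auto it = registry().find(id);
    if (it == registry().end()) return nullptr;
    std::unique_ptr<Base> obj = it->second();
    obj->deserialize(in);
    return obj;
}
```

Each class is registered once at startup, e.g. register_type<Foo>(); register_type<Bar>();. Adding a new class then means one more registration call rather than another case in the switch, although the IDs still have to be kept unique by hand.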

And note that some cases are really difficult to solve (e.g. recreating object graphs in which several instances hold shared pointers to the same instance).
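One possible approach to the shared-pointer case, sketched here under simplifying assumptions, is to number each distinct object the first time it is written and emit only a back-reference on later encounters. Payload, SharedWriter and SharedReader are hypothetical names; the sketch assumes non-null pointers, a single non-polymorphic payload type, and no cycles.

```cpp
#include <cassert>   // only needed when trying the example out
#include <cstddef>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>   // only needed for in-memory round trips
#include <string>
#include <vector>

struct Payload { int value = 0; };  // hypothetical shared object

class SharedWriter {
public:
    explicit SharedWriter(std::ostream& out) : out_(out) {}

    // Assumes p is non-null.
    void write(const std::shared_ptr<Payload>& p) {
        auto it = seen_.find(p.get());
        if (it != seen_.end()) {
            out_ << "R " << it->second << ' ';  // back-reference to an earlier object
            return;
        }
        seen_.emplace(p.get(), seen_.size());   // index = order of first appearance
        out_ << "N " << p->value << ' ';        // new object, contents follow
    }

private:
    std::ostream& out_;
    std::map<const Payload*, std::size_t> seen_;
};

class SharedReader {
public:
    explicit SharedReader(std::istream& in) : in_(in) {}

    // Returns nullptr at end of input or on malformed data.
    std::shared_ptr<Payload> read() {
        char tag = 0;
        if (!(in_ >> tag)) return nullptr;
        if (tag == 'R') {
            std::size_t index = 0;
            if (!(in_ >> index) || index >= objects_.size()) return nullptr;
            return objects_[index];             // same instance as before
        }
        if (tag != 'N') return nullptr;
        auto p = std::make_shared<Payload>();
        in_ >> p->value;
        objects_.push_back(p);                  // remember it for later back-references
        return p;
    }

private:
    std::istream& in_;
    std::vector<std::shared_ptr<Payload>> objects_;
};
```

Writing the same shared_ptr twice through one SharedWriter and reading it back through one SharedReader yields two pointers to a single instance rather than two copies. Cyclic graphs, weak pointers and polymorphic payloads need considerably more machinery, which is a large part of why a library is usually preferable.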

Source: https://stackoverflow.com/questions/45530415/understanding-serialization-of-polymorphic-objects-in-c

Tags

c++

serialization

polymorphism