Good morning all,
I'm searching for a very fast binary serialization technique for C++. I only need to serialize data contained in objects (no pointers etc.). I'd like
If the task to be performed is really serialization you might check out Google's Protocol Buffers. They provide fast serialization of C++ classes. The site also mentions some alternative libraries e.g. boost.serialization (only to state that protocol buffers outperform them in most cases, of course ;-)
To really answer this question: the C++ version is slow because it calls ostream.write too many times, which induces a huge number of unnecessary state checks. You can create a simple buffer and use only one write, and you will see the difference.
If your disk/network is fast enough not to become the bottleneck, flatbuffers and capnproto are great options to handle this for you.
Otherwise, protobuf, xxx-compact ... whatever uses varint encoding can probably serialize these data to a quarter of the original size.
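To show where the size reduction comes from, here is a minimal sketch of an unsigned base-128 varint (the LEB128-style scheme that protobuf uses for integer fields). The function names `append_varint` and `read_varint` are made up for this example. Each byte carries 7 bits of the value, so a small number stored in a 64-bit field takes one or two bytes instead of eight.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `value` as an unsigned varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
inline void append_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decode one varint starting at `pos` and advance `pos` past it.
// No bounds or overflow checks: this sketch assumes well-formed input.
inline std::uint64_t read_varint(const std::vector<std::uint8_t>& in,
                                 std::size_t& pos) {
    std::uint64_t result = 0;
    int shift = 0;
    while (true) {
        const std::uint8_t byte = in[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;  // last byte of this varint
        shift += 7;
    }
    return result;
}
```

The saving depends entirely on the data: values close to the 64-bit maximum need ten bytes in this encoding, which is larger than the fixed-width form.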
HPS from the scientific computing community is also a great option for this kind of highly structured data and probably the fastest in speed and the smallest in message size in this case due to its encoding scheme.
A lot of the performance is going to depend on memory buffers and how you fill up blocks of memory before writing to disk. And there are some tricks to making standard C++ streams a little faster, like std::ios_base::sync_with_stdio(false);
But IMHO, the world doesn't need another implementation of serialization. Here are some that other folks maintain that you might want to look into:
Well, if you want the fastest serialization possible, then you can just write your own serialization class and give it methods to serialize each of the POD types.
The less safety you bring in, the faster it'll run and the harder it'll be to debug. However, there is only a fixed number of built-in types, so you could enumerate them.
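#include <deque>

// Interface sketch: one operator<< overload per built-in type.
class Buffer
{
public:
inline Buffer& operator<<(int i); // etc. for the other built-in types
private:
std::deque<unsigned char> mData; // serialized bytes, in insertion order
};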
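One way the Buffer interface above could be fleshed out is sketched below. It is a variation, not the answerer's implementation: the class name `PodBuffer`, the switch to a contiguous `std::vector`, and the single template in place of one overload per type are all assumptions made for illustration. In keeping with the "less safety, more speed" trade-off, it does no bounds checking on reads.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical variant of the Buffer idea: raw bytewise copies of
// trivially copyable values into a contiguous byte vector.
class PodBuffer {
public:
    // Append the raw bytes of `value`.
    template <typename T>
    PodBuffer& operator<<(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable types can be copied bytewise");
        const auto* p = reinterpret_cast<const unsigned char*>(&value);
        mData.insert(mData.end(), p, p + sizeof(T));
        return *this;
    }

    // Read the next value back, in the same order it was written.
    // No bounds check: reading past the end is undefined behavior.
    template <typename T>
    PodBuffer& operator>>(T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable types can be copied bytewise");
        std::memcpy(&value, mData.data() + mReadPos, sizeof(T));
        mReadPos += sizeof(T);
        return *this;
    }

    const std::vector<unsigned char>& bytes() const { return mData; }

private:
    std::vector<unsigned char> mData;  // serialized bytes
    std::size_t mReadPos = 0;          // offset of the next read
};
```

Usage looks like `buf << 42 << 1.5;` followed by `buf >> i >> d;`. Note that raw pointers are also trivially copyable, so the `static_assert` does not stop someone from serializing an address by mistake, and the byte layout is tied to the compiler and machine that wrote it.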
I must admit I don't understand your problem:
There might be better approaches than serialization.
If you're on a Unix system, mmap on the file is the way to do what you want to do.
See http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx for an equivalent on windows.
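For the Unix case, a read-only mapping might look like the sketch below. The `Record` struct, the `MappedRecords` wrapper, and the function names are hypothetical; the file is assumed to contain an array of `Record` values written by the same program on the same machine, so the in-memory layout matches the on-disk bytes. The `open`, `fstat`, `mmap`, `munmap`, and `close` calls are the standard POSIX ones.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical POD record stored back to back in the file.
struct Record {
    int id;
    double value;
};

// Result of a mapping: pointer to the records plus the sizes needed to unmap.
struct MappedRecords {
    const Record* data = nullptr;
    std::size_t count = 0;  // number of whole records in the file
    std::size_t bytes = 0;  // length of the mapping in bytes
};

// Map `path` read-only. On failure (or an empty file) `data` stays nullptr.
inline MappedRecords map_records(const char* path) {
    MappedRecords result;
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return result;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return result;
    }
    const std::size_t length = static_cast<std::size_t>(st.st_size);
    void* addr = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return result;
    result.data = static_cast<const Record*>(addr);
    result.count = length / sizeof(Record);
    result.bytes = length;
    return result;
}

// Release a mapping obtained from map_records().
inline void unmap_records(const MappedRecords& m) {
    if (m.data != nullptr) {
        munmap(const_cast<Record*>(m.data), m.bytes);
    }
}
```

With this approach there is no explicit deserialization step: the kernel pages the file in on demand and the program reads `m.data[i]` directly.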
There are just very few real-life cases where that matters at all. You only ever serialize to make your objects compatible with some kind of external resource: disk, network, et cetera. The code that transmits the serialized data on the resource is always orders of magnitude slower than the code needed to serialize the object. If serialization is, say, about 1% of the total time and you make the serialization code twice as fast, you've made the overall operation no more than 0.5% faster, give or take. That is worth neither the risk nor the effort.
Measure three times, cut once.