Good morning all,
I'm searching for a very fast binary serialization technique for C++. I only need to serialize data contained in objects (no pointers etc.). I'd like
Because I/O is most likely to be the bottleneck, a compact format may help. Out of curiosity I tried the following Colfer scheme compiled as colf -s 16 C.
package data
type item struct {
off uint64
size uint32
}
... with a comparable C test:
clock_t start = clock();
data_item data;
void* buf = malloc(colfer_size_max);
FILE* fd = fopen( "test.colfer.dat", "wb" );
for ( long i = 0; i < tests; i++ )
{
data.off = i;
data.size = i & 0xFFFF;
size_t n = data_item_marshal( &data, buf );
fwrite( buf, n, 1, fd );
}
fclose( fd );
clock_t stop = clock();
The results are quite disappointing on SSD despite the fact that the serial size is 40% smaller in comparison to the raw struct dumps.
colfer took 0.520 seconds
plain took 0.320 seconds
Since the generated code is pretty fast it seems unlikely you'll win anything with serialization libraries.
Google FlatBuffers: similar to Protocol Buffers, but much faster.
https://google.github.io/flatbuffers/
https://google.github.io/flatbuffers/md__benchmarks.html
To really answer this question: the C++ version is slow because it calls ostream.write
too many times, which induces a huge number of unnecessary state checks. Create a simple buffer, fill it, and call write only once,
and you will see the difference.
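A minimal sketch of that single-write idea, assuming a record with the same two fields as the item struct from the question. The names Item, append_item, write_batched and demo_single_write are made up for illustration, and the fields are copied in host byte order with no portability handling:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical record mirroring the question's item (off, size).
struct Item {
    std::uint64_t off;
    std::uint32_t size;
};

// Bytes stored per record: the two fields back to back, without struct padding.
constexpr std::size_t kRecordBytes = sizeof(std::uint64_t) + sizeof(std::uint32_t);

// Append one record to the in-memory buffer (host byte order).
inline void append_item(std::vector<char>& buf, const Item& it) {
    const std::size_t pos = buf.size();
    buf.resize(pos + kRecordBytes);
    std::memcpy(buf.data() + pos, &it.off, sizeof it.off);
    std::memcpy(buf.data() + pos + sizeof it.off, &it.size, sizeof it.size);
}

// Serialize `count` records into one buffer, then call write() exactly once.
inline std::size_t write_batched(std::ostream& out, std::uint64_t count) {
    std::vector<char> buf;
    buf.reserve(static_cast<std::size_t>(count) * kRecordBytes);
    for (std::uint64_t i = 0; i < count; ++i) {
        append_item(buf, Item{i, static_cast<std::uint32_t>(i & 0xFFFF)});
    }
    out.write(buf.data(), static_cast<std::streamsize>(buf.size()));
    return buf.size();
}

// Self-check with an in-memory stream standing in for the output file.
inline std::size_t demo_single_write() {
    std::ostringstream out;
    const std::size_t n = write_batched(out, 1000);
    assert(out.str().size() == n);
    return n;
}
```

With a real file you would pass a std::ofstream opened in binary mode instead of the std::ostringstream. For very large record counts the whole buffer may not fit in memory, in which case flushing in fixed-size blocks is the usual compromise.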
If your disk/network is fast enough not to become the bottleneck, FlatBuffers and Cap'n Proto are great options to handle this for you.
Otherwise, protobuf, xxx-compact, or whatever else uses varint encoding can probably serialize this data to a quarter of the original size.
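To illustrate why varint encoding shrinks this kind of data, here is a rough sketch of the common unsigned base-128 scheme (7 payload bits per byte, high bit as a continuation flag). The function names put_varint, get_varint and demo_varint are hypothetical and not taken from any of the libraries mentioned; the actual saving depends on how large the values are:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `value` as an unsigned base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last one.
inline void put_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decode one varint starting at `pos` and advance `pos` past it.
// Returns false if the input is truncated or longer than 10 bytes.
inline bool get_varint(const std::vector<std::uint8_t>& in, std::size_t& pos,
                       std::uint64_t& value) {
    value = 0;
    for (unsigned shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

// Encode one (off, size) pair and read it back; returns the encoded length.
inline std::size_t demo_varint() {
    std::vector<std::uint8_t> buf;
    put_varint(buf, 300);     // off
    put_varint(buf, 0xFFFF);  // size
    std::size_t pos = 0;
    std::uint64_t off = 0, size = 0;
    const bool ok = get_varint(buf, pos, off) && get_varint(buf, pos, size);
    assert(ok && off == 300 && size == 0xFFFF && pos == buf.size());
    (void)ok;
    return buf.size();
}
```

In this example the pair takes 5 bytes instead of the 12 bytes of the two raw fields. Offsets that need all 64 bits would take 10 bytes each, so the gain only holds when most values are small.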
HPS from the scientific computing community is also a great option for this kind of highly structured data and probably the fastest in speed and the smallest in message size in this case due to its encoding scheme.
A lot of the performance is going to depend on memory buffers and how you fill up blocks of memory before writing to disk. There are also some tricks for making standard C++ streams a little faster, such as std::ios_base::sync_with_stdio(false);
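One way to fill blocks of memory before writing is sketched below. BlockWriter and demo_block_writer are hypothetical names, and the 64-byte block in the demo is only there to make the behaviour visible; a real block would be far larger. Note that sync_with_stdio(false) only affects the standard streams such as std::cout, so for file output the block filling is the part that matters:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper: collects bytes in a fixed-size block and hands the
// block to the stream only when it is full, or when flush() is called.
class BlockWriter {
public:
    BlockWriter(std::ostream& out, std::size_t block_bytes)
        : out_(out), block_(block_bytes ? block_bytes : 1), used_(0), writes_(0) {}

    // Copy `n` bytes into the block, flushing each time the block fills up.
    void append(const void* data, std::size_t n) {
        const char* src = static_cast<const char*>(data);
        while (n > 0) {
            const std::size_t room = block_.size() - used_;
            const std::size_t chunk = n < room ? n : room;
            std::memcpy(block_.data() + used_, src, chunk);
            used_ += chunk;
            src += chunk;
            n -= chunk;
            if (used_ == block_.size()) flush();
        }
    }

    // Write whatever is pending; must be called once after the last append().
    void flush() {
        if (used_ == 0) return;
        out_.write(block_.data(), static_cast<std::streamsize>(used_));
        used_ = 0;
        ++writes_;
    }

    // Number of write() calls issued so far.
    std::size_t writes() const { return writes_; }

private:
    std::ostream& out_;
    std::vector<char> block_;
    std::size_t used_;
    std::size_t writes_;
};

// Write 100 records of 12 bytes each through a 64-byte block.
inline std::size_t demo_block_writer() {
    std::ostringstream out;
    BlockWriter w(out, 64);
    for (std::uint64_t i = 0; i < 100; ++i) {
        const std::uint32_t size = static_cast<std::uint32_t>(i & 0xFFFF);
        w.append(&i, sizeof i);
        w.append(&size, sizeof size);
    }
    w.flush();
    assert(out.str().size() == 100 * 12);
    return w.writes();
}
```

The demo issues 19 write() calls for 200 appended fields (18 full blocks plus the final partial one). With a block size in the range of tens of kilobytes or more, the per-call overhead of the stream becomes negligible.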
But IMHO, the world doesn't need another implementation of serialization. Here are some that other folks maintain that you might want to look into:
If the task to be performed is really serialization, you might check out Google's Protocol Buffers. They provide fast serialization of C++ classes. The site also mentions some alternative libraries, e.g. Boost.Serialization (only to state that Protocol Buffers outperform them in most cases, of course ;-) ).