Good morning all,
I'm searching for a very fast binary serialization technique for C++. I only need to serialize the data contained in objects (no pointers etc.), and I'd like it to be as fast as possible.
The C++ Middleware Writer is an online alternative to serialization libraries. In some cases it is faster than the serialization library in Boost.
Both your C and your C++ code will probably be dominated (in time) by file I/O. I would recommend using memory mapped files when writing your data and leave the I/O buffering to the operating system. Boost.Interprocess could be an alternative.
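As a sketch of that suggestion, here is one way to write through a memory-mapped file with the POSIX API (the function name and error handling are mine, not from any library mentioned above):

```c
/* Sketch: write a buffer through a memory-mapped file (POSIX),
   leaving the I/O buffering and write-back to the operating system. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int write_mapped(const char* path, const void* data, size_t len) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    /* The file must be sized before the mapping can be written. */
    if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }
    void* p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }
    memcpy(p, data, len);   /* the kernel flushes dirty pages back to disk */
    munmap(p, len);
    close(fd);
    return 0;
}
```

Boost.Interprocess wraps the same mechanism portably if you'd rather not use the raw POSIX calls.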
Because I/O is most likely to be the bottleneck, a compact format may help. Out of curiosity I tried the following Colfer schema, compiled with colf -s 16 C:
package data

type item struct {
	off  uint64
	size uint32
}
... with a comparable C test:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "Colfer.h"  /* header generated by colf */

long tests = 10 * 1000 * 1000;  /* iteration count; actual value not shown */

clock_t start = clock();

data_item data = {0};
void* buf = malloc(colfer_size_max);
FILE* fd = fopen("test.colfer.dat", "wb");
for (long i = 0; i < tests; i++) {
    data.off = i;
    data.size = i & 0xFFFF;
    size_t n = data_item_marshal(&data, buf);
    fwrite(buf, n, 1, fd);
}
fclose(fd);
free(buf);

clock_t stop = clock();
The results on an SSD are quite disappointing, despite the serialized output being 40% smaller than the raw struct dumps:
colfer took 0.520 seconds
plain took 0.320 seconds
Since the generated code is pretty fast, it seems unlikely you'll gain much with serialization libraries.
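For reference, the "plain" figure presumably comes from dumping the raw struct directly; a sketch of such a loop (field names mirror the Colfer schema, but the struct and function are mine):

```c
/* Sketch: raw struct dump, the baseline the 0.320 s figure likely refers to. */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint64_t off;
    uint32_t size;
} item;

long dump_plain(const char* path, long tests) {
    FILE* fd = fopen(path, "wb");
    if (!fd) return -1;
    item data;
    for (long i = 0; i < tests; i++) {
        data.off = (uint64_t)i;
        data.size = (uint32_t)(i & 0xFFFF);
        /* sizeof(item) is typically 16 here due to alignment padding,
           which is why the Colfer output can be ~40% smaller on disk */
        fwrite(&data, sizeof data, 1, fd);
    }
    fclose(fd);
    return tests * (long)sizeof(item);
}
```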
Google FlatBuffers is similar to Protocol Buffers, but much faster:
https://google.github.io/flatbuffers/
https://google.github.io/flatbuffers/md__benchmarks.html
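For comparison with the Colfer schema above, the same two fields would look roughly like this in a FlatBuffers schema (file and table names are hypothetical):

```
// item.fbs — hypothetical schema mirroring the Colfer "item" struct
table Item {
  off:ulong;    // uint64
  size:uint;    // uint32
}
root_type Item;
```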
Is there any way you can take advantage of things that stay the same?
I mean, you are just trying to run through "test.c.dat" as fast as you possibly can, right? Can you take advantage of the fact that the file does not change between your serialization attempts? If you are serializing the same file over and over again, you can optimize based on that: the first run takes the same amount of time as yours, plus a tiny bit extra for a check, and every subsequent run on the same input can be made much faster.
I understand that this may just be a carefully crafted example, but you seem to be focused on making the language accomplish your task as quickly as possible, instead of asking "do I need to accomplish this again at all?" What is the context of this approach?
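One cheap way to implement that "did the input change?" check is to record the file's size and modification time and compare on the next run (a minimal sketch; a content hash would be more robust, and all names here are mine):

```c
/* Sketch: skip re-serialization when the input file is unchanged,
   using size + mtime as a cheap fingerprint stored in a stamp file. */
#include <stdio.h>
#include <sys/stat.h>

int unchanged(const char* path, const char* stamp_path) {
    struct stat st;
    if (stat(path, &st) != 0) return 0;
    FILE* f = fopen(stamp_path, "rb");
    if (!f) return 0;                       /* no stamp yet: do the work */
    long long size, mtime;
    int ok = fscanf(f, "%lld %lld", &size, &mtime) == 2
          && size  == (long long)st.st_size
          && mtime == (long long)st.st_mtime;
    fclose(f);
    return ok;
}

void save_stamp(const char* path, const char* stamp_path) {
    struct stat st;
    if (stat(path, &st) != 0) return;
    FILE* f = fopen(stamp_path, "wb");
    if (!f) return;
    fprintf(f, "%lld %lld", (long long)st.st_size, (long long)st.st_mtime);
    fclose(f);
}
```

Call save_stamp after a successful serialization; on the next run, unchanged tells you whether you can reuse the previous output.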
I hope this is helpful.
-Brian J. Stinar-