Why is ProtoBuf so slow on the 1st call but very fast inside loops?

Question

Inspired by this question, I created a small benchmark program to compare ProtoBuf, BinaryFormatter and Json.NET. The benchmark is a small console application at https://github.com/sidshetye/SerializersCompare. Feel free to add to or improve it; adding a new serializer to the mix is quite simple. Anyway, my results are:

        Binary Formatter         ProtoBuf          Json.NET     ServiceStackJson   ServiceStackJSV
 Loop     Size:512 bytes    Size:99 bytes    Size:205 bytes      Size:205 bytes     Size:181 bytes
    1         16.1242 ms      151.6354 ms       277.2085 ms         129.8321 ms        146.3547 ms
    2          0.0673 ms        0.0349 ms         0.0727 ms           0.0343 ms          0.0370 ms
    4          0.0292 ms        0.0085 ms         0.0303 ms           0.0145 ms          0.0148 ms
    8          0.0255 ms        0.0069 ms         0.0017 ms           0.0216 ms          0.0129 ms
   16          0.0011 ms        0.0064 ms         0.0282 ms           0.0114 ms          0.0120 ms
   32          0.0164 ms        0.0061 ms         0.0334 ms           0.0112 ms          0.0120 ms
   64          0.0347 ms        0.0073 ms         0.0296 ms           0.0121 ms          0.0013 ms
  128          0.0312 ms        0.0058 ms         0.0266 ms           0.0062 ms          0.0117 ms
  256          0.0256 ms        0.0097 ms         0.0448 ms           0.0087 ms          0.0116 ms
  512          0.0261 ms        0.0058 ms         0.0307 ms           0.0127 ms          0.0116 ms
 1024          0.0258 ms        0.0057 ms         0.0309 ms           0.0113 ms          0.0122 ms
 2048          0.0257 ms        0.0059 ms         0.0297 ms           0.0125 ms          0.0121 ms
 4096          0.0247 ms        0.0060 ms         0.0290 ms           0.0119 ms          0.0120 ms
 8192          0.0247 ms        0.0060 ms         0.0286 ms           0.0115 ms          0.0121 ms

Disclaimer:

The results above are from within a Windows VM, so the Stopwatch/timer values for very small intervals may not be 100% accurate compared to bare metal. Ignore the ultra-low values in the table above.
For ServiceStack, the Json and JSV scores were taken from two separate runs. Since they share the same underlying ServiceStack library, running one right after the other affects the "cold start" 1-loop score of the second run (it becomes a fast "warm start").

BinaryFormatter produces the largest output but is also the fastest for a single serialization => deserialization loop. However, once we run the serialization => deserialization code in a tight loop, ProtoBuf is super fast.
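For anyone wanting to reproduce the cold/warm split without the full benchmark, here is a minimal sketch of the measurement idea. `ColdStartTimer` and `Measure` are hypothetical names made up for illustration and are not taken from the SerializersCompare repository; the helper times the first call on its own and then averages the remaining calls.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of the SerializersCompare repo.
public static class ColdStartTimer
{
    // Times the first call separately, then averages the following calls.
    public static (double FirstMs, double WarmAvgMs) Measure(Action roundTrip, int warmLoops)
    {
        if (roundTrip == null) throw new ArgumentNullException(nameof(roundTrip));
        if (warmLoops < 1) throw new ArgumentOutOfRangeException(nameof(warmLoops));

        var sw = Stopwatch.StartNew();
        roundTrip();                      // cold call: includes any one-time setup cost
        sw.Stop();
        double firstMs = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        for (int i = 0; i < warmLoops; i++)
        {
            roundTrip();                  // warm calls: one-time setup has already happened
        }
        sw.Stop();
        double warmAvgMs = sw.Elapsed.TotalMilliseconds / warmLoops;

        return (firstMs, warmAvgMs);
    }
}
```

A call would look something like `var (first, warm) = ColdStartTimer.Measure(() => RoundTrip(obj), 8192);`, where `RoundTrip` stands in for whatever serialize-then-deserialize routine is being measured.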

Question#1: Why is ProtoBuf that much slower for a single serialization => deserialization loop?

Question#2: From a practical perspective, what can we do to get past that "cold start"? Run at least one object (of any type) through it? Run every (critical) object type through it?

Answer 1:

Question#1: Why is ProtoBuf that much slower for a single serialization => deserialization loop?

Because it does a metric ton of work to analyse the model and prepare the strategy; I've spent a lot of time making the generated strategy be as insanely fast as possible, but it could be that I've skimped on optimizations in the meta-programming layer. I'm happy to add that as an item to look at, to reduce the time on a first pass. Of course, on the other hand the meta-programming layer is still twice as fast as Json.NET's equivalent pre-processing ;p

Question#2: From a practical perspective, what can we do to get past that "cold start"? Run at least one object (of any type) through it? Run every (critical) object type through it?

Various options:

use the "precompile" tool as part of your build process, to generate the compiled serializer as a separate fully-static compiled dll that you can reference and use like normal: exactly zero meta-programming then happens
explicitly tell the model about the "root" types at startup, and store the output of Compile()
```
static TypeModel serializer;
...
RuntimeTypeModel.Default.Add(typeof(Foo), true);
RuntimeTypeModel.Default.Add(typeof(Bar), true);
serializer = RuntimeTypeModel.Default.Compile();
```
(the Compile() method will analyse from the root-types, adding in any additional types needed as it goes, returning a compiled generated instance)
explicitly tell the model about the "root" types at startup, and call CompileInPlace() "a few times"; CompileInPlace() will not fully expand the model, but calling it a few times should cover most bases, since compiling one layer will bring other types into the model
```
RuntimeTypeModel.Default.Add(typeof(Foo), true);
RuntimeTypeModel.Default.Add(typeof(Bar), true);
for(int i = 0 ; i < 5 ; i++) {
    RuntimeTypeModel.Default.CompileInPlace();
}
```
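To show how the stored output of Compile() might then be used, here is a rough sketch. `Foo`'s members, `SerializerCache`, `ToBytes` and `FromBytes` are names invented for the example, and it assumes a protobuf-net build where `RuntimeTypeModel.Compile()` is available (it relies on runtime code generation, so that depends on the target platform).

```csharp
using System;
using System.IO;
using ProtoBuf;
using ProtoBuf.Meta;

// Illustrative contract type; the members are made up for this sketch.
[ProtoContract]
public class Foo
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

// Hypothetical wrapper that holds the compiled serializer for the process lifetime.
public static class SerializerCache
{
    // Built once by the static initializer; touch this class during startup
    // so the analysis cost is paid before the first real request.
    public static readonly TypeModel Serializer = Build();

    private static TypeModel Build()
    {
        // Register the root types, then compile them into a fixed serializer.
        RuntimeTypeModel.Default.Add(typeof(Foo), true);
        return RuntimeTypeModel.Default.Compile();
    }

    public static byte[] ToBytes(object value)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, value);
            return ms.ToArray();
        }
    }

    public static T FromBytes<T>(byte[] data)
    {
        using (var ms = new MemoryStream(data))
        {
            // Passing null as the existing value asks for a new instance.
            return (T)Serializer.Deserialize(ms, null, typeof(T));
        }
    }
}
```

With this in place, `SerializerCache.FromBytes<Foo>(SerializerCache.ToBytes(new Foo { Id = 1, Name = "x" }))` goes through the compiled model rather than through `Serializer.Serialize` / `Serializer.Deserialize` on the default runtime model.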

Separately, I should probably:

add a method to fully expand a model for the CompileInPlace scenario
spend some time optimizing the meta-programming layer

Final thought: the main difference between Compile and CompileInPlace here will be what happens if you've forgotten to add some types; CompileInPlace works against the existing model, so you can still add new types (implicitly or explicitly) later, and it will "just work"; Compile is more rigid: once you've generated a type via that, it is fixed and can handle only the types it could deduce at the time it was compiled.
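A small sketch of that difference, under the assumption that the compiled model rejects a type it was never told about by throwing (the exact exception type and message are not something to rely on). `Known`, `Late` and `CompileContrast` are invented names; `Late` is deliberately left out of the model before compiling.

```csharp
using System;
using System.IO;
using ProtoBuf;
using ProtoBuf.Meta;

[ProtoContract] public class Known { [ProtoMember(1)] public int A { get; set; } }
[ProtoContract] public class Late  { [ProtoMember(1)] public int B { get; set; } }

public static class CompileContrast
{
    public static void Run()
    {
        // Only Known is registered; Late is "forgotten" on purpose.
        RuntimeTypeModel.Default.Add(typeof(Known), true);

        // Compile(): a fixed serializer that knows only what it could deduce here.
        TypeModel frozen = RuntimeTypeModel.Default.Compile();
        try
        {
            using (var ms = new MemoryStream())
            {
                frozen.Serialize(ms, new Late { B = 1 });
            }
            Console.WriteLine("frozen model accepted Late");
        }
        catch (Exception ex)
        {
            // Assumed outcome: the frozen model cannot handle the forgotten type.
            Console.WriteLine("frozen model rejected Late: " + ex.GetType().Name);
        }

        // CompileInPlace(): the runtime model stays open, so Late can still be
        // added implicitly the first time it is serialized.
        RuntimeTypeModel.Default.CompileInPlace();
        using (var ms = new MemoryStream())
        {
            RuntimeTypeModel.Default.Serialize(ms, new Late { B = 1 });
            Console.WriteLine("runtime model wrote " + ms.Length + " bytes for Late");
        }
    }
}
```

The trade-off this is meant to illustrate: the compiled instance avoids all meta-programming at the price of rigidity, while the in-place model keeps paying a little analysis cost whenever a type it has not seen turns up.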

Source: https://stackoverflow.com/questions/13735248/why-is-protobuf-so-slow-on-the-1st-call-but-very-fast-inside-loops

Tags

protocol-buffers

protobuf-net