In our application we have some data structures which, amongst other things, contain a chunked list of bytes (currently exposed as a List<byte[]>). We chunk …
I'm going to read between some lines here... because List<byte[]> (mapped as repeated bytes in protobuf parlance) doesn't have an overall length-prefix, and byte[] (mapped as bytes) has a trivial length-prefix that shouldn't cause additional buffering. So I'm guessing what you actually have is more like:
    [ProtoContract]
    public class A {
        [ProtoMember(1)]
        public B Foo {get;set;}
    }
    [ProtoContract]
    public class B {
        [ProtoMember(1)]
        public List<byte[]> Bar {get;set;}
    }
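For reference, here's a sketch of the equivalent .proto schema for that model (the field names are assumed; only the repeated bytes mapping comes from the discussion above):

```proto
message A {
    B foo = 1;                // sub-message: length-prefixed by default
}
message B {
    repeated bytes bar = 1;   // each chunk carries its own trivial length-prefix
}
```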
Here, the need to buffer for a length-prefix actually arises when writing A.Foo, basically to declare "the following complex data is the value for A.Foo". Fortunately, there is a simple fix:
[ProtoMember(1, DataFormat=DataFormat.Group)]
public B Foo {get;set;}
This switches between the 2 packing techniques in protobuf:

- length-prefix: write the byte length of the sub-object's payload, then the payload itself
- start/end markers ("groups"): write a start-group token, then the payload, then an end-group token
When using the second technique it doesn't need to buffer, so: it doesn't. This does mean it will be writing slightly different bytes for the same data, but protobuf-net is very forgiving, and will happily deserialize data from either format here. Meaning: if you make this change, you can still read your existing data, but new data will use the start/end-marker technique.
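To make the difference concrete, here's a quick sketch of the wire-level field keys involved (this is standard protobuf encoding, not anything protobuf-net specific): each key is (field_number << 3) | wire_type, where wire type 2 means length-delimited and wire types 3/4 are start-group/end-group.

```python
def key(field_number: int, wire_type: int) -> int:
    # Protobuf field key: field number in the high bits, wire type in the low 3 bits.
    return (field_number << 3) | wire_type

LEN, SGROUP, EGROUP = 2, 3, 4  # wire types: length-delimited, start-group, end-group

# Field 1 as a length-prefixed sub-message: key 0x0A, then a varint length, then the payload.
# The writer must know the payload length before the payload is written -- hence the buffering.
print(hex(key(1, LEN)))     # 0xa

# Field 1 as a group: key 0x0B, the payload, then a closing key 0x0C.
# No length is needed up front, so nothing has to be buffered to measure it.
print(hex(key(1, SGROUP)))  # 0xb
print(hex(key(1, EGROUP)))  # 0xc
```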
This raises the question: why does Google prefer the length-prefix approach? Probably because it is more efficient, when reading, to skip through fields (either via a raw reader API, or as unwanted/unexpected data): with a length-prefix you can just read the prefix and then progress the stream [n] bytes; by contrast, to skip data with a start/end-marker you still need to crawl through the payload, skipping the sub-fields individually. Of course, this theoretical difference in read performance doesn't apply if you expect that data and want to read it into your object, which you almost certainly do. Also, in the Google protobuf implementation, because it isn't working with a regular POCO model, the size of each payload is already known, so they don't really see the same issue when writing.
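The read-side trade-off above can be sketched in a few lines. This is a simplified, hypothetical field skipper, not protobuf-net's implementation: it handles only the varint, length-delimited, and group wire types, and assumes field keys inside groups fit in a single byte.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    # Decode a base-128 varint; returns (value, position after the varint).
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7

def skip_field(buf: bytes, pos: int) -> int:
    # Skip one complete field starting at `pos`; returns the position after it.
    field_key, pos = read_varint(buf, pos)
    wire_type = field_key & 7
    if wire_type == 0:                     # varint payload
        _, pos = read_varint(buf, pos)
        return pos
    if wire_type == 2:                     # length-delimited: read the length, jump past it
        length, pos = read_varint(buf, pos)
        return pos + length
    if wire_type == 3:                     # start-group: must crawl every sub-field
        while (buf[pos] & 7) != 4:         # peek next key (single-byte keys assumed)
            pos = skip_field(buf, pos)
        _, pos = read_varint(buf, pos)     # consume the end-group key
        return pos
    raise ValueError(f"unhandled wire type {wire_type}")
```

Skipping a length-delimited field is a single seek regardless of payload size; skipping a group touches every nested field, which is exactly the cost the length-prefix layout avoids.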