Memory usage serializing chunked byte arrays with Protobuf-net

心在旅途 2021-01-12 00:52

In our application we have some data structures which, amongst other things, contain a chunked list of bytes (currently exposed as a List<byte[]>). We chun

2 Answers
  •  醉梦人生
    2021-01-12 01:24

    I'm going to read between the lines a little here... because List<byte[]> (mapped as repeated in protobuf parlance) doesn't have an overall length-prefix, and byte[] (mapped as bytes) has a trivial length-prefix that shouldn't cause additional buffering. So I'm guessing what you actually have is more like:

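    using System.Collections.Generic;
    using ProtoBuf;

    [ProtoContract]
    public class A {
        [ProtoMember(1)]
        public B Foo {get;set;}
    }
    [ProtoContract]
    public class B {
        // each chunk is written as its own "bytes" value for field 1 (repeated)
        [ProtoMember(1)]
        public List<byte[]> Bar {get;set;}
    }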
    

    Here, the buffering is actually needed when writing A.Foo. The length prefix basically declares "the following complex data is the value for A.Foo", and that length isn't known until all of B has been serialized. Fortunately there is a simple fix:

    [ProtoMember(1, DataFormat=DataFormat.Group)]
    public B Foo {get;set;}
    

    This switches between the two ways protobuf can encode a sub-message:

    • the default (Google's stated preference) is length-prefixed: a marker indicating the length of the message to follow, then the sub-message payload
    • but there is also an option to use a start-marker, the sub-message payload, and an end-marker
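    To make the difference concrete, here is a minimal hand-rolled sketch of the two framings. It assumes the A/B shape above, where A.Foo is field 1 and B.Bar is repeated bytes field 1. It writes raw protobuf wire format: a tag is `field << 3 | wireType`, with wire type 2 for length-delimited, 3 for start group and 4 for end group. It is not protobuf-net's internal code or API, and the names `SubMessageFraming`, `WriteLengthPrefixed` and `WriteGroup` are hypothetical. The length-prefixed writer has to hold all of B in a temporary buffer before it can emit the length. The group writer streams each chunk straight to the output.

```csharp
using System.Collections.Generic;
using System.IO;

// Illustrative sketch of protobuf wire-format framing for A.Foo (field 1),
// where B.Bar is repeated bytes (field 1). Hypothetical helper, NOT protobuf-net's API.
public static class SubMessageFraming
{
    const int WireLengthDelimited = 2; // used for "bytes" and default sub-messages
    const int WireStartGroup = 3;
    const int WireEndGroup = 4;

    // Base-128 varint: 7 bits per byte, high bit set on all but the last byte.
    static void WriteVarint(Stream s, ulong value)
    {
        while (value >= 0x80)
        {
            s.WriteByte((byte)(value | 0x80));
            value >>= 7;
        }
        s.WriteByte((byte)value);
    }

    static void WriteTag(Stream s, int field, int wireType) =>
        WriteVarint(s, (ulong)((field << 3) | wireType));

    // B's payload: each chunk is field 1, wire type 2, written directly to `s`.
    static void WriteChunks(Stream s, IEnumerable<byte[]> chunks)
    {
        foreach (var chunk in chunks)
        {
            WriteTag(s, 1, WireLengthDelimited);
            WriteVarint(s, (ulong)chunk.Length);
            s.Write(chunk, 0, chunk.Length);
        }
    }

    // Length-prefixed A.Foo: the length goes BEFORE the payload, so all of B
    // is serialized into a temporary in-memory buffer first.
    // Returns how many bytes had to be held in that buffer.
    public static long WriteLengthPrefixed(Stream output, IEnumerable<byte[]> chunks)
    {
        using var buffer = new MemoryStream();
        WriteChunks(buffer, chunks);
        WriteTag(output, 1, WireLengthDelimited);
        WriteVarint(output, (ulong)buffer.Length);
        buffer.Position = 0;
        buffer.CopyTo(output);
        return buffer.Length;
    }

    // Group A.Foo: start marker, payload, end marker. No length is needed
    // up front, so each chunk goes straight to `output` with no extra buffer.
    public static void WriteGroup(Stream output, IEnumerable<byte[]> chunks)
    {
        WriteTag(output, 1, WireStartGroup);
        WriteChunks(output, chunks);
        WriteTag(output, 1, WireEndGroup);
    }
}
```

    Take two 3-byte chunks written to a `MemoryStream`. Both framings produce 12 bytes, but `WriteLengthPrefixed` first holds B's 10-byte payload in its temporary buffer. In this sketch that buffer grows with the whole chunked list, which is the memory cost the group framing avoids.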

    When using the second technique it doesn't need to buffer, so: it doesn't. This does mean it will be writing slightly different bytes for the same data, but protobuf-net is very forgiving, and will happily deserialize data from either format here. Meaning: if you make this change, you can still read your existing data, but new data will use the start/end-marker technique.
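    The "forgiving" read side can be sketched the same way. This reader for A.Foo looks at the wire type on the field's tag and accepts either framing. It is a hypothetical hand-rolled decoder (`TolerantReader`, `ReadA`), not protobuf-net's reader. It is simplified to assume that A contains only Foo and that B contains only Bar.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical reader for A.Foo (field 1) that accepts BOTH framings.
// Simplified: assumes A contains only Foo and B only Bar (repeated bytes, field 1).
public static class TolerantReader
{
    static ulong ReadVarint(Stream s)
    {
        ulong result = 0;
        int shift = 0;
        while (true)
        {
            int b = s.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    static byte[] ReadBytes(Stream s, int count)
    {
        var data = new byte[count];
        int read = 0;
        while (read < count)
        {
            int n = s.Read(data, read, count - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return data;
    }

    // Reads B.Bar chunks until `end` (length-prefixed) or the end-group tag (group).
    static List<byte[]> ReadB(Stream s, long end, bool isGroup)
    {
        var chunks = new List<byte[]>();
        while (isGroup || s.Position < end)
        {
            ulong tag = ReadVarint(s);
            int field = (int)(tag >> 3), wire = (int)(tag & 7);
            if (isGroup && field == 1 && wire == 4) return chunks; // end marker for A.Foo
            if (field != 1 || wire != 2)
                throw new InvalidDataException($"unexpected field {field}, wire type {wire}");
            chunks.Add(ReadBytes(s, checked((int)ReadVarint(s))));
        }
        return chunks;
    }

    public static List<byte[]> ReadA(Stream s)
    {
        ulong tag = ReadVarint(s);
        int field = (int)(tag >> 3), wire = (int)(tag & 7);
        if (field != 1) throw new InvalidDataException($"expected A.Foo, got field {field}");
        if (wire == 2) // length-prefixed: read the length, then exactly that much of B
        {
            long length = checked((long)ReadVarint(s));
            return ReadB(s, s.Position + length, isGroup: false);
        }
        if (wire == 3) // group: read B until the matching end marker
            return ReadB(s, end: -1, isGroup: true);
        throw new InvalidDataException($"A.Foo has unsupported wire type {wire}");
    }
}
```

    Because the decision is made per tag, old length-prefixed data and new group-framed data both come back as the same list of chunks.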

    This raises the question: why does Google prefer the length-prefix approach? Probably because it is more efficient to skip through fields when reading (either via a raw reader API, or as unwanted/unexpected data). With a length prefix you can just read the length and then advance the stream [n] bytes. By contrast, to skip data with a start/end-marker you still need to crawl through the payload, skipping the sub-fields individually. Of course, this theoretical difference in read performance doesn't apply if you expect that data and want to read it into your object, which you almost certainly do. Also, the Google protobuf implementation isn't working with a regular POCO model, so the payload sizes are already known when writing and it doesn't really see the same issue.
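    The read-side trade-off shows up in a small field skipper. For wire type 2 it reads the length and seeks past the payload. For a group (wire type 3) it has to decode every nested tag until it reaches the matching end-group marker. `FieldSkipper` and `SkipField` are hypothetical names. The sketch also assumes a seekable stream such as a `MemoryStream`; a non-seekable stream would have to read and discard the bytes instead.

```csharp
using System.IO;

// Hypothetical field skipper showing the read-side cost of each framing.
// Assumes a seekable stream (e.g. MemoryStream); a non-seekable stream
// would have to read and discard bytes instead of calling Seek.
public static class FieldSkipper
{
    static ulong ReadVarint(Stream s)
    {
        ulong result = 0;
        int shift = 0;
        while (true)
        {
            int b = s.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    // Skips the value of a field whose tag has already been read.
    // Returns how many nested tags had to be decoded to get past it.
    public static int SkipField(Stream s, ulong tag)
    {
        int field = (int)(tag >> 3), wire = (int)(tag & 7);
        if (wire == 0) { ReadVarint(s); return 0; }                 // varint
        if (wire == 1) { s.Seek(8, SeekOrigin.Current); return 0; } // fixed64
        if (wire == 5) { s.Seek(4, SeekOrigin.Current); return 0; } // fixed32
        if (wire == 2)
        {
            // length-delimited: read the length, then jump over the payload
            long length = checked((long)ReadVarint(s));
            s.Seek(length, SeekOrigin.Current);
            return 0;
        }
        if (wire == 3)
        {
            // group: no length, so decode and skip every nested field in turn
            int visited = 0;
            while (true)
            {
                ulong inner = ReadVarint(s);
                visited++;
                int innerField = (int)(inner >> 3), innerWire = (int)(inner & 7);
                if (innerWire == 4)
                {
                    if (innerField != field) throw new InvalidDataException("mismatched end-group tag");
                    return visited;
                }
                visited += SkipField(s, inner); // nested groups are crawled recursively
            }
        }
        throw new InvalidDataException($"cannot skip wire type {wire}");
    }
}
```

    Consider an A.Foo holding two 3-byte chunks. Skipping the length-prefixed encoding decodes no nested tags, while skipping the group encoding decodes three: one per chunk, plus the end marker. Both end at the same stream position.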
