Protobuf streaming (lazy serialization) API

星月不相逢 2021-02-10 00:57

We have an Android app that uses Protocol Buffers to store application data. The data format (roughly) is a single protobuf ("container") that contains a list of protobufs ("

3 answers
  • 2021-02-10 01:14

    There is no such thing. A protobuf message is a packed structure, so to write it efficiently the serializer needs all of the data up front. You will have to add the "streaming protocol" yourself, for example by sending one protobuf message every N items.

  • 2021-02-10 01:23

    In the standard Java version of Protocol Buffers there are delimited files, where you write messages one at a time. I am not sure whether this is available in the Android version:

     aLocation.writeDelimitedTo(out);
    

    As Marc has indicated, it is easily implemented: just write a length followed by the serialised bytes. In the standard (non-Android) Java version of Protocol Buffers you can also do the following (you have to serialise to a byte array or something similar first):

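    private CodedOutputStream codedStream = null;

    // Writes one record as a varint length prefix followed by the raw bytes.
    public void write(byte[] bytes) throws IOException {
        // ConstClass.EMPTY_BYTE_ARRAY is the poster's own sentinel; that record is skipped.
        if (bytes != ConstClass.EMPTY_BYTE_ARRAY) {
            codedStream.writeRawVarint32(bytes.length);
            codedStream.writeRawBytes(bytes);
            codedStream.flush();
        }
    }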
    

    and

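    private CodedInputStream coded;

    // Reads the next record, or returns null once the input is exhausted.
    public byte[] read() throws IOException {
        if (coded == null) {
            throw new IOException("Reader has not been opened !!!");
        }
        if (coded.isAtEnd()) {
            return null;
        }
        // readBytes() consumes the varint length prefix and then that many bytes.
        return coded.readBytes().toByteArray();
    }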

    Something similar may be possible in other Protocol Buffers implementations.
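
    If the delimited helpers are not available on your platform, the framing itself needs nothing beyond java.io. The following is only a sketch under that assumption: the class name LengthPrefixedFraming and its two methods are made up for illustration and are not part of any protobuf library. It writes each record as a base-128 varint length followed by the payload, which is the same layout the snippets above produce.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: frames opaque payloads as "varint length + bytes". */
final class LengthPrefixedFraming {

    /** Writes one record: a base-128 varint length, then the payload bytes. */
    static void writeRecord(OutputStream out, byte[] payload) throws IOException {
        int length = payload.length;
        // Emit 7 bits at a time, low bits first; the high bit means "more bytes follow".
        while ((length & ~0x7F) != 0) {
            out.write((length & 0x7F) | 0x80);
            length >>>= 7;
        }
        out.write(length);
        out.write(payload);
    }

    /** Reads one record, or returns null at a clean end of stream. */
    static byte[] readRecord(InputStream in) throws IOException {
        int current = in.read();
        if (current < 0) {
            return null; // no more records
        }
        int length = current & 0x7F;
        int shift = 7;
        while ((current & 0x80) != 0) {
            if (shift > 28) {
                throw new IOException("length prefix too long");
            }
            current = in.read();
            if (current < 0) {
                throw new EOFException("truncated length prefix");
            }
            length |= (current & 0x7F) << shift;
            shift += 7;
        }
        if (length < 0) {
            throw new IOException("invalid length prefix");
        }
        // A single read() may return fewer bytes than requested, so loop until full.
        byte[] payload = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(payload, offset, length - offset);
            if (n < 0) {
                throw new EOFException("truncated payload");
            }
            offset += n;
        }
        return payload;
    }
}
```

    Each payload would be the result of toByteArray() on one message, and each byte array returned by readRecord can be handed to the generated parseFrom(byte[]). Where the full Java library is available, writeDelimitedTo and parseDelimitedFrom already do this for you.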

  • 2021-02-10 01:27

    For serialization:

    Protobuf is an appendable format: individual (non-repeated) items are merged, and repeated items are appended.

    Therefore, to write a sequence as a lazy stream, all you need to do is repeatedly write the same structure with only one item in the list: serializing a sequence of 200 x "Container with 1 Item" is 100% identical to serializing 1 x "Container with 200 Items".

    So: just do that!
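
    To see why this works, here is a small sketch that hand-encodes the wire bytes instead of using generated classes. It assumes a hypothetical schema in which Container has a single repeated, length-delimited field number 1 holding the already-serialised items; the class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.util.Collections;
import java.util.List;

/**
 * Hand-encodes a hypothetical "Container { repeated Item item = 1; }"
 * where each Item is given as its already-serialised bytes.
 */
final class AppendableContainerDemo {

    /** Writes a non-negative int as a base-128 varint. */
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Encodes one Container holding all of the given items. */
    static byte[] encodeContainer(List<byte[]> items) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] item : items) {
            out.write((1 << 3) | 2);         // tag: field 1, wire type 2 => 0x0A
            writeVarint(out, item.length);   // length of the embedded item
            out.write(item, 0, item.length); // the item payload itself
        }
        return out.toByteArray();
    }

    /** Streams the items as many one-item Containers written back to back. */
    static byte[] encodeOneAtATime(List<byte[]> items) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] item : items) {
            byte[] single = encodeContainer(Collections.singletonList(item));
            out.write(single, 0, single.length);
        }
        return out.toByteArray();
    }
}
```

    Both methods return the same bytes for the same list, because a Container with only a repeated field has no outer header or footer. With generated classes, the equivalent would be building a Container with one item, writing it to the output stream, and repeating.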


    For deserialization:

    Technically this is very easy to read as a stream; it all, however, comes down to which library you are using. For example, I expose this in protobuf-net (a .NET / C# implementation) as Serializer.DeserializeItems<T>, which reads (fully lazy/streaming) a sequence of messages of type T, on the assumption that they are in the form you describe in the question. So Serializer.DeserializeItems<Item> would be the streaming replacement for Serializer.Deserialize<Container>; the outermost object doesn't really exist in protobuf.

    If this isn't available, but you have access to a raw reader API, what you need to do is:

    • read one varint for the header - this will be the value 10 (0x0A), i.e. "(1 << 3) | 2" for the field-number (1) and wire-type (2) respectively - so this could also be phrased: "read a single byte from the stream, and check the value is 10"
    • read one varint for the length of the following item
    • now:
      • if the reader API allows you to restrict the maximum number of bytes to process, use this length as that limit
      • or wrap the stream API with a length-limiting stream, limited to that length
      • or just manually read that many bytes, and construct an in-memory stream from the payload
    • rinse, repeat
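
    The steps above can be sketched in plain Java as follows. This is an illustration only, taking the third option (read the payload into memory); ContainerItemReader is a made-up name, and it assumes the items sit in field 1 exactly as described.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical reader for a stream of "field 1, length-delimited" items. */
final class ContainerItemReader {
    private static final int EXPECTED_TAG = (1 << 3) | 2; // 10 (0x0A)

    private final InputStream in;

    ContainerItemReader(InputStream in) {
        this.in = in;
    }

    /** Returns the next item's raw payload, or null at a clean end of stream. */
    byte[] next() throws IOException {
        int tag = in.read(); // step 1: the header fits in a single byte
        if (tag < 0) {
            return null;
        }
        if (tag != EXPECTED_TAG) {
            throw new IOException("unexpected tag " + tag);
        }
        int length = readVarint32(); // step 2: length of the following item
        byte[] payload = new byte[length]; // step 3: read exactly that many bytes
        int offset = 0;
        while (offset < length) {
            int n = in.read(payload, offset, length - offset);
            if (n < 0) {
                throw new EOFException("truncated item");
            }
            offset += n;
        }
        return payload;
    }

    /** Reads a varint of at most five bytes and rejects negative results. */
    private int readVarint32() throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("truncated length");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                if (result < 0) {
                    throw new IOException("negative length");
                }
                return result;
            }
        }
        throw new IOException("length varint too long");
    }
}
```

    Each returned byte array is one serialised item, so with generated classes it could be passed to that item's parseFrom(byte[]) inside a loop that stops when next() returns null.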