Protobuf-net lazy streaming deserialization of fields


Question


Overall aim: to skip a very long field during deserialization and, when the field is later accessed, to read its elements directly from the stream without loading the whole field.

Example classes: the object being serialized/deserialized is FatPropertyClass.

[ProtoContract]
public class FatPropertyClass
{
    [ProtoMember(1)]
    private int smallProperty;

    [ProtoMember(2)]
    private FatArray2<int> fatProperty;

    [ProtoMember(3)]
    private int[] array;

    public FatPropertyClass()
    {

    }

    public FatPropertyClass(int sp, int[] fp)
    {
        smallProperty = sp;
        fatProperty = new FatArray2<int>(fp);
    }

    public int SmallProperty
    {
        get { return smallProperty; }
        set { smallProperty = value; }
    }

    public FatArray2<int> FatProperty
    {
        get { return fatProperty; }
        set { fatProperty = value; }
    }

    public int[] Array
    {
        get { return array; }
        set { array = value; }
    }
}


[ProtoContract]
public class FatArray2<T>
{
    [ProtoMember(1, DataFormat = DataFormat.FixedSize)]
    private T[] array;
    private Stream sourceStream;
    private long position;

    public FatArray2()
    {
    }

    public FatArray2(T[] array)
    {
        this.array = new T[array.Length];
        Array.Copy(array, this.array, array.Length);
    }


    [ProtoBeforeDeserialization]
    private void BeforeDeserialize(SerializationContext context)
    {
        position = ((Stream)context.Context).Position;
    }

    public T this[int index]
    {
        get
        {
            // logic to get the relevant index from the stream.
            return default(T);
        }
        set
        {
            // only relevant when full array is available for example.
        }
    }
}

I can deserialize like so: FatPropertyClass d = model.Deserialize(fileStream, null, typeof(FatPropertyClass), new SerializationContext() {Context = fileStream}) as FatPropertyClass; where the model can be for example:

    RuntimeTypeModel model = RuntimeTypeModel.Create();
    MetaType mt = model.Add(typeof(FatPropertyClass), false);
    mt.AddField(1, "smallProperty");
    mt.AddField(2, "fatProperty");
    mt.AddField(3, "array");
    MetaType mtFat = model.Add(typeof(FatArray2<int>), false);

This will skip the deserialization of array in FatArray2<T>. However, I then need to read random elements from that array at a later time. One thing I tried is to remember the stream position before deserialization in the BeforeDeserialize(SerializationContext context) method of FatArray2<T>, as in the above code: position = ((Stream)context.Context).Position;. However, this always seems to be the end of the stream.

How can I remember the stream position where the FatArray2<T> data begins, and how can I read from it at a random index?

Note: The parameter T in FatArray2<T> can be of other types marked with [ProtoContract], not just primitives. Also, there could be multiple properties of type FatArray2<T> at various depths in the object graph.

Method 2: Serialize the FatArray2<T> fields after the serialization of the containing object. That is, serialize FatPropertyClass with a length prefix, then serialize with a length prefix each fat array it contains. Mark all of these fat array properties with an attribute, and at deserialization we can remember the stream position of each of them.
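A rough, untested sketch of what I mean by this write side (WriteContainerThenItems is just an illustrative name; it assumes the custom RuntimeTypeModel above is used, so the fat array's contents are omitted from the container record):

// requires: using System.IO; using ProtoBuf; using ProtoBuf.Meta;
static void WriteContainerThenItems<T>(TypeModel model, Stream destination,
                                       FatPropertyClass container, T[] fatItems)
{
    // the container first; the fat array's contents are skipped because the custom
    // RuntimeTypeModel registers FatArray2<int> without any fields
    model.SerializeWithLengthPrefix(destination, container, typeof(FatPropertyClass),
                                    PrefixStyle.Base128, Serializer.ListItemTag);

    // then one length-prefixed record per element of the fat array, so the elements
    // can be enumerated lazily later instead of being materialised up front
    foreach (T item in fatItems)
    {
        Serializer.SerializeWithLengthPrefix(destination, item,
                                             PrefixStyle.Base128, Serializer.ListItemTag);
    }
}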

Then the question is how do we read primitives out of it? This works OK for classes, using T item = Serializer.DeserializeItems<T>(sourceStream, PrefixStyle.Base128, Serializer.ListItemTag).Skip(index).First(); to get the item at index index. But how does this work for primitives? An array of primitives does not seem to be deserializable using DeserializeItems.
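For reference, the read side I have in mind for the class case looks roughly like this (untested sketch matching the write-side helper above; ReadNthItem is just an illustrative name, and Skip(index) still deserializes and discards the earlier items one by one, it just never reads past the requested one):

// requires: using System.IO; using System.Linq; using ProtoBuf; using ProtoBuf.Meta;
static T ReadNthItem<T>(TypeModel model, Stream source, int index)
{
    // consume and discard the container record written before the item records
    model.DeserializeWithLengthPrefix(source, null, typeof(FatPropertyClass),
                                      PrefixStyle.Base128, Serializer.ListItemTag);

    // lazily enumerate the item records and stop at the requested one
    return Serializer.DeserializeItems<T>(source, PrefixStyle.Base128, Serializer.ListItemTag)
                     .Skip(index)
                     .First();
}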

Is DeserializeItems with LINQ used like that even OK? Does it do what I assume it does (internally skip through the stream to the correct element - at worst reading each length prefix and skipping it)?

Regards, Iulian


Answer 1:


This question depends an awful lot on the actual model - it isn't a scenario that the library specifically targets to make convenient. I suspect that your best bet here would be to write the reader manually using ProtoReader. Note that there are some tricks when it comes to reading selected items if the outermost object is a List<SomeType> or similar, but internal objects are typically either simply read or skipped.

By starting again from the root of the document via ProtoReader, you could seek fairly efficiently to the nth item. I can do a concrete example later if you like (I haven't leapt in yet, in case it won't actually be useful). For reference, the reason the stream's position isn't useful here is: the library aggressively over-reads and buffers data, unless you specifically tell it to limit its length. This is because data like "varint" is hard to read efficiently without lots of buffering, as it would otherwise end up being a lot of individual calls to ReadByte(), rather than just working with a local buffer.


This is a completely untested version of reading the n-th array item of the sub-property directly from a reader; note that it would be inefficient to call this lots of times one after the other, but it should be obvious how to change it to read a range of consecutive values, etc:

static int? ReadNthArrayItem(Stream source, int index, int maxLen)
{
    // the explicit length stops the reader from over-reading/buffering past this object
    using (var reader = new ProtoReader(source, null, null, maxLen))
    {
        int field, count = 0;
        while ((field = reader.ReadFieldHeader()) > 0)
        {
            switch (field)
            {
                case 2: // fat property; a sub object
                    var tok = ProtoReader.StartSubItem(reader);
                    while ((field = reader.ReadFieldHeader()) > 0)
                    {
                        switch (field)
                        {
                            case 1: // the array field
                                if(count++ == index)
                                    return reader.ReadInt32();
                                reader.SkipField();
                                break;
                            default:
                                reader.SkipField();
                                break;
                        }
                    }
                    ProtoReader.EndSubItem(tok, reader);
                    break;
                default:
                    reader.SkipField();
                    break;
            }
        }
    }
    return null;
}

Finally, note that if this is a large array, you might want to use "packed" arrays (see the protobuf documentation; this basically stores them without the per-item field header). This would be a lot more efficient, but note that it requires slightly different reading code. You enable packed arrays by adding IsPacked = true to the [ProtoMember(...)] for that array.
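To illustrate (again completely untested; PackedFatArray is just an illustrative type, and the HasSubValue loop is my guess at the packed reading pattern rather than something I've verified here):

// Declaring the member as packed:
[ProtoContract]
public class PackedFatArray
{
    [ProtoMember(1, IsPacked = true)]
    public int[] Items;
}

// With IsPacked the whole array arrives as a single length-delimited field, so instead of
// one field header per element you open a sub-item and keep pulling values until it is
// exhausted. The caller is assumed to have already positioned the reader on field 1 via
// ReadFieldHeader().
static int? ReadNthPackedValue(ProtoReader reader, int index)
{
    int count = 0;
    int? result = null;
    var token = ProtoReader.StartSubItem(reader);
    while (ProtoReader.HasSubValue(WireType.Variant, reader))
    {
        int value = reader.ReadInt32();
        if (count++ == index) result = value; // keep consuming so the sub-item ends cleanly
    }
    ProtoReader.EndSubItem(token, reader);
    return result;
}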



Source: https://stackoverflow.com/questions/25951775/protobuf-net-lazy-streaming-deserialization-of-fields
