How to serialize/deserialize large list of items with protobuf-net

心在旅途 2021-02-14 13:17

I have a list of about 500 million items. I am able to serialize this into a file with protobuf-net if I serialize individual items, not a list -- I cannot collect the item

3 Answers
  • 2021-02-14 13:44

    The API apparently has changed since Marc's answer.
    It seems there's no SerializeItems method any more.

    Here's some more up to date info that should help:

    ProtoBuf.Serializer.Serialize(stream, items);
    

    can take an IEnumerable, as seen above, and that is all you need for serialization.
    However, there is still a DeserializeItems(...) method, and the devil is in the details :)
    If you serialize an IEnumerable as above, you need to call DeserializeItems with PrefixStyle.Base128 and 1 as the fieldNumber, because apparently those are the defaults.
    Here's an example:

    ProtoBuf.Serializer.DeserializeItems<T>(stream, ProtoBuf.PrefixStyle.Base128, 1);
    

    Also as pointed out by Marc and Vic you can serialize/deserialize on a per item basis like this (using custom values for PrefixStyle and fieldNumber):

    ProtoBuf.Serializer.SerializeWithLengthPrefix(stream, item, ProtoBuf.PrefixStyle.Base128, fieldNumber: 1);
    

    and

    T item;
    while ((item = ProtoBuf.Serializer.DeserializeWithLengthPrefix<T>(stream, ProtoBuf.PrefixStyle.Base128, fieldNumber: 1)) != null)
    {
        // do stuff here
    }
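
    For reference, the per-item loop above can be put together into a small round trip. This is only a sketch: the Item type and its members are hypothetical, a MemoryStream stands in for the real file, and the loop assumes (as the snippet above does) that DeserializeWithLengthPrefix returns null for a reference type once the stream is exhausted.

    ```csharp
    using System;
    using System.IO;
    using ProtoBuf;

    // Hypothetical item type, used only for this sketch.
    [ProtoContract]
    public class Item
    {
        [ProtoMember(1)] public int Id { get; set; }
        [ProtoMember(2)] public string Name { get; set; }
    }

    public static class PerItemRoundTrip
    {
        public static void Main()
        {
            using (var stream = new MemoryStream())
            {
                // Write each item as its own length-prefixed record.
                for (int i = 1; i <= 3; i++)
                {
                    var item = new Item { Id = i, Name = "item " + i };
                    Serializer.SerializeWithLengthPrefix(stream, item, PrefixStyle.Base128, 1);
                }

                stream.Position = 0; // rewind before reading

                // Read one record per call; assumes null signals end of stream.
                Item current;
                int count = 0;
                while ((current = Serializer.DeserializeWithLengthPrefix<Item>(stream, PrefixStyle.Base128, 1)) != null)
                {
                    Console.WriteLine(current.Id + ": " + current.Name);
                    count++;
                }
                Console.WriteLine("read " + count + " items");
            }
        }
    }
    ```

    For a data set the size of the one in the question, a FileStream would take the place of the MemoryStream; only one item is held in memory at a time either way.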
    
  • 2021-02-14 13:53

    Maybe I am too late on this... but just to add to what Marc already said.

    When you use Serializer.Serialize(output, price); protobuf treats consecutive messages as parts of one and the same object. So when you deserialize using

    while ((price = Serializer.Deserialize<Price>(input)) != null)
    

    the first call reads all the records back at once and merges them into a single object. Hence you will see only the values of the last Price record.

    To do what you want to do, change the serialization code to:

    Serializer.SerializeWithLengthPrefix(output, price, PrefixStyle.Base128, 1);
    

    and

    while ((price = Serializer.DeserializeWithLengthPrefix<Price>(input, PrefixStyle.Base128, 1)) != null)
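
    The merging behaviour described above can be reproduced with a short sketch. The members of Price are assumptions made for illustration, since the original type is not shown in this thread. Two records are written without a length prefix and then read back with a single Deserialize call; the expectation is that the one object returned carries the values of the second record and that the stream has been read to its end.

    ```csharp
    using System;
    using System.IO;
    using ProtoBuf;

    // Hypothetical shape for Price; the real type is not shown in the thread.
    [ProtoContract]
    public class Price
    {
        [ProtoMember(1)] public int Id { get; set; }
        [ProtoMember(2)] public double Value { get; set; }
    }

    public static class MergeDemo
    {
        public static void Main()
        {
            using (var stream = new MemoryStream())
            {
                // No length prefix: the two messages are simply concatenated.
                Serializer.Serialize(stream, new Price { Id = 1, Value = 10.5 });
                Serializer.Serialize(stream, new Price { Id = 2, Value = 20.5 });

                stream.Position = 0; // rewind before reading

                // One call consumes the whole stream and merges both messages.
                Price merged = Serializer.Deserialize<Price>(stream);
                Console.WriteLine("Id=" + merged.Id + " Value=" + merged.Value);
                Console.WriteLine("stream fully consumed: " + (stream.Position == stream.Length));
            }
        }
    }
    ```

    Swapping both calls for their WithLengthPrefix counterparts, as shown above, is what keeps the records separate.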
    
  • 2021-02-14 13:53

    Good news! The protobuf-net API is set up for exactly this scenario. You should see a SerializeItems and DeserializeItems pair of methods that work with IEnumerable<T>, allowing streaming both in and out. The easiest way to feed it an enumerable is via an "iterator block" over the source data.

    If, for whatever reason, that isn't convenient, that is 100% identical to using SerializeWithLengthPrefix and DeserializeWithLengthPrefix on a per-item basis, specifying (as parameters) field: 1 and prefix-style: base-128. You could even use SerializeWithLengthPrefix for the writing, and DeserializeItems for the reading (as long as you use field 1 and base-128).

    Re the example - I'd have to see that in a fully reproducible scenario to comment; actually, what I would expect there is that you only get a single object back out, containing the combined values from each object - because without the length-prefix, the protobuf spec assumes you are just concatenating values to a single object. The two approaches mentioned above avoid this issue.
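
    As a rough illustration of the streaming approach, the sketch below feeds an iterator block into per-item SerializeWithLengthPrefix calls and reads the file back lazily with DeserializeItems, using field 1 and base-128 on both sides. The Price members, the GeneratePrices helper and the temporary file are assumptions made for the example, not part of the original question.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using ProtoBuf;

    // Hypothetical shape for Price; the real type is not shown in the thread.
    [ProtoContract]
    public class Price
    {
        [ProtoMember(1)] public int Id { get; set; }
        [ProtoMember(2)] public double Value { get; set; }
    }

    public static class StreamingDemo
    {
        // Iterator block: items are produced one at a time, never held in a list.
        static IEnumerable<Price> GeneratePrices(int count)
        {
            for (int i = 1; i <= count; i++)
                yield return new Price { Id = i, Value = i * 1.5 };
        }

        public static void Main()
        {
            string path = Path.GetTempFileName();
            try
            {
                using (var output = File.Create(path))
                {
                    // Each item becomes one length-prefixed record (field 1, base-128).
                    foreach (var price in GeneratePrices(1000))
                        Serializer.SerializeWithLengthPrefix(output, price, PrefixStyle.Base128, 1);
                }

                using (var input = File.OpenRead(path))
                {
                    // DeserializeItems yields items lazily while the stream is enumerated.
                    long count = 0;
                    foreach (var price in Serializer.DeserializeItems<Price>(input, PrefixStyle.Base128, 1))
                        count++;
                    Console.WriteLine("read " + count + " items");
                }
            }
            finally
            {
                File.Delete(path);
            }
        }
    }
    ```

    The input stream has to stay open for as long as the sequence returned by DeserializeItems is being enumerated, which is why the foreach sits inside the using block.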
