Question
I am trying to serialize an object containing a list of very large composite object graphs (~200,000 nodes or more) using protobuf-net. What I want to achieve is to save the complete object into a single file as quickly and as compactly as possible.
My problem is that I get an OutOfMemoryException while trying to serialize the object. On my machine the exception is thrown when the file size is around 1.5 GB. I am running a 64-bit process and passing the BaseStream of a StreamWriter to protobuf-net. Since I am writing directly to a file, I suspect that some kind of buffering is taking place within protobuf-net, causing the exception. I have tried setting DataFormat = DataFormat.Group on the attributes, but with no luck so far.
I can avoid the exception by serializing each composite in the list to a separate file but I would prefer to have it all done in one go if possible.
Am I doing something wrong, or is it simply not possible to achieve what I want?
Code to illustrate the problem:
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

class Program
{
    static void Main(string[] args)
    {
        int numberOfTrees = 250;
        int nodesPrTree = 200000;
        var trees = CreateTrees(numberOfTrees, nodesPrTree);
        var forest = new Forest(trees);
        using (var writer = new StreamWriter("model.bin"))
        {
            // Serialize the whole forest straight to the underlying file stream.
            Serializer.Serialize(writer.BaseStream, forest);
        }
        Console.ReadLine();
    }

    private static Tree[] CreateTrees(int numberOfTrees, int nodesPrTree)
    {
        var trees = new Tree[numberOfTrees];
        for (int i = 0; i < numberOfTrees; i++)
        {
            var root = new Node();
            CreateTree(root, nodesPrTree, 0);
            var binTree = new Tree(root);
            trees[i] = binTree;
        }
        return trees;
    }

    // Builds a binary tree breadth-first, adding two children per dequeued node.
    private static void CreateTree(INode tree, int nodesPrTree, int currentNumberOfNodes)
    {
        Queue<INode> q = new Queue<INode>();
        q.Enqueue(tree);
        while (q.Count > 0 && currentNumberOfNodes < nodesPrTree)
        {
            var n = q.Dequeue();
            n.Left = new Node();
            q.Enqueue(n.Left);
            currentNumberOfNodes++;
            n.Right = new Node();
            q.Enqueue(n.Right);
            currentNumberOfNodes++;
        }
    }
}
[ProtoContract]
[ProtoInclude(1, typeof(Node), DataFormat = DataFormat.Group)]
public interface INode
{
    [ProtoMember(2, DataFormat = DataFormat.Group, AsReference = true)]
    INode Parent { get; set; }

    [ProtoMember(3, DataFormat = DataFormat.Group, AsReference = true)]
    INode Left { get; set; }

    [ProtoMember(4, DataFormat = DataFormat.Group, AsReference = true)]
    INode Right { get; set; }
}
[ProtoContract]
public class Node : INode
{
    INode m_parent;
    INode m_left;
    INode m_right;

    public INode Left
    {
        get { return m_left; }
        set
        {
            m_left = value;
            m_left.Parent = null;
            m_left.Parent = this;
        }
    }

    public INode Right
    {
        get { return m_right; }
        set
        {
            m_right = value;
            m_right.Parent = null;
            m_right.Parent = this;
        }
    }

    public INode Parent
    {
        get { return m_parent; }
        set { m_parent = value; }
    }
}
[ProtoContract]
public class Tree
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public readonly INode Root;

    public Tree(INode root)
    {
        Root = root;
    }
}

[ProtoContract]
public class Forest
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public readonly Tree[] Trees;

    public Forest(Tree[] trees)
    {
        Trees = trees;
    }
}
Stack-trace when the exception is thrown:
at System.Collections.Generic.Dictionary`2.Resize(Int32 newSize, Boolean forceNewHashCodes)
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
at ProtoBuf.NetObjectCache.AddObjectKey(Object value, Boolean& existing) in NetObjectCache.cs:line 154
at ProtoBuf.BclHelpers.WriteNetObject(Object value, ProtoWriter dest, Int32 key, NetObjectOptions options) BclHelpers.cs:line 500
at proto_5(Object , ProtoWriter )
I am trying a workaround in which I serialize the array of trees one at a time to a single file using the SerializeWithLengthPrefix method. Serialization seems to work: I can see the file size increase after each tree in the list is added to the file. However, when I try to deserialize the trees I get the "Invalid wire-type" exception. I create a new file when I serialize the trees, so the file should be garbage-free, unless I am writing garbage of course ;-). My serialization and deserialization methods are listed below:
using (var writer = new FileStream("model.bin", FileMode.Create))
{
    foreach (var tree in trees)
    {
        Serializer.SerializeWithLengthPrefix(writer, tree, PrefixStyle.Base128);
    }
}
using (var reader = new FileStream("model.bin", FileMode.Open))
{
    var trees = Serializer.DeserializeWithLengthPrefix<Tree[]>(reader, PrefixStyle.Base128);
}
Am I using the method incorrectly?
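One possible reading of the failure, offered as a sketch rather than a confirmed diagnosis: the loop writes a sequence of separate length-prefixed Tree messages, so asking for a single Tree[] may not match what is in the file. The example below writes items one at a time and reads them back one at a time with Serializer.DeserializeItems. The Item class, the LengthPrefixDemo helper, and the choice of field number 1 on both sides are assumptions made for illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;

// Hypothetical stand-in for Tree. It has a parameterless constructor so that
// protobuf-net can create instances during deserialization.
[ProtoContract]
public class Item
{
    [ProtoMember(1)]
    public int Id { get; set; }
}

public static class LengthPrefixDemo
{
    // Assumption: the same field number is used when writing and when reading.
    const int FieldNumber = 1;

    public static void WriteAll(Stream destination, IEnumerable<Item> items)
    {
        foreach (var item in items)
        {
            // Each call appends one self-delimited message to the stream.
            Serializer.SerializeWithLengthPrefix(destination, item, PrefixStyle.Base128, FieldNumber);
        }
    }

    public static List<Item> ReadAll(Stream source)
    {
        // Read with the element type (Item), not Item[]: the stream holds a
        // sequence of separate messages rather than one array message.
        return Serializer.DeserializeItems<Item>(source, PrefixStyle.Base128, FieldNumber).ToList();
    }

    public static void Main()
    {
        using (var stream = new MemoryStream())
        {
            WriteAll(stream, new[] { new Item { Id = 1 }, new Item { Id = 2 }, new Item { Id = 3 } });
            stream.Position = 0; // rewind before reading back
            foreach (var item in ReadAll(stream))
            {
                Console.WriteLine(item.Id);
            }
        }
    }
}
```

With the Tree type from the question, deserialization would likely also need a way to construct instances (for example a parameterless constructor); the illustrative Item class sidesteps that.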
Answer 1:
It wasn't helping that the AsReference code was only respecting the default data format, which means it was trying to hold data in memory so that it could write the object-length prefix back into the data stream, which is exactly what we don't want here (hence your quite correct use of DataFormat.Group). That will account for buffering for an individual branch of the tree. I've tweaked it locally, and I can definitely confirm that it is now writing forwards-only (the debug build has a convenient ForwardsOnly flag that I can enable, which detects this and shouts).
With that tweak, I have had it work for 250 x 20,000, but I'm getting secondary problems with the dictionary resizing (even in x64) when working on the 250 x 200,000 case, like you say, at around the 1.5 GB level. It occurs to me, however, that I might be able to discard one of the dictionaries (forwards or reverse) when doing serialization and deserialization respectively. I would be interested in the stack trace when it breaks for you; if it is ultimately the dictionary resize, I may need to think about moving to a group of dictionaries...
Source: https://stackoverflow.com/questions/15794274/serialize-list-of-huge-composite-graphs-using-protobuf-net-causing-out-of-memory