OutOfMemory Exception with Streams and BsonWriter in Json.Net

Posted by 寵の児 on 2020-01-06 07:25:45

Question


I'm having a problem using Json.NET to create a large BSON file. I have the following test code:

Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.IO
Imports Newtonsoft.Json

Public Class Region
    Public Property Id As Integer
    Public Property Name As String
    Public Property FDS_Id As String
End Class

Public Class Regions
    Inherits List(Of Region)

    Public Sub New(capacity As Integer)
        MyBase.New(capacity)
    End Sub
End Class

Module Module1
    Sub Main()
        Dim writeElapsed2 = CreateFileBson_Stream(GetRegionList(5000000))
        GC.Collect(0)
    End Sub

    Public Function GetRegionList(count As Integer) As Regions
        Dim regions As New Regions(count) ' Capacity equals the number of items added below.
        For lp = 0 To count - 1
            regions.Add(New Region With {.Id = lp, .Name = lp.ToString, .FDS_Id = lp.ToString})
        Next
        Return regions
    End Function

    Public Function CreateFileBson_Stream(regions As Regions) As Long
        Dim sw As New Stopwatch
        sw.Start()
        Dim lp = 0

        Using stream = New StreamWriter("c:\atlas\regionsStream.bson")
            Using writer = New Bson.BsonWriter(stream.BaseStream)
                writer.WriteStartArray()

                For Each item In regions
                    writer.WriteStartObject()
                    writer.WritePropertyName("Id")
                    writer.WriteValue(item.Id)
                    writer.WritePropertyName("Name")
                    writer.WriteValue(item.Name)
                    writer.WritePropertyName("FDS_Id")
                    writer.WriteValue(item.FDS_Id)
                    writer.WriteEndObject()

                    lp += 1
                    If lp Mod 1000000 = 0 Then
                        writer.Flush()
                        stream.Flush()
                        stream.BaseStream.Flush()
                    End If
                Next

                writer.WriteEndArray()
            End Using
        End Using

        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function
End Module

I have used FileStream instead of StreamWriter in the first using statement and it makes no difference.

CreateFileBson_Stream fails at just over 3 million records with an OutOfMemory exception. The memory profiler in Visual Studio shows memory continuing to climb even though I'm flushing everything I can.

The list of 5 million regions comes to about 468 MB in memory.

Interestingly, if I use the following code to produce JSON, it works and memory stays steady at 500 MB:

Public Function CreateFileJson_Stream(regions As Regions) As Long
    Dim sw As New Stopwatch
    sw.Start()
    Using stream = New StreamWriter("c:\atlas\regionsStream.json")
        Using writer = New JsonTextWriter(stream)
            writer.WriteStartArray()

            For Each item In regions
                writer.WriteStartObject()
                writer.WritePropertyName("Id")
                writer.WriteValue(item.Id)
                writer.WritePropertyName("Name")
                writer.WriteValue(item.Name)
                writer.WritePropertyName("FDS_Id")
                writer.WriteValue(item.FDS_Id)
                writer.WriteEndObject()
            Next

            writer.WriteEndArray()
        End Using
    End Using
    sw.Stop()
    Return sw.ElapsedMilliseconds
End Function

I'm pretty certain this is a problem with the BsonWriter but can't see what else I can do. Any ideas?


Answer 1:


The reason you are running out of memory is as follows. According to the BSON specification, every object or array (both are called documents in the standard) must begin with a count of the total number of bytes comprising the document:

document ::= int32 e_list "\x00"       BSON Document. int32 is the total number of bytes comprising the document.
e_list   ::= element e_list
           | ""
element  ::= "\x01" e_name double      64-bit binary floating point
           | "\x02" e_name string      UTF-8 string
           | "\x03" e_name document    Embedded document
           | "\x04" e_name document    Array
           | ...

Thus when writing the root object or array, the total number of bytes to be written to the file must be precalculated.
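
To make the size prefix concrete, here is a minimal sketch that hand-encodes the one-element document {"Id": 1} with plain .NET streams, independent of Json.NET. The names BsonSizeDemo and EncodeIdDocument are made up for illustration. Note that the leading int32 can only be written once the element bytes are known:

```csharp
using System;
using System.IO;
using System.Text;

public static class BsonSizeDemo
{
    // Hand-encodes the BSON document {"Id": <id>} to show where the size prefix goes.
    public static byte[] EncodeIdDocument(int id)
    {
        // The element list must exist before the leading int32 can be computed.
        byte[] elements;
        using (var body = new MemoryStream())
        using (var w = new BinaryWriter(body))
        {
            w.Write((byte)0x10);                    // element type: 32-bit integer
            w.Write(Encoding.UTF8.GetBytes("Id"));  // e_name ...
            w.Write((byte)0x00);                    // ... stored as a NUL-terminated cstring
            w.Write(id);                            // little-endian int32 value
            w.Flush();
            elements = body.ToArray();
        }

        using (var doc = new MemoryStream())
        using (var w = new BinaryWriter(doc))
        {
            w.Write(4 + elements.Length + 1);       // total size: prefix + elements + terminator
            w.Write(elements);
            w.Write((byte)0x00);                    // document terminator
            w.Flush();
            return doc.ToArray();
        }
    }
}
```

For id = 1, BitConverter.ToString(BsonSizeDemo.EncodeIdDocument(1)) gives 0D-00-00-00-10-49-64-00-01-00-00-00-00: 13 bytes, with the first four holding the value 13. The same dependency applies to a root array with millions of entries, where the first four bytes depend on everything that follows.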

Newtonsoft's BsonDataWriter and the underlying BsonBinaryWriter implement this by caching all tokens to be written in a tree. Once the contents of the root token have been finalized, they recursively calculate the sizes and then write the tree out. (The alternatives would have been to make the application, i.e. your code, somehow precalculate this information, which is practically impossible, or to seek back and forth in the output stream to write it, which is possible only for streams for which Stream.CanSeek == true.) You are getting the OutOfMemory exception because your system has insufficient resources to hold the token tree.

For comparison, the JSON standard does not require byte counts or sizes to be written anywhere in the file. Thus JsonTextWriter can stream your serialized array contents out immediately, without the need to cache anything.

As a workaround, based on the BSON spec and BsonBinaryWriter I have created a helper method that incrementally serializes an enumerable to a stream for which Stream.CanSeek == true. It doesn't require caching the entire BSON document in memory, but rather seeks to the beginning of the stream to write the final byte count:

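using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;
using Newtonsoft.Json.Serialization;

public static partial class BsonExtensions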
{
    const int BufferSize = 256;

    public static void SerializeEnumerable<TItem>(IEnumerable<TItem> enumerable, Stream stream, JsonSerializerSettings settings = null)
    {
        // Created based on https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonBinaryWriter.cs
        // And http://bsonspec.org/spec.html
        if (enumerable == null || stream == null)
            throw new ArgumentNullException();
        if (!stream.CanSeek || !stream.CanWrite)
            throw new ArgumentException("!stream.CanSeek || !stream.CanWrite");

        var serializer = JsonSerializer.CreateDefault(settings);
        var contract = serializer.ContractResolver.ResolveContract(typeof(TItem));
        BsonType rootType;
        if (contract is JsonObjectContract || contract is JsonDictionaryContract)
            rootType = BsonType.Object;
        else if (contract is JsonArrayContract)
            rootType = BsonType.Array;
        else
            // Arrays of primitives are not implemented yet.
            throw new JsonSerializationException(string.Format("Item type \"{0}\" not implemented.", typeof(TItem)));

        stream.Flush(); // Just in case.
        var initialPosition = stream.Position;

        var buffer = new byte[BufferSize];

        WriteInt(stream, 0, buffer); // Placeholder for the document size; overwritten below once the final size is known.

        ulong index = 0;
        foreach (var item in enumerable)
        {
            if (item == null)
            {
                stream.WriteByte(unchecked((byte)BsonType.Null));
                WriteString(stream, index.ToString(NumberFormatInfo.InvariantInfo), buffer);
            }
            else
            {
                stream.WriteByte(unchecked((byte)rootType));
                WriteString(stream, index.ToString(NumberFormatInfo.InvariantInfo), buffer);
                using (var bsonWriter = new BsonDataWriter(stream) { CloseOutput = false })
                {
                    serializer.Serialize(bsonWriter, item);
                }
            }
            index++;
        }

        stream.WriteByte((byte)0);
        stream.Flush();

        var finalPosition = stream.Position;
        stream.Position = initialPosition;

        var size = checked((int)(finalPosition - initialPosition));
        WriteInt(stream, size, buffer); // CALCULATED SIZE.

        stream.Position = finalPosition;
    }

    private static readonly Encoding Encoding = new UTF8Encoding(false);

    private static void WriteString(Stream stream, string s, byte[] buffer)
    {
        if (s != null)
        {
            if (s.Length < buffer.Length / Encoding.GetMaxByteCount(1))
            {
                var byteCount = Encoding.GetBytes(s, 0, s.Length, buffer, 0);
                stream.Write(buffer, 0, byteCount);
            }
            else
            {
                byte[] bytes = Encoding.GetBytes(s);
                stream.Write(bytes, 0, bytes.Length);
            }
        }

        stream.WriteByte((byte)0);
    }

    private static void WriteInt(Stream stream, int value, byte[] buffer)
    {
        unchecked
        {
            buffer[0] = (byte)value;
            buffer[1] = (byte)(value >> 8);
            buffer[2] = (byte)(value >> 16);
            buffer[3] = (byte)(value >> 24);
        }
        stream.Write(buffer, 0, 4);
    }

    enum BsonType : sbyte
    {
        // Taken from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonType.cs
        // And also http://bsonspec.org/spec.html
        Number = 1,
        String = 2,
        Object = 3,
        Array = 4,
        Binary = 5,
        Undefined = 6,
        Oid = 7,
        Boolean = 8,
        Date = 9,
        Null = 10,
        Regex = 11,
        Reference = 12,
        Code = 13,
        Symbol = 14,
        CodeWScope = 15,
        Integer = 16,
        TimeStamp = 17,
        Long = 18,
        MinKey = -1,
        MaxKey = 127
    }
}

And then call it as follows:

BsonExtensions.SerializeEnumerable(regions, stream)
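
For a fuller picture, here is a hedged C# sketch of writing to a seekable FileStream with the helper and then streaming the items back one at a time. It assumes the Newtonsoft.Json.Bson package and the BsonExtensions class above are available in the same project, and that every item serializes as an object (null items and nested arrays are skipped by this reader). BsonFileExample, WriteFile and ReadFile are hypothetical names:

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;

public static class BsonFileExample
{
    // Writes the items with the SerializeEnumerable helper defined above.
    // FileStream is seekable, which the helper requires.
    public static void WriteFile<T>(string path, IEnumerable<T> items)
    {
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.ReadWrite))
        {
            BsonExtensions.SerializeEnumerable(items, stream);
        }
    }

    // Streams the items back lazily instead of materializing the whole array.
    public static IEnumerable<T> ReadFile<T>(string path)
    {
        var serializer = JsonSerializer.CreateDefault();
        using (var stream = File.OpenRead(path))
        // The root document was written as an array, so tell the reader to treat it as one.
        using (var reader = new BsonDataReader(stream) { ReadRootValueAsArray = true })
        {
            while (reader.Read())
            {
                // Each StartObject token marks one item; Deserialize consumes it through EndObject.
                if (reader.TokenType == JsonToken.StartObject)
                    yield return serializer.Deserialize<T>(reader);
            }
        }
    }
}
```

Because ReadFile is an iterator, the file stays open only while the caller is enumerating, and only one item is held in memory at a time.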

Notes:

  • You could use the method above to serialize to a local FileStream or a MemoryStream, but not to, say, a DeflateStream, which cannot be repositioned.

  • Serializing enumerables of primitives is not implemented, but could be.

  • In release 10.0.1, Newtonsoft moved BSON processing into a separate NuGet package, Newtonsoft.Json.Bson, and replaced BsonWriter with BsonDataWriter. If you are using an earlier version of Newtonsoft, the answer above applies equally to the old BsonWriter.

  • Since Json.NET is written in C# and my primary language is C#, the workaround is also in C#. If you need this converted to VB.NET, let me know and I can try.

A demo fiddle with some simple unit tests is linked from the original answer.




Answer 2:


Found it: BsonWriter is trying to be 'intelligent'. Because I am writing the output as an array of regions, it seems to keep the whole array in memory regardless of any flushing that you do.

To prove this, I took out the Start and End Array writes and ran the routine. Memory usage stayed at 500 MB and the procedure ran properly.

My guess is that this is a bug that got fixed in JsonWriter but not in the lesser-used BsonWriter.
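
If the array start and end writes are removed as described, each region is presumably written as its own root document, so the file becomes a sequence of back-to-back BSON documents rather than a single BSON array. A reader then has to rely on each document's size prefix to find the boundaries. A minimal sketch with plain .NET streams follows; BsonDocumentSplitter and its methods are hypothetical names, not part of Json.NET:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BsonDocumentSplitter
{
    // Yields each complete BSON document (size prefix included) from a stream
    // that contains several documents written one after another.
    public static IEnumerable<byte[]> SplitDocuments(Stream stream)
    {
        var prefix = new byte[4];
        while (true)
        {
            int got = ReadFully(stream, prefix, 0, 4);
            if (got == 0) yield break;  // clean end of stream
            if (got < 4) throw new EndOfStreamException("Truncated size prefix.");

            // The size is a little-endian int32 that counts the whole document.
            int size = prefix[0] | (prefix[1] << 8) | (prefix[2] << 16) | (prefix[3] << 24);
            if (size < 5) throw new InvalidDataException("Invalid BSON document size.");

            var doc = new byte[size];
            Array.Copy(prefix, doc, 4);
            if (ReadFully(stream, doc, 4, size - 4) < size - 4)
                throw new EndOfStreamException("Truncated document.");
            yield return doc;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until done or end of stream.
    private static int ReadFully(Stream stream, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, offset + total, count - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Each returned array is one complete document and could be handed to a BSON reader through a MemoryStream. The trade-off is that the file is no longer a single BSON array, so consumers that expect one root document will only see the first region.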



Source: https://stackoverflow.com/questions/33451929/outofmemory-exception-with-streams-and-bsonwriter-in-json-net
