C# - OutOfMemoryException saving a List on a JSON file

Question

I'm trying to save the streaming data of a pressure map. Basically I have a pressure matrix defined as:

double[,] pressureMatrix = new double[e.Data.GetLength(0), e.Data.GetLength(1)];

I receive one of these pressure matrices every 10 milliseconds, and I want to save all of them to a JSON file so that I can replay the recording later.

What I do is, first of all, write what I call the header with all the settings used to do the recording like this:

recordedData.softwareVersion = Assembly.GetExecutingAssembly().GetName().Version.Major.ToString() + "." + Assembly.GetExecutingAssembly().GetName().Version.Minor.ToString();
recordedData.calibrationConfiguration = calibrationConfiguration;
recordedData.representationConfiguration = representationSettings;
recordedData.pressureData = new List<PressureMap>();

var json = JsonConvert.SerializeObject(recordedData, Formatting.None);

File.WriteAllText(this.filePath, json);

Then, every time I get a new pressure map I create a new Thread to add the new PressureMatrix and re-write the file:

var newPressureMatrix = new PressureMap(datos, DateTime.Now);
recordedData.pressureData.Add(newPressureMatrix);
var json = JsonConvert.SerializeObject(recordedData, Formatting.None);
File.WriteAllText(this.filePath, json);

After about 20-30 minutes I get an OutOfMemoryException, because the List<PressureMap> inside recordedData has grown too large to hold in memory.

How can I handle this so that all the data is saved? I would like to record 24-48 hours of data.

Answer 1:

Your basic problem is that you are holding all of your pressure map samples in memory rather than writing each one individually and then allowing it to be garbage collected. What's worse, you are doing this in two different places:

1. You serialize your entire list of samples to a JSON string json before writing the string to a file.

   Instead, as explained in Performance Tips: Optimize Memory Usage, you should serialize and deserialize directly to and from your file in such situations. For instructions on how to do this see this answer to Can Json.NET serialize / deserialize to / from a stream? and also Serialize JSON to a file.

2. The recordedData.pressureData = new List<PressureMap>(); accumulates all pressure map samples, then writes all of them every time a sample is made.

   A better solution would be to write each sample once and forget it, but the requirement for each sample to be nested inside some container objects in the JSON makes it nonobvious how to do that.
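For the first issue, serializing directly to and from the file might look roughly like the sketch below. The class and method names (JsonFileExtensions, SerializeToFile, DeserializeFromFile) are hypothetical helpers made up for illustration, not part of Json.NET; only the JsonSerializer, JsonTextWriter and JsonTextReader calls come from the library.

```csharp
using System.IO;
using Newtonsoft.Json;

// Hypothetical helper class, shown only to illustrate stream-based serialization.
public static class JsonFileExtensions
{
    // Writes value straight to the file. No intermediate JSON string is built,
    // so peak memory does not include a second full copy of the data as text.
    public static void SerializeToFile<T>(T value, string path, JsonSerializerSettings settings = null)
    {
        using (var stream = new FileStream(path, FileMode.Create))
        using (var textWriter = new StreamWriter(stream))
        using (var jsonWriter = new JsonTextWriter(textWriter))
        {
            JsonSerializer.CreateDefault(settings).Serialize(jsonWriter, value);
        }
    }

    // Reads the file token by token instead of loading it into a string first.
    public static T DeserializeFromFile<T>(string path, JsonSerializerSettings settings = null)
    {
        using (var stream = File.OpenRead(path))
        using (var textReader = new StreamReader(stream))
        using (var jsonReader = new JsonTextReader(textReader))
        {
            return JsonSerializer.CreateDefault(settings).Deserialize<T>(jsonReader);
        }
    }
}
```

This removes the large intermediate string, but on its own it does not address the second issue: the list of samples still grows without bound and is still rewritten in full on every sample.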

So, how to attack issue #2?

First, let's modify your data model as follows, partitioning the header data into a separate class:

public class PressureMap
{
    public double[,] PressureMatrix { get; set; }
}

public class CalibrationConfiguration 
{
    // Data model not included in question
}

public class RepresentationConfiguration 
{
    // Data model not included in question
}

public class RecordedDataHeader
{
    public string SoftwareVersion { get; set; }
    public CalibrationConfiguration CalibrationConfiguration { get; set; }
    public RepresentationConfiguration RepresentationConfiguration { get; set; }
}

public class RecordedData
{
    // Ensure the header is serialized first.
    [JsonProperty(Order = 1)]
    public RecordedDataHeader RecordedDataHeader { get; set; }
    // Ensure the pressure data is serialized last.
    [JsonProperty(Order = 2)]
    public IEnumerable<PressureMap> PressureData { get; set; }
}

Option #1 is a version of the producer-consumer pattern. It involves spinning up two threads: one to generate PressureMap samples, and one to serialize the RecordedData. The first thread will generate samples and add them to a BlockingCollection<PressureMap> collection that is passed to the second thread. The second thread will then serialize BlockingCollection<PressureMap>.GetConsumingEnumerable() as the value of RecordedData.PressureData.

The following code gives a skeleton for how to do this:

var sampleCount = 400;    // Or whatever stopping criterion you prefer
var sampleInterval = 10;  // in ms

using (var pressureData = new BlockingCollection<PressureMap>())
{
    // Adapted from
    // https://docs.microsoft.com/en-us/dotnet/standard/collections/thread-safe/blockingcollection-overview
    // https://docs.microsoft.com/en-us/dotnet/api/system.collections.concurrent.blockingcollection-1?view=netframework-4.7.2

    // Spin up a Task to sample the pressure maps
    using (Task t1 = Task.Factory.StartNew(() =>
    {
        for (int i = 0; i < sampleCount; i++)
        {
            var data = GetPressureMap(i);
            Console.WriteLine("Generated sample {0}", i);
            pressureData.Add(data);
            System.Threading.Thread.Sleep(sampleInterval);
        }
        pressureData.CompleteAdding();
    }))
    {
        // Spin up a Task to consume the BlockingCollection
        using (Task t2 = Task.Factory.StartNew(() =>
        {
            var recordedDataHeader = new RecordedDataHeader
            {
                SoftwareVersion = softwareVersion,
                CalibrationConfiguration = calibrationConfiguration,
                RepresentationConfiguration = representationConfiguration,
            };

            var settings = new JsonSerializerSettings
            {
                ContractResolver = new CamelCasePropertyNamesContractResolver(),
            };

            using (var stream = new FileStream(this.filePath, FileMode.Create))
            using (var textWriter = new StreamWriter(stream))
            using (var jsonWriter = new JsonTextWriter(textWriter))
            {
                int j = 0;

                var query = pressureData
                    .GetConsumingEnumerable()
                    .Select(p => 
                            { 
                                // Flush the writer periodically in case the process terminates abnormally
                                jsonWriter.Flush();
                                Console.WriteLine("Serializing item {0}", j++);
                                return p;
                            });

                var recordedData = new RecordedData
                {
                    RecordedDataHeader = recordedDataHeader,
                    // Since PressureData is declared as IEnumerable<PressureMap>, evaluation will be lazy.
                    PressureData = query,
                };                          

                Console.WriteLine("Beginning serialization of {0} to {1}:", recordedData, this.filePath);
                // Serialize through the same jsonWriter that is flushed periodically in the query above.
                JsonSerializer.CreateDefault(settings).Serialize(jsonWriter, recordedData);
                Console.WriteLine("Finished serialization of {0} to {1}.", recordedData, this.filePath);
            }
        }))
        {
            Task.WaitAll(t1, t2);
        }
    }
}

Notes:

This solution uses the fact that, when serializing an IEnumerable<T>, Json.NET will not materialize the enumerable as a list. Instead it will take full advantage of lazy evaluation and simply enumerate through it, writing then forgetting each individual item encountered.
The first thread generates PressureMap samples and adds them to the blocking collection.
The second thread wraps the blocking collection in an IEnumerable<PressureMap> then serializes that as RecordedData.PressureData.

During serialization, the serializer will enumerate through the IEnumerable<PressureMap>, streaming each item to the JSON file and then proceeding to the next, effectively blocking until one becomes available.
You will need to do some experimentation to make sure that the serialization thread can "keep up" with the sampling thread, possibly by setting a BoundedCapacity during construction. If not, you may need to adopt a different strategy.
PressureMap GetPressureMap(int count) should be some method of yours (not shown in the question) that returns the current pressure map sample.
In this technique the JSON file remains open for the duration of the sampling session. If sampling terminates abnormally the file may be truncated. I make some attempt to ameliorate the problem by flushing the writer periodically.
While data serialization will no longer require unbounded amounts of memory, deserializing a RecordedData later will deserialize the PressureData array into a concrete List<PressureMap>. This may possibly cause memory issues during downstream processing.
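The BoundedCapacity idea mentioned in the notes can be tried out in isolation with a sketch like the one below. It uses plain integers in place of PressureMap samples, and the class and method names (BoundedQueueDemo, Run) are hypothetical. With a bounded collection, Add blocks the producer whenever the queue is full, so memory use stays capped at the chosen capacity.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical demo class; integers stand in for pressure map samples.
public static class BoundedQueueDemo
{
    // Pushes sampleCount items through a queue that holds at most `capacity`
    // items, and returns them in the order the consumer received them.
    public static List<int> Run(int sampleCount, int capacity)
    {
        var consumed = new List<int>();
        using (var queue = new BlockingCollection<int>(capacity))
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    for (int i = 0; i < sampleCount; i++)
                        queue.Add(i); // blocks while the queue is full
                }
                finally
                {
                    // Always signal completion so the consumer loop can end.
                    queue.CompleteAdding();
                }
            });

            // Blocks while the queue is empty; ends after CompleteAdding().
            foreach (var item in queue.GetConsumingEnumerable())
                consumed.Add(item);

            producer.Wait();
        }
        return consumed;
    }
}
```

Whether blocking the producer is acceptable depends on the data source. If the sensor callback must not be stalled, TryAdd (which returns false immediately when the collection is full) lets you detect and count dropped samples instead.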

Demo fiddle #1 here.

Option #2 would be to switch from a JSON file to a Newline Delimited JSON file. Such a file consists of sequences of JSON objects separated by newline characters. In your case, you would make the first object contain the RecordedDataHeader information, and the subsequent objects be of type PressureMap:

var sampleCount = 100; // Or whatever
var sampleInterval = 10;

var recordedDataHeader = new RecordedDataHeader
{
    SoftwareVersion = softwareVersion,
    CalibrationConfiguration = calibrationConfiguration,
    RepresentationConfiguration = representationConfiguration,
};

var settings = new JsonSerializerSettings
{
    ContractResolver = new CamelCasePropertyNamesContractResolver(),
};

// Write the header
Console.WriteLine("Beginning serialization of sample data to {0}.", this.filePath);

using (var stream = new FileStream(this.filePath, FileMode.Create))
{
    JsonExtensions.ToNewlineDelimitedJson(stream, new[] { recordedDataHeader });
}

// Write each sample incrementally

for (int i = 0; i < sampleCount; i++)
{
    Thread.Sleep(sampleInterval);
    Console.WriteLine("Performing sample {0} of {1}", i, sampleCount);
    var map = GetPressureMap(i);

    using (var stream = new FileStream(this.filePath, FileMode.Append))
    {
        JsonExtensions.ToNewlineDelimitedJson(stream, new[] { map });
    }
}

Console.WriteLine("Finished serialization of sample data to {0}.", this.filePath);

Using the extension methods:

public static partial class JsonExtensions
{
    // Adapted from the answer to
    // https://stackoverflow.com/questions/44787652/serialize-as-ndjson-using-json-net
    // by dbc https://stackoverflow.com/users/3744182/dbc
    public static void ToNewlineDelimitedJson<T>(Stream stream, IEnumerable<T> items)
    {
        // Let caller dispose the underlying stream 
        using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
        {
            ToNewlineDelimitedJson(textWriter, items);
        }
    }

    public static void ToNewlineDelimitedJson<T>(TextWriter textWriter, IEnumerable<T> items)
    {
        var serializer = JsonSerializer.CreateDefault();

        foreach (var item in items)
        {
            // Formatting.None is the default; I set it here for clarity.
            using (var writer = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
            {
                serializer.Serialize(writer, item);
            }
            // http://specs.okfnlabs.org/ndjson/
            // Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A). 
            // The newline charater MAY be preceeded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
            textWriter.Write("\n");
        }
    }

    // Adapted from the answer to 
    // https://stackoverflow.com/questions/29729063/line-delimited-json-serializing-and-de-serializing
    // by Yuval Itzchakov https://stackoverflow.com/users/1870803/yuval-itzchakov
    public static IEnumerable<TBase> FromNewlineDelimitedJson<TBase, THeader, TRow>(TextReader reader)
        where THeader : TBase
        where TRow : TBase
    {
        bool first = true;

        using (var jsonReader = new JsonTextReader(reader) { CloseInput = false, SupportMultipleContent = true })
        {
            var serializer = JsonSerializer.CreateDefault();

            while (jsonReader.Read())
            {
                if (jsonReader.TokenType == JsonToken.Comment)
                    continue;
                if (first)
                {
                    yield return serializer.Deserialize<THeader>(jsonReader);
                    first = false;
                }
                else
                {
                    yield return serializer.Deserialize<TRow>(jsonReader);
                }
            }
        }
    }
}

Later, you can process the newline delimited JSON file as follows:

using (var stream = File.OpenRead(filePath))
using (var textReader = new StreamReader(stream))
{
    foreach (var obj in JsonExtensions.FromNewlineDelimitedJson<object, RecordedDataHeader, PressureMap>(textReader))
    {
        if (obj is RecordedDataHeader)
        {
            var header = (RecordedDataHeader)obj;
            // Process the header
            Console.WriteLine(JsonConvert.SerializeObject(header));
        }
        else
        {
            var row = (PressureMap)obj;
            // Process the row.
            Console.WriteLine(JsonConvert.SerializeObject(row));
        }
    }
}

Notes:

This approach looks simpler because the samples are added incrementally to the end of the file, rather than inserted inside some overall JSON container.
With this approach both serialization and downstream processing can be done with bounded memory use.
The sample file does not remain open for the duration of sampling, so is less likely to be truncated.
Downstream applications may not have built-in tools for processing newline delimited JSON.
This strategy may integrate more simply with your current threading code.
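Because every record in a newline delimited JSON file occupies exactly one line, a simpler (if less general) alternative for downstream processing is to read the file line by line. The following is a minimal sketch under that assumption; NdjsonLineReader and ReadRows are hypothetical names, and the header is assumed to be on the first line, as written above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

// Hypothetical helper class for line-by-line reading of an NDJSON file.
public static class NdjsonLineReader
{
    // Lazily yields one deserialized row per non-empty line, skipping the
    // first `skip` lines (by default, the single header line).
    public static IEnumerable<T> ReadRows<T>(string path, int skip = 1)
    {
        return File.ReadLines(path) // streams lines; does not load the whole file
            .Skip(skip)
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => JsonConvert.DeserializeObject<T>(line));
    }
}
```

Since File.ReadLines and the LINQ operators are lazy, only one row needs to be in memory at a time, provided the caller processes rows in a foreach loop rather than calling ToList().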

Demo fiddle #2 here.

Source: https://stackoverflow.com/questions/52311362/c-sharp-outofmemoryexception-saving-a-list-on-a-json-file

Tags

.net

json

multithreading

jsonconvert