Performance issue when serializing multi-dimensional arrays using BinaryFormatter in .NET

Submitted by 流过昼夜 on 2019-12-08 13:39:30

Question


I'm using BinaryFormatter to serialize a fairly simple multi-dimensional array of floats, although I suspect the problem occurs with any primitive type. My multi-dimensional array contains 10000x16 floats (160,000 values), and serializing it on my PC runs at ~8 MB/s (60-second benchmark writing ~500 MB to an SSD). Code:

        // Requires: using System; using System.Diagnostics; using System.IO;
        //           using System.Runtime.Serialization.Formatters.Binary;

        float[,] data = new float[10000, 16];     // Two-dimensional array of 160,000 floats.
        // OR, for the comparison run (use only one of the two declarations):
        // float[] data = new float[10000 * 16];  // One-dimensional array of 160,000 floats.

        var formatter = new BinaryFormatter();
        var stream = new FileStream("C:\\Temp\\test_serialization.data", FileMode.Create, FileAccess.Write);

        // Serialize the array to disk 1000 times.
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < 1000; i++)
        {
            formatter.Serialize(stream, data);
        }
        stream.Close();   // Closing flushes buffered data, so the flush is included in the timing.
        stopwatch.Stop();

        TimeSpan ts = stopwatch.Elapsed;

        // Format and display the TimeSpan value.
        string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:000}",
            ts.Hours, ts.Minutes, ts.Seconds,
            ts.Milliseconds);
        Console.WriteLine("Runtime " + elapsedTime);

        // Throughput = bytes written / elapsed seconds, converted to MB/s.
        var info = new FileInfo(stream.Name);
        Console.WriteLine("Speed: {0:0.00} MB/s", info.Length / ts.TotalSeconds / 1024.0 / 1024.0);

Doing the same thing with a one-dimensional array of 160,000 floats, the same amount of data is serialized to disk at ~179 MB/s. Over 20x faster! Why does serializing a two-dimensional array with BinaryFormatter perform so poorly? The underlying storage of the two arrays in memory should be identical. (I have pinned 2D arrays with pin_ptr in C++/CLI and copied to and from them natively.)

A hackish solution would be to implement ISerializable and block-copy the 2D array into a 1D array (unsafe code, pointer pinning, memcpy-style copy), then serialize that 1D array together with the dimensions. Another option I am considering is switching to protobuf-net.
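The flattening step of that workaround could be sketched roughly as follows. This is only an illustration, not the asker's code: it swaps the unsafe pinning for Buffer.BlockCopy, which copies raw bytes between primitive arrays of any rank, and the names Array2DFlattener, Flatten and Unflatten are hypothetical.

```csharp
using System;

// Hypothetical helper: copies a float[,] into a float[] and back, byte for byte.
static class Array2DFlattener
{
    // Copies the row-major contents of a 2D array into a new 1D array.
    public static float[] Flatten(float[,] source)
    {
        var flat = new float[source.Length];
        // The count argument of BlockCopy is in bytes, not elements.
        Buffer.BlockCopy(source, 0, flat, 0, Buffer.ByteLength(source));
        return flat;
    }

    // Rebuilds the 2D array from the flat data and the stored dimensions.
    public static float[,] Unflatten(float[] flat, int rows, int cols)
    {
        if (flat.Length != rows * cols)
            throw new ArgumentException("Dimensions do not match the data length.");

        var result = new float[rows, cols];
        Buffer.BlockCopy(flat, 0, result, 0, Buffer.ByteLength(flat));
        return result;
    }
}
```

The flat array, together with data.GetLength(0) and data.GetLength(1), is what would then be handed to the serializer, so only the fast one-dimensional path is exercised.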


Answer 1:


There is no need to give up your data structure or copy values; you can use the following code to achieve the same performance:

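            // Must run in an unsafe context (compile with /unsafe).
            // 'fixed' pins the array so the GC cannot move it while we read its raw bytes.
            fixed (float* ptr = data)
            {
                byte* bytes = (byte*)ptr;
                int byteCount = data.Length * sizeof(float);  // total size of the array in bytes

                // Write the array's memory to the stream one byte at a time.
                for (int j = 0; j < byteCount; j++)
                {
                    stream.WriteByte(bytes[j]);
                }
            }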

Basically, you are writing the output stream yourself and, as you said, you are just treating the float array as a byte[], since the memory layout is the same.

Deserialization works the same way: you can either use a BinaryReader to read the floats one by one (a StreamReader only reads text), or use unsafe code and load the data straight into memory.
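A round trip along these lines might look like the sketch below. It is an assumption-laden illustration rather than the answerer's code: it uses Buffer.BlockCopy instead of unsafe pointers, it expects the caller to know the dimensions (they are not written to the stream), and the names RawFloatArrayIO, Write and Read are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper: writes and reads a float[,] as raw bytes, without unsafe code.
static class RawFloatArrayIO
{
    // Writes the raw bytes of the array to the stream in a single call.
    public static void Write(Stream stream, float[,] data)
    {
        var bytes = new byte[Buffer.ByteLength(data)];
        Buffer.BlockCopy(data, 0, bytes, 0, bytes.Length);
        stream.Write(bytes, 0, bytes.Length);
    }

    // Reads rows * cols floats back; the dimensions must be known by the caller.
    public static float[,] Read(Stream stream, int rows, int cols)
    {
        var bytes = new byte[rows * cols * sizeof(float)];
        int offset = 0;
        // Stream.Read may return fewer bytes than requested, so loop until full.
        while (offset < bytes.Length)
        {
            int read = stream.Read(bytes, offset, bytes.Length - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }

        var result = new float[rows, cols];
        Buffer.BlockCopy(bytes, 0, result, 0, bytes.Length);
        return result;
    }
}
```

Writing the whole buffer in one call also avoids the per-byte WriteByte loop, at the cost of one temporary byte[] the size of the array.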

If you have basic needs like this, I'd strongly discourage using protobuf-net, though. Its development has slowed down and it depends on a single maintainer, so it's pretty risky (when I tried to help with a performance issue, he did not even bother to look at the changes I offered to make). However, if you want to serialise complex data structures, binary serialisation would not be much slower than protobuf, and protobuf is not officially supported on the .NET platform (Google released the code for it for Java, Python and C++).



Source: https://stackoverflow.com/questions/8059813/performance-issue-when-serializing-multi-dimensional-arrays-using-binaryformatte
