I've known that GetBuffer() on a MemoryStream in C#/.NET has to be used with care because, as the docs describe, there can be unused bytes at the end.
.NET 4.6 has a new API, bool MemoryStream.TryGetBuffer(out ArraySegment<byte> buffer), that is similar in spirit to .GetBuffer(). This method will return an ArraySegment that includes the _origin information if it can.
See this question for details about when .TryGetBuffer() will return true and populate the out param with useful information.
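As a rough illustration of the TryGetBuffer() pattern described above, here is a minimal sketch. The class name TryGetBufferDemo and the helper GetWrittenBytes are made up for this example; the fallback to ToArray() is one possible choice, not something the API requires.

```csharp
using System;
using System.IO;
using System.Text;

public static class TryGetBufferDemo
{
    // Hypothetical helper: returns the written bytes as a segment over the
    // stream's own buffer when it is exposable, otherwise falls back to a copy.
    public static ArraySegment<byte> GetWrittenBytes(MemoryStream stream)
    {
        if (stream.TryGetBuffer(out ArraySegment<byte> segment))
        {
            // No copy: the segment's Offset and Count describe the valid region.
            return segment;
        }

        // The buffer is not exposable (for example, a stream created with
        // new MemoryStream(byte[])), so pay for a copy instead.
        return new ArraySegment<byte>(stream.ToArray());
    }

    public static void Main()
    {
        var stream = new MemoryStream();
        byte[] payload = Encoding.ASCII.GetBytes("test");
        stream.Write(payload, 0, payload.Length);

        ArraySegment<byte> segment = GetWrittenBytes(stream);
        Console.WriteLine($"offset={segment.Offset}, count={segment.Count}");
    }
}
```

Consumers should then read only segment.Count bytes starting at segment.Offset in segment.Array, rather than the whole array.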
It can be useful if you're using a low-level API that takes an ArraySegment, such as Socket.Send. Rather than calling ToArray, which will create another copy of the array, you can create a segment:
var segment = new ArraySegment<byte>(stream.GetBuffer(), 0, (int)stream.Length);
and then pass that to the Send method. For large data this will avoid allocating a new array and copying into it, which could be expensive.
ToArray() is the alternative to GetBuffer(). However, ToArray() makes a copy of the data in memory. If the array is 85,000 bytes or larger, it is placed on the Large Object Heap (LOH). So far nothing fancy. However, the GC does not handle the LOH and the objects in it very well (the memory is not freed as you might expect), and because of this an OutOfMemoryException can occur. The solution is either to call GC.Collect() so that those objects get collected, or to use GetBuffer() and create several smaller (less than 85,000 bytes) objects; those will not go to the LOH and the memory will be freed by the GC as expected.
A third (better) option exists: use only streams, e.g. read all the bytes from a MemoryStream and write them directly to HttpResponse.OutputStream (again using a byte array smaller than 85,000 bytes as a buffer). However, this is not always possible (as it was in my case).
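The stream-to-stream approach can be sketched roughly as follows. The class ChunkedCopyDemo and the helper CopyInChunks are hypothetical names, and a second MemoryStream stands in for HttpResponse.OutputStream so the example runs on its own. The 81,920-byte buffer is an assumption chosen to stay below the LOH threshold; the built-in Stream.CopyTo does essentially the same job.

```csharp
using System;
using System.IO;

public static class ChunkedCopyDemo
{
    // Hypothetical helper: copies source to destination through a small,
    // reusable buffer so that no large intermediate array is allocated.
    public static long CopyInChunks(Stream source, Stream destination, int bufferSize = 81920)
    {
        byte[] buffer = new byte[bufferSize]; // below the LOH threshold
        long total = 0;
        int read;

        // Read returns 0 once the end of the source stream is reached.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }

        return total;
    }

    public static void Main()
    {
        var source = new MemoryStream();
        source.Write(new byte[200000], 0, 200000);
        source.Position = 0; // rewind before reading

        // Stand-in for HttpResponse.OutputStream (an assumption for this sketch).
        var destination = new MemoryStream();

        long copied = CopyInChunks(source, destination);
        Console.WriteLine($"copied {copied} bytes");
    }
}
```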
In summary, when an in-memory copy of the data is not desired you will have to avoid ToArray(). In those cases GetBuffer() might come in handy, but it might not be the best solution.
The most important point from the GetBuffer MSDN documentation, other than that it does not create a copy of the data, is that it returns an array that may contain unused bytes:
Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.
So if you really want to avoid creating a copy due to memory constraints, you have to be careful not to send the whole array from GetBuffer over the wire or dump it to a file or attachment, because that buffer doubles in capacity whenever it fills up and almost always has a lot of unused bytes at the end.
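To make the difference concrete, here is a small sketch that writes "test" into a fresh MemoryStream and compares the three lengths. The class UnusedBytesDemo and the helper WriteAscii are invented for illustration, and a second MemoryStream stands in for the file or network destination.

```csharp
using System;
using System.IO;
using System.Text;

public static class UnusedBytesDemo
{
    // Hypothetical helper: writes an ASCII string into a new expandable stream.
    public static MemoryStream WriteAscii(string text)
    {
        var stream = new MemoryStream();
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);
        return stream;
    }

    public static void Main()
    {
        MemoryStream stream = WriteAscii("test");

        // Length counts only the written data; the buffer is usually larger
        // (the documentation quoted above gives 256 for this case).
        Console.WriteLine($"Length:             {stream.Length}");
        Console.WriteLine($"GetBuffer().Length: {stream.GetBuffer().Length}");
        Console.WriteLine($"ToArray().Length:   {stream.ToArray().Length}");

        // When forwarding the buffer, pass only the first Length bytes.
        using (var destination = new MemoryStream()) // stand-in for a file or socket
        {
            destination.Write(stream.GetBuffer(), 0, (int)stream.Length);
            Console.WriteLine($"Forwarded:          {destination.Length}");
        }
    }
}
```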
If you really want to access the internal _origin value, you can call MemoryStream.Seek(0, SeekOrigin.Begin). The return value will be exactly the _origin value.
The answer is in the GetBuffer() MSDN doc; you might have missed it.
When you create a MemoryStream without providing a byte array (byte[]), it creates an expandable capacity initialized to zero.
In other words, the MemoryStream will only reference a byte[] of the proper size once a Write call is made on the stream.
Thus, with GetBuffer() you can directly access the underlying array and read from it.
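A short sketch of that behaviour, assuming a stream created with the parameterless constructor (the class UnderlyingArrayDemo and the helper PeekFirstByte are made-up names for this example):

```csharp
using System;
using System.IO;

public static class UnderlyingArrayDemo
{
    // Hypothetical helper: reads the first byte straight from the internal
    // buffer. Assumes at least one byte has been written and that the stream
    // was created without a caller-supplied array, so data starts at index 0.
    public static byte PeekFirstByte(MemoryStream stream)
    {
        return stream.GetBuffer()[0];
    }

    public static void Main()
    {
        var stream = new MemoryStream();
        Console.WriteLine($"Capacity before Write: {stream.Capacity}");

        stream.WriteByte(42);
        Console.WriteLine($"Capacity after Write:  {stream.Capacity}");
        Console.WriteLine($"First byte via buffer: {PeekFirstByte(stream)}");

        // The array is shared, not copied: a change made through the buffer
        // is visible through the stream as well.
        stream.GetBuffer()[0] = 7;
        Console.WriteLine($"First byte via ToArray: {stream.ToArray()[0]}");
    }
}
```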
This could be useful when you receive a stream without knowing its size. If the stream received is usually very big, it will be much faster to call GetBuffer() than to call ToArray(), which copies the data under the hood (see below).
To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.
I wonder at which point you might have called GetBuffer() to get junk data at the beginning. It could be between two Write calls, where the data from the first one would have been garbage collected, but I'm not sure whether that could happen.