When is GetBuffer() on MemoryStream ever useful?

Question

I've known that GetBuffer() on a MemoryStream in C#/.NET has to be used with care because, as the documentation describes, there can be unused bytes at the end, so you have to be sure to look only at the first MemoryStream.Length bytes in the buffer.

But then yesterday I ran into a case where the bytes at the beginning of the buffer were junk! Indeed, if you use a tool like Reflector and look at ToArray(), you can see this:

public virtual byte[] ToArray()
{
    byte[] dst = new byte[this._length - this._origin];
    Buffer.InternalBlockCopy(this._buffer, this._origin, dst, 0,
        this._length - this._origin);
    return dst;
}

So to do anything with the buffer returned by GetBuffer(), you really need to know _origin. The only problem is that _origin is private and there's no way to get at it...

So my question is: what use is GetBuffer() on a MemoryStream without some a priori knowledge of how the MemoryStream was constructed (which is what sets _origin)?

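(It is this constructor, used when you want a MemoryStream around a byte array starting at a particular index in that array, that sets a non-zero _origin on a stream whose buffer GetBuffer() will actually return. The shorter (buffer, index, count) overloads also set _origin, but they never make the buffer publicly visible:

public MemoryStream(byte[] buffer, int index, int count, bool writable, bool publiclyVisible)

)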
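To make the situation concrete, here is a small C# sketch (the array contents are made up for illustration) that wraps a MemoryStream around part of a larger array with the five-argument constructor and compares what GetBuffer() and ToArray() return:

```csharp
using System;
using System.IO;

// Backing array: 4 bytes the stream should NOT see, then the 3 bytes it should, then 1 more.
byte[] backing = { 0xAA, 0xAA, 0xAA, 0xAA, 1, 2, 3, 0xBB };

// The five-argument constructor sets the private _origin to 4; publiclyVisible lets GetBuffer() succeed.
var ms = new MemoryStream(backing, 4, 3, writable: true, publiclyVisible: true);

byte[] raw = ms.GetBuffer();  // the very same array as 'backing', starting at index 0
byte[] copy = ms.ToArray();   // copies from _origin: only the stream's 3 bytes

Console.WriteLine(raw[0]);    // 170 (0xAA): "junk" from the stream's point of view
Console.WriteLine(copy[0]);   // 1
Console.WriteLine(ms.Length); // 3
```

Without knowing that the origin is 4, nothing in the array returned by GetBuffer() tells you where the stream's data starts.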

Answer 1:

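If you really want to access the internal _origin value, you can call MemoryStream.Seek(0, SeekOrigin.Begin). The return value will be exactly the _origin value. Note that this also moves Position back to the start of the stream, and that it relies on an implementation detail rather than on documented behavior.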

Answer 2:

The answer is in the MSDN documentation for GetBuffer(); you might have missed it.

When you create a MemoryStream without providing a byte array (byte[]), the documentation says it is initialized with

an expandable capacity initialized to zero.

In other words, the MemoryStream allocates a byte[] of a suitable size (and grows it as needed) when Write is called on the stream.

Thus, with GetBuffer() you can directly access the underlying array and read from it.

This can be useful when you receive a stream without knowing its size. If the received stream is usually very big, calling GetBuffer() is much faster than calling ToArray(), which copies the data under the hood, as the documentation notes:

To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.
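A minimal sketch of that pattern, assuming a stream of unknown size (simulated here with another MemoryStream) is copied into an expandable MemoryStream and then read through GetBuffer() without a ToArray() copy. Because a stream created with new MemoryStream() has an origin of 0, the valid data is simply the first Length bytes:

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for a stream of unknown size (e.g. a network or file stream); an assumption for this sketch.
Stream source = new MemoryStream(Encoding.ASCII.GetBytes("hello, world"));

var ms = new MemoryStream();      // expandable; its buffer is publicly visible
source.CopyTo(ms);

byte[] buffer = ms.GetBuffer();   // no copy, but usually longer than the data
int length = (int)ms.Length;      // valid data is buffer[0..length) because the origin is 0 here

string text = Encoding.ASCII.GetString(buffer, 0, length);
Console.WriteLine(text);                    // hello, world
Console.WriteLine(buffer.Length >= length); // True
```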

I wonder at which point you called GetBuffer() and got junk data at the beginning. It could be between two Write calls, where the data from the first one had been garbage collected, but I'm not sure that can happen.

Answer 3:

ToArray() is the alternative to GetBuffer(). However, ToArray() makes a copy of the data in memory. If the copy is 85,000 bytes or larger, it is allocated on the Large Object Heap (LOH). So far nothing fancy. However, the GC does not handle the LOH and the objects in it very well (the memory is not freed as you would expect, and the LOH is not compacted by default), and because of this an OutOfMemoryException can occur. The solution is either to call GC.Collect() so that those objects get collected, or to use GetBuffer() and create several smaller objects (under 85,000 bytes each); those will not go to the LOH, and their memory will be freed by the GC as expected.
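A sketch of the second option, splitting the data into small arrays straight from GetBuffer() instead of calling ToArray() first. The 80,000-byte chunk size and the SplitIntoChunks helper are illustrative choices, not part of any API, and the helper assumes the stream's origin is 0:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Chunk size kept under the 85,000-byte LOH threshold; the exact value is an arbitrary choice.
const int ChunkSize = 80_000;

// Hypothetical helper: splits the valid bytes of a MemoryStream into small arrays.
// Assumes the stream was created with origin 0 (e.g. new MemoryStream()).
List<byte[]> SplitIntoChunks(MemoryStream ms)
{
    byte[] buffer = ms.GetBuffer();   // no copy of the whole payload
    int length = (int)ms.Length;
    var chunks = new List<byte[]>();
    for (int offset = 0; offset < length; offset += ChunkSize)
    {
        int count = Math.Min(ChunkSize, length - offset);
        var chunk = new byte[count];  // each chunk stays on the small object heap
        Buffer.BlockCopy(buffer, offset, chunk, 0, count);
        chunks.Add(chunk);
    }
    return chunks;
}

var payload = new MemoryStream();
payload.Write(new byte[200_000], 0, 200_000);

List<byte[]> parts = SplitIntoChunks(payload);
Console.WriteLine(parts.Count); // 3 (80,000 + 80,000 + 40,000 bytes)
```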

A third (better) option is to use only streams, e.g. read the bytes from the MemoryStream and write them directly to HttpResponse.OutputStream (again using a byte array smaller than 85,000 bytes as a buffer). However, this is not always possible (it wasn't in my case).
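The streaming option could look roughly like this. A MemoryStream stands in for HttpResponse.OutputStream (an assumption so the sketch runs anywhere), and the 64 KB buffer size is an arbitrary value below the LOH threshold. MemoryStream.WriteTo is shown as well, since it writes the valid bytes directly from the internal buffer without any intermediate array:

```csharp
using System;
using System.IO;

var payload = new MemoryStream();
payload.Write(new byte[200_000], 0, 200_000);

// Stand-ins for HttpResponse.OutputStream; any writable Stream behaves the same way here.
var outputA = new MemoryStream();
var outputB = new MemoryStream();

// Option 1: copy through a small reusable buffer that stays under the LOH threshold.
payload.Position = 0;
var chunk = new byte[64 * 1024];
int read;
while ((read = payload.Read(chunk, 0, chunk.Length)) > 0)
{
    outputA.Write(chunk, 0, read);
}

// Option 2: MemoryStream.WriteTo writes the stream's valid bytes straight from its internal buffer.
payload.WriteTo(outputB);

Console.WriteLine(outputA.Length); // 200000
Console.WriteLine(outputB.Length); // 200000
```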

In summary, when an in-memory copy of the data is not desired you have to avoid ToArray(), and in those cases GetBuffer() might come in handy, but it might not be the best solution.

Answer 4:

.NET 4.6 has a new API, bool MemoryStream.TryGetBuffer(out ArraySegment<byte> buffer), that is similar in spirit to GetBuffer(). It returns false (instead of throwing) when the buffer is not publicly visible; otherwise the ArraySegment it returns includes the _origin information as its Offset.

See this question for details about when .TryGetBuffer() will return true and populate the out param with useful information.
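A short sketch of TryGetBuffer on a stream with a non-zero origin (the array contents are made up for illustration). The returned segment's Offset is the stream's origin, and a stream created with MemoryStream(byte[]) simply reports false instead of throwing the way GetBuffer() does:

```csharp
using System;
using System.IO;

byte[] backing = { 0xAA, 0xAA, 1, 2, 3 };

// Publicly visible stream over backing[2..5): TryGetBuffer succeeds and reports the origin.
var visible = new MemoryStream(backing, 2, 3, writable: false, publiclyVisible: true);
bool ok = visible.TryGetBuffer(out ArraySegment<byte> segment);
Console.WriteLine($"{ok} offset={segment.Offset} count={segment.Count}"); // True offset=2 count=3

// MemoryStream(byte[]) does not expose its buffer: TryGetBuffer returns false instead of throwing.
var hidden = new MemoryStream(backing);
bool hiddenOk = hidden.TryGetBuffer(out _);
Console.WriteLine(hiddenOk); // False
```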

Answer 5:

It can be useful if you're using a low-level API that takes an ArraySegment, such as Socket.Send. Rather than calling ToArray, which creates another copy of the array, you can create a segment:

var segment = new ArraySegment<byte>(stream.GetBuffer(), 0, (int)stream.Length);

and then pass that to the Send method (its segment overload takes an IList<ArraySegment<byte>>, so wrap the segment in a list). For large data this avoids allocating a new array and copying into it, which could be expensive.
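A self-contained sketch of that idea. The loopback TcpListener and client socket are only scaffolding so the example can run on its own; the relevant part is the ArraySegment built over GetBuffer(), which assumes the stream's origin is 0:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;

var stream = new MemoryStream();
byte[] message = Encoding.UTF8.GetBytes("ping");
stream.Write(message, 0, message.Length);

// View of the valid bytes, no copy; assumes origin 0 (stream created with new MemoryStream()).
var segment = new ArraySegment<byte>(stream.GetBuffer(), 0, (int)stream.Length);

// Loopback connection so the sketch is self-contained.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
int port = ((IPEndPoint)listener.LocalEndpoint).Port;

using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
client.Connect(IPAddress.Loopback, port);
using Socket server = listener.AcceptSocket();
listener.Stop(); // stops listening; the accepted socket stays open

// Socket.Send's segment overload takes IList<ArraySegment<byte>>, so wrap the single segment.
int sent = client.Send(new List<ArraySegment<byte>> { segment });

// TCP may deliver the bytes in pieces, so read until everything has arrived.
var received = new byte[sent];
int total = 0;
while (total < sent)
{
    int n = server.Receive(received, total, sent - total, SocketFlags.None);
    if (n == 0) break; // connection closed early
    total += n;
}

Console.WriteLine(sent);                                        // 4
Console.WriteLine(Encoding.UTF8.GetString(received, 0, total)); // ping
```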

Answer 6:

GetBuffer() always assumes you know the structure of the data fed into the stream (and that's its use). If you just want to get data out of the stream, you should use one of the provided methods (e.g. ToArray()).

Something like this can be used, but the only case I can think of right now is some fixed structure or virtual file system sitting in the stream. For example, at your current position you read the offset of a file stored inside the stream. You then create a new stream object based on this stream's buffer but with a different _origin. This saves you from copying the whole data for the new object, which might save a lot of memory. It also saves you from carrying the initial buffer around as a reference, because you can always retrieve it again.
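A rough sketch of that idea. The container layout (a 4-byte offset, a 4-byte length, then the embedded payload) is a hypothetical format invented for this example; the point is that the second MemoryStream views the same array with a different origin instead of copying the payload:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical container layout for this sketch: [int32 offset][int32 length][payload bytes].
var container = new MemoryStream();
var writer = new BinaryWriter(container);
byte[] file = Encoding.ASCII.GetBytes("embedded file");
writer.Write(8);            // payload starts right after the two 4-byte header fields
writer.Write(file.Length);
writer.Write(file);         // Write(byte[]) writes the raw bytes, no length prefix
writer.Flush();

// Read the header back.
container.Position = 0;
var reader = new BinaryReader(container);
int offset = reader.ReadInt32();
int length = reader.ReadInt32();

// A second MemoryStream over the same array with a different origin: the payload is not copied.
var view = new MemoryStream(container.GetBuffer(), offset, length, writable: false, publiclyVisible: true);

string text = new StreamReader(view, Encoding.ASCII).ReadToEnd();
Console.WriteLine(text); // embedded file
```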

Answer 7:

The most important point from the GetBuffer MSDN documentation, other than that it does not create a copy of the data, is that it returns an array that may contain unused bytes:

Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.

So if you really want to avoid creating a copy due to memory constraints, you have to be careful not to send the whole array from GetBuffer over the wire or dump it to a file or attachment, because the buffer typically doubles in size whenever it fills up and almost always has a lot of unused bytes at the end.
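The documentation's "test" example can be reproduced in a few lines. The destination MemoryStream stands in for a file or socket (an assumption for the sketch); only the first Length bytes of the buffer are written to it:

```csharp
using System;
using System.IO;
using System.Text;

var ms = new MemoryStream();
byte[] data = Encoding.ASCII.GetBytes("test");
ms.Write(data, 0, data.Length);

byte[] buffer = ms.GetBuffer();
Console.WriteLine(buffer.Length); // 256: the current capacity, mostly unused
Console.WriteLine(ms.Length);     // 4: the actual data

// Stand-in for a file or network stream: write only the valid bytes, not the whole buffer.
var destination = new MemoryStream();
destination.Write(buffer, 0, (int)ms.Length); // same effect here as ms.WriteTo(destination)
Console.WriteLine(destination.Length); // 4
```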

Answer 8:

GetBuffer() is extremely useful if you need to write binary data or binary files.

For example, say I am reading data from a DB and then want to write some of it into binary files. You can write the data into the buffer first while iterating, and then write the whole buffer to the file in one shot, avoiding the many I/O calls you would make by writing directly to the file on each iteration.

Let me give you an example in C#:

using System.Data;
using System.Data.SqlClient;
using System.IO;

// 'con' (an open SqlConnection) and 'filePath' (the destination file) are defined elsewhere.
string query = "Select Id, Name from table1";
SqlCommand cmd = new SqlCommand(query, con);
SqlDataAdapter da = new SqlDataAdapter(cmd);
DataTable dt = new DataTable();
da.Fill(dt);

MemoryStream memStream = new MemoryStream(10 * 1024 * 1024); // 10 MB initial capacity
BinaryWriter memWtr = new BinaryWriter(memStream);

foreach (DataRow dr in dt.Rows)
{
    memWtr.Write((int)dr[0]);        // Id
    memWtr.Write(dr[1].ToString());  // Name (length-prefixed string)
}
memWtr.Flush();

// Now write the whole buffer to the file in a single call.
// Only the first Length bytes of GetBuffer() are data; the rest is unused capacity.
using (BinaryWriter wtrFile = new BinaryWriter(File.Open(filePath, FileMode.Create)))
{
    wtrFile.Write(memStream.GetBuffer(), 0, (int)memStream.Length);
}

Hope it helps.

Source: https://stackoverflow.com/questions/13053739/when-is-getbuffer-on-memorystream-ever-useful

Tags

.net

memorystream

getbuffer