Question
I am building a library that allows a user to download files from a URL. One of the options I am considering is letting the user specify the expected MD5 checksum for the file; the library's GetFile(string url) function ensures that the checksum for the downloaded stream matches the one specified by the user.
Being aware that the NetworkStream returned by HttpWebResponse.GetResponseStream() is not seekable, I found a way to duplicate the Stream thanks to the answers to this question: How can I read an Http response stream twice in C#?. Before I went any further, though, I wanted to figure out the memory implications of this duplication; unfortunately, multiple searches on Google and MSDN came to naught.
The library imposes no restriction on the size of the file to be downloaded. My question is: if the user selects a 2GB file, is the MemoryStream implementation in .NET 2.0 smart enough to use the page file and RAM efficiently, so that the system doesn't start to crawl due to a virtual-memory crunch? Also, Jon Skeet's comment on another question gave me something to think about - he averred that even after disposing a MemoryStream, the memory is not 100% freed. How and when can I ensure that the memory is actually released? Will it be released based on the system's requirements (and necessity)?
Thanks, Manoj
Answer 1:
You're saving it to a file, right? Why not save it chunk by chunk, updating a hash as you go, and then just check the hash at the end? I don't think you need to read the response twice, nor buffer it. As another answer points out, that would fail when you got over 1GB anyway.
Don't forget that as well as the current size of the MemoryStream, any time it has to grow you'll end up with (temporarily) the new array plus the old array in memory at the same time. Of course that wouldn't be a problem if you knew the content length beforehand, but it would still be nicer to just write it to disk and hash as you go.
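A minimal synchronous sketch of this approach (the method name, file path, and 8KB buffer size are illustrative, not from the answer):

```csharp
using System;
using System.IO;
using System.Net;
using System.Security.Cryptography;

static string DownloadAndHash(string url, string path)
{
    WebRequest request = WebRequest.Create(url);
    using (WebResponse response = request.GetResponse())
    using (Stream input = response.GetResponseStream())
    using (FileStream output = new FileStream(path, FileMode.Create, FileAccess.Write))
    using (MD5 md5 = MD5.Create())
    {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            md5.TransformBlock(buffer, 0, read, null, 0); // fold the chunk into the running hash
            output.Write(buffer, 0, read);                // write the same chunk to disk
        }
        md5.TransformFinalBlock(buffer, 0, 0);            // finalize; md5.Hash is now valid
        return BitConverter.ToString(md5.Hash).Replace("-", "").ToLowerInvariant();
    }
}
```

At no point is more than one buffer's worth of the response held in memory, so the file size never matters.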
Answer 2:
MemoryStream is backed by an array. Even if you have a 64 bit OS this isn't going to work for more than 1GB as the framework won't allocate a larger array.
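To see the limit concretely, a MemoryStream that keeps growing must keep doubling its backing byte[], and eventually the allocation fails (a sketch only; the exact threshold depends on the runtime, bitness, and available memory - don't run this on a constrained machine):

```csharp
using System;
using System.IO;

MemoryStream ms = new MemoryStream();
byte[] chunk = new byte[64 * 1024 * 1024]; // 64MB of zeroes per write
try
{
    while (true)
        ms.Write(chunk, 0, chunk.Length); // growth doubles the backing array each time
}
catch (OutOfMemoryException)
{
    Console.WriteLine("Backing array could not grow past {0} bytes", ms.Length);
}
```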
Answer 3:
Afaik the CLR managed heap will not allocate anything bigger than 2 GB, and a MemoryStream is backed by a live, contiguous byte[]. The Large Object Heap doesn't handle allocations over 2 GB, not even on x64.
But storing an entire file in memory just to compute a hash seems pretty low-tech. You can compute the hash as you receive the bytes, chunk by chunk. After each I/O completion you can hash the received bytes, then submit the write to the file. At the end, you have the hash computed and the file downloaded, hurray.
BTW, if you seek code to manipulate files, steer clear of any sample that contains the words ReadToEnd.
...
using System;
using System.IO;
using System.Net;
using System.Security.Cryptography;
using System.Threading;

class Program
{
    private static AutoResetEvent done = new AutoResetEvent(false);
    private static AsyncCallback _callbackReadStream;
    private static AsyncCallback _callbackWriteFile;

    static void Main(string[] args)
    {
        try
        {
            _callbackReadStream = new AsyncCallback(CallbackReadStream);
            _callbackWriteFile = new AsyncCallback(CallbackWriteFile);
            string url = "http://...";
            WebRequest request = WebRequest.Create(url);
            request.Method = "GET";
            request.BeginGetResponse(new AsyncCallback(CallbackGetResponse), request);
            done.WaitOne();
        }
        catch (Exception e)
        {
            Console.Error.WriteLine(e.Message);
        }
    }

    // Carries the download state from one async completion to the next.
    private class State
    {
        public Stream ResponseStream { get; set; }
        public HashAlgorithm Hash { get; set; }
        public Stream FileStream { get; set; }
        private byte[] _buffer = new byte[16379];
        public byte[] Buffer { get { return _buffer; } }
        public int ReadBytes { get; set; }
        public long FileLength { get; set; }
    }

    static void CallbackGetResponse(IAsyncResult ar)
    {
        try
        {
            WebRequest request = (WebRequest)ar.AsyncState;
            WebResponse response = request.EndGetResponse(ar);
            State s = new State();
            s.ResponseStream = response.GetResponseStream();
            s.FileStream = new FileStream("download.out"
                , FileMode.Create
                , FileAccess.Write
                , FileShare.None);
            s.Hash = HashAlgorithm.Create("MD5");
            s.ResponseStream.BeginRead(
                s.Buffer, 0, s.Buffer.Length, _callbackReadStream, s);
        }
        catch (Exception e)
        {
            Console.Error.WriteLine(e.Message);
            done.Set();
        }
    }

    private static void CallbackReadStream(IAsyncResult ar)
    {
        try
        {
            State s = (State)ar.AsyncState;
            s.ReadBytes = s.ResponseStream.EndRead(ar);
            // Fold this chunk into the running hash. ComputeHash would restart
            // the hash on every chunk instead of accumulating it.
            s.Hash.TransformBlock(s.Buffer, 0, s.ReadBytes, null, 0);
            s.FileStream.BeginWrite(
                s.Buffer, 0, s.ReadBytes, _callbackWriteFile, s);
        }
        catch (Exception e)
        {
            Console.Error.WriteLine(e.Message);
            done.Set();
        }
    }

    static private void CallbackWriteFile(IAsyncResult ar)
    {
        try
        {
            State s = (State)ar.AsyncState;
            s.FileStream.EndWrite(ar);
            s.FileLength += s.ReadBytes;
            if (0 != s.ReadBytes)
            {
                // More data may be pending; issue the next read.
                s.ResponseStream.BeginRead(
                    s.Buffer, 0, s.Buffer.Length, _callbackReadStream, s);
            }
            else
            {
                // Zero bytes read means end of stream: finalize the hash and clean up.
                s.Hash.TransformFinalBlock(s.Buffer, 0, 0);
                Console.Out.Write("Downloaded {0} bytes. Hash(base64):{1}",
                    s.FileLength, Convert.ToBase64String(s.Hash.Hash));
                s.FileStream.Close();
                s.ResponseStream.Close();
                done.Set();
            }
        }
        catch (Exception e)
        {
            Console.Error.WriteLine(e.Message);
            done.Set();
        }
    }
}
Answer 4:
I'm pretty sure you'll get an OutOfMemoryException. An easy way to test is to try reading a DVD ISO image or something into memory using a MemoryStream. If you can read the whole thing, you should be fine. If you get an exception, well, there you go.
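The suggested experiment might look like this (the ISO path is a placeholder for any multi-GB file on your machine):

```csharp
using System;
using System.IO;

string path = @"C:\images\large.iso"; // placeholder: any file over ~1-2 GB
try
{
    using (FileStream fs = File.OpenRead(path))
    using (MemoryStream ms = new MemoryStream())
    {
        byte[] buffer = new byte[81920];
        int read;
        while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            ms.Write(buffer, 0, read); // keeps growing the backing array
        Console.WriteLine("Buffered {0} bytes in memory", ms.Length);
    }
}
catch (OutOfMemoryException)
{
    Console.WriteLine("MemoryStream could not hold the whole file");
}
```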
Source: https://stackoverflow.com/questions/1511628/will-a-large-system-io-memorystream-result-in-my-applications-memory-usage-incr