How to download a file in parallel using HttpWebRequest

北海茫月 2021-02-08 14:38

I'm trying to make a program like IDM that can download parts of a file simultaneously.
The tool I'm using to achieve this is the TPL in C# on .NET 4.5.
But I'm having a

2 answers
  • 2021-02-08 15:19

    I would use HttpClient.SendAsync rather than WebRequest (see "HttpClient is Here!").

    I would not use any extra threads. The HttpClient.SendAsync API is naturally asynchronous and returns an awaitable Task<>, so there is no need to offload it to a pool thread with Task.Run/Task.Factory.StartNew (see this for a detailed discussion).

    I would also limit the number of parallel downloads with SemaphoreSlim.WaitAsync(). Below is my take as a console app (not extensively tested):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;
    
    namespace Console_21737681
    {
        class Program
        {
            const int MAX_PARALLEL = 4; // max parallel downloads
            const int CHUNK_SIZE = 2048; // size of a single chunk
    
            // a chunk of downloaded data
            class Chunk
            {
                public long Start { get; set; }
                public int Length { get; set; }
                public byte[] Data { get; set; }
            };
    
            // throttle downloads
            SemaphoreSlim _throttleSemaphore = new SemaphoreSlim(MAX_PARALLEL);
    
            // get a chunk
            async Task<Chunk> GetChunk(HttpClient client, long start, int length, string url)
            {
                await _throttleSemaphore.WaitAsync();
                try
                {
                    using (var request = new HttpRequestMessage(HttpMethod.Get, url))
                    {
                        request.Headers.Range = new System.Net.Http.Headers.RangeHeaderValue(start, start + length - 1);
                        using (var response = await client.SendAsync(request))
                        {
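                            // fail fast on HTTP errors instead of storing an error page as chunk data
                            response.EnsureSuccessStatusCode();
                            var data = await response.Content.ReadAsByteArrayAsync();
                            return new Chunk { Start = start, Length = length, Data = data };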
                        }
                    }
                }
                finally
                {
                    _throttleSemaphore.Release();
                }
            }
    
            // download the URL in parallel by chunks
            async Task<Chunk[]> DownloadAsync(string url)
            {
                using (var client = new HttpClient())
                {
                    // ask for the headers only, to learn the total size
                    long? contentLength;
                    using (var request = new HttpRequestMessage(HttpMethod.Head, url))
                    using (var response = await client.SendAsync(request))
                    {
                        contentLength = response.Content.Headers.ContentLength;
                    }
    
                    if (!contentLength.HasValue)
                        throw new InvalidOperationException("The server did not report a Content-Length.");
    
                    var numOfChunks = (int)((contentLength.Value + CHUNK_SIZE - 1) / CHUNK_SIZE);
    
                    var tasks = Enumerable.Range(0, numOfChunks).Select(i =>
                    {
                        // start a new chunk
                        // cast first: int * int would overflow for offsets past 2 GB
                        long start = (long)i * CHUNK_SIZE;
                        var length = (int)Math.Min(CHUNK_SIZE, contentLength.Value - start);
                        return GetChunk(client, start, length, url);
                    }).ToList();
    
                    await Task.WhenAll(tasks);
    
                    // chunks complete in arbitrary order, but the array follows the task list, i.e. ascending Start
                    return tasks.Select(task => task.Result).ToArray();
                }
            }
    
            static void Main(string[] args)
            {
                var program = new Program();
                var chunks = program.DownloadAsync("http://flaglane.com/download/australian-flag/australian-flag-large.png").Result;
    
                Console.WriteLine("Chunks: " + chunks.Count());
                Console.ReadLine();
            }
        }
    }
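
    To turn the downloaded chunks into a file, each chunk can be written at its own offset, which makes the completion order irrelevant. A minimal sketch, assuming every chunk carries its bytes; `Piece`, `ChunkWriter` and `WriteAll` are hypothetical names, not part of the answer above:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;

    // Hypothetical stand-in for the Chunk class above, with its data populated.
    public class Piece
    {
        public long Start { get; set; }
        public byte[] Data { get; set; }
    }

    public static class ChunkWriter
    {
        // Writes every piece at its own offset, so the input order does not matter.
        public static void WriteAll(string path, IEnumerable<Piece> pieces)
        {
            using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
            {
                foreach (var piece in pieces)
                {
                    file.Seek(piece.Start, SeekOrigin.Begin);
                    file.Write(piece.Data, 0, piece.Data.Length);
                }
            }
        }
    }
    ```

    For large files it would probably be better to write each chunk as soon as it arrives rather than holding all of them in memory, but that needs synchronized access to the file and is left out here.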
    
  • 2021-02-08 15:27

    OK, here's how I would do what you're attempting. This is basically the same idea, just implemented differently.

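    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Threading.Tasks;
    
    // The methods below are assumed to live inside a class.
    public static void DownloadFileInPiecesAndSave()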
    {
        //test
        var uri = new Uri("http://www.w3.org/");
    
        var bytes = DownloadInPieces(uri, 4);
        File.WriteAllBytes(@"c:\temp\RangeDownloadSample.html", bytes);
    }
    
    /// <summary>
    /// Download a file via HTTP in multiple pieces using a Range request.
    /// </summary>
    public static byte[] DownloadInPieces(Uri uri, uint numberOfPieces)
    {
        //I'm just fudging this for expository purposes. In reality you would probably want to do a HEAD request to get total file size.
        ulong totalFileSize = 1003; 
    
        var pieceSize = totalFileSize / numberOfPieces;
    
        List<Task<byte[]>> tasks = new List<Task<byte[]>>();
        for (uint i = 0; i < numberOfPieces; i++)
        {
            var start = i * pieceSize;
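            // HTTP ranges are inclusive, so the last byte of a piece is start + size - 1;
            // the final piece also takes the remainder.
            var end = start + (i == numberOfPieces - 1 ? pieceSize + totalFileSize % numberOfPieces : pieceSize) - 1;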
            tasks.Add(DownloadFilePiece(uri, start, end));
        }
    
        Task.WaitAll(tasks.ToArray());
    
        //This is probably not the single most efficient way to combine byte arrays, but it is succinct...
        return tasks.SelectMany(t => t.Result).ToArray();
    }
    
    private static async Task<byte[]> DownloadFilePiece(Uri uri, ulong rangeStart, ulong rangeEnd)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(uri);
            request.AddRange((long)rangeStart, (long)rangeEnd);
            // WebProxy.GetDefaultProxy() is obsolete; use the system proxy settings instead
            request.Proxy = WebRequest.GetSystemWebProxy();
    
            using (var response = await request.GetResponseAsync())
            using (var responseStream = response.GetResponseStream())
            // the range is inclusive, so it covers rangeEnd - rangeStart + 1 bytes
            using (var memoryStream = new MemoryStream((int)(rangeEnd - rangeStart + 1)))
            {
                await responseStream.CopyToAsync(memoryStream);
                return memoryStream.ToArray();
            }
        }
        catch (WebException wex)
        {
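            //Do lots of error handling here, lots of things can go wrong
            //In particular watch for 416 Requested Range Not Satisfiable
            //NOTE: returning null makes the SelectMany in DownloadInPieces throw, so real code should retry or rethrow
            return null;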
        }
        catch (Exception ex)
        {
            //handle the unexpected here...
            return null;
        }
    }
    

    Note that I glossed over a lot of stuff here, such as:

    • Detecting if the server supports range requests. If it doesn't then the server will return the entire content in each request, and we'll get several copies of it.
    • Handling any sort of HTTP errors. What if the third request fails?
    • Retry logic
    • Timeouts
    • Figuring out how big the file actually is
    • Checking whether the file is big enough to warrant multiple requests, and if so, how many. It's probably not worth doing this in parallel for files under 1 or 2 MB, but you'd have to test.
    • Most likely a bunch of other stuff.
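
    Three of the items above (range support, file size, and whether to split at all) can be sketched roughly as follows. This is only an illustration under assumptions: `RangePlanner`, `ProbeAsync`, `Plan` and the 2 MB threshold are made-up names and values, and a server may honor ranges without advertising `Accept-Ranges`, so a missing header is not conclusive.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class RangePlanner
    {
        // Hypothetical threshold: smaller files are fetched with a single request.
        public const long MinParallelSize = 2 * 1024 * 1024;

        // Returns the content length if the server advertises byte ranges, otherwise null.
        public static async Task<long?> ProbeAsync(HttpClient client, Uri uri)
        {
            using (var request = new HttpRequestMessage(HttpMethod.Head, uri))
            using (var response = await client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                bool acceptsBytes = response.Headers.AcceptRanges
                    .Any(v => string.Equals(v, "bytes", StringComparison.OrdinalIgnoreCase));
                return acceptsBytes ? response.Content.Headers.ContentLength : null;
            }
        }

        // Splits totalSize bytes into inclusive (start, end) ranges, as the Range header expects.
        public static List<Tuple<long, long>> Plan(long totalSize, int maxPieces)
        {
            if (maxPieces < 1) throw new ArgumentOutOfRangeException("maxPieces");
            var ranges = new List<Tuple<long, long>>();
            if (totalSize <= 0) return ranges;

            int pieces = totalSize < MinParallelSize ? 1 : maxPieces;
            long pieceSize = totalSize / pieces;
            for (int i = 0; i < pieces; i++)
            {
                long start = i * pieceSize;
                // The last piece absorbs the remainder of the division.
                long end = (i == pieces - 1) ? totalSize - 1 : start + pieceSize - 1;
                ranges.Add(Tuple.Create(start, end));
            }
            return ranges;
        }
    }
    ```

    If `ProbeAsync` returns null, the safe fallback is a single plain GET. Even when it returns a length, each piece's response should still be checked for status 206 (Partial Content), since a 200 means the whole body was sent.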

    So you've got a long way to go before I would use this in production. But it should give you an idea of where to start.
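
    As one possible starting point for the retry and timeout items, a small helper could wrap each piece download. This is a sketch, not a hardened policy: `Retry` and `WithTimeoutAsync` are hypothetical names, it retries on any exception, and the timeout only takes effect if the wrapped operation actually observes the cancellation token.

    ```csharp
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    public static class Retry
    {
        // Runs the operation up to maxAttempts times. Each attempt gets a fresh
        // cancellation token that is cancelled after the given timeout.
        public static async Task<T> WithTimeoutAsync<T>(
            Func<CancellationToken, Task<T>> operation,
            int maxAttempts, TimeSpan timeout, TimeSpan delay)
        {
            if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++)
            {
                using (var cts = new CancellationTokenSource(timeout))
                {
                    try
                    {
                        return await operation(cts.Token);
                    }
                    catch (Exception ex)
                    {
                        // Remember the failure and try again.
                        last = ex;
                    }
                }

                // Wait a little longer after each failed attempt (linear backoff).
                if (attempt < maxAttempts)
                    await Task.Delay(TimeSpan.FromMilliseconds(delay.TotalMilliseconds * attempt));
            }
            throw new InvalidOperationException("All attempts failed.", last);
        }
    }
    ```

    With HttpClient, the token would be passed on to `SendAsync(request, token)`. A production version would likely retry only on transient failures and give up immediately on errors such as 404 or 416.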
