Downloading files asynchronously and in parallel

一向 2020-12-25 09:20

EDIT

I've changed the title of the question to reflect the issue I had, and added an answer showing how to achieve this easily.


I am trying

1 answer
  • 2020-12-25 09:29

    I solved it and am posting the solution here; it might help anyone having the same issue.

    My initial need was a small helper that would download images quickly and simply drop the connection if the server does not respond in time, all in parallel and asynchronously.

    This helper returns a tuple containing the remote path, the local path (null on failure), and the exception if one occurred. That is useful because it is always good to know why a download failed. I think I have covered every situation that can occur during a download, but you're welcome to comment if I missed one.

    • You specify a list of URLs to download.
    • You can specify a local file name to save to; if you don't, one is generated for you.
    • Optionally, you specify a timeout after which a download is cancelled (handy for slow or unreachable servers).
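
    Because every download yields a (remote path, local path, exception) tuple, the collected results are easy to inspect afterwards. The following is only a sketch: the DownloadReport class and its Summarize method are hypothetical names that are not part of the helper, and it assumes the exception slot is null for a successful download.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DownloadReport
{
    // Hypothetical helper: builds a short text report from the result tuples.
    // Item1 = remote path, Item2 = local path, Item3 = exception (null on success).
    public static string Summarize(IEnumerable<Tuple<string, string, Exception>> results)
    {
        var list = results.ToList();
        var failed = list.Where(r => r.Item3 != null).ToList();

        var lines = new List<string>
        {
            string.Format("{0} succeeded, {1} failed", list.Count - failed.Count, failed.Count)
        };

        // One line per failed download, with the reason it failed.
        lines.AddRange(failed.Select(r =>
            string.Format("FAILED {0}: {1}", r.Item1 ?? "(null)", r.Item3.Message)));

        return string.Join(Environment.NewLine, lines);
    }
}
```

    Calling DownloadReport.Summarize(results) once the downloads have finished gives a count of successes and failures plus the message of each exception.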

    You can call DownloadFileTaskAsync on its own, or use the ForEachAsync helper for parallel, asynchronous downloads.

    Code, with an example of how to use it:

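    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Threading;
    using System.Threading.Tasks;
    using System.Windows;
    
    // Inside your Window class:
    private async void MainWindow_Loaded(object sender, RoutedEventArgs e)
    {
        IEnumerable<string> enumerable = new string[] { /* your URLs here */ };
        var results = new List<Tuple<string, string, Exception>>();
        // Download everything with a 1000 ms timeout and collect one tuple per URL.
        await enumerable.ForEachAsync(s => DownloadFileTaskAsync(s, null, 1000), (url, t) => results.Add(t));
    }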
    
    /// <summary>
    ///     Downloads a file from a specified Internet address.
    /// </summary>
    /// <param name="remotePath">Internet address of the file to download.</param>
    /// <param name="localPath">
    ///     Local file name where to store the content of the download, if null a temporary file name will
    ///     be generated.
    /// </param>
    /// <param name="timeOut">Duration in milliseconds before cancelling the operation.</param>
    /// <returns>A tuple containing the remote path, the local path and an exception if one occurred.</returns>
    private static async Task<Tuple<string, string, Exception>> DownloadFileTaskAsync(string remotePath,
        string localPath = null, int timeOut = 3000)
    {
        try
        {
            if (remotePath == null)
            {
                Debug.WriteLine("DownloadFileTaskAsync (null remote path): skipping");
                throw new ArgumentNullException("remotePath");
            }
    
            if (localPath == null)
            {
                Debug.WriteLine(
                    string.Format(
                        "DownloadFileTaskAsync (null local path): generating a temporary file name for {0}",
                        remotePath));
                localPath = Path.GetTempFileName();
            }
    
            using (var client = new WebClient())
            {
                TimerCallback timerCallback = c =>
                {
                    var webClient = (WebClient) c;
                    if (!webClient.IsBusy) return;
                    webClient.CancelAsync();
                    Debug.WriteLine(string.Format("DownloadFileTaskAsync (time out due): {0}", remotePath));
                };
                using (var timer = new Timer(timerCallback, client, timeOut, Timeout.Infinite))
                {
                    await client.DownloadFileTaskAsync(remotePath, localPath);
                }
                Debug.WriteLine(string.Format("DownloadFileTaskAsync (downloaded): {0}", remotePath));
                return new Tuple<string, string, Exception>(remotePath, localPath, null);
            }
        }
        catch (Exception ex)
        {
            return new Tuple<string, string, Exception>(remotePath, null, ex);
        }
    }
    
    public static class Extensions
    {
        public static Task ForEachAsync<TSource, TResult>(
            this IEnumerable<TSource> source,
            Func<TSource, Task<TResult>> taskSelector, Action<TSource, TResult> resultProcessor)
        {
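            // Guards resultProcessor only: one callback at a time, so a plain List<T> is safe.
            // It does not limit how many downloads run concurrently.
            var oneAtATime = new SemaphoreSlim(1, 1);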
            return Task.WhenAll(
                from item in source
                select ProcessAsync(item, taskSelector, resultProcessor, oneAtATime));
        }
    
        private static async Task ProcessAsync<TSource, TResult>(
            TSource item,
            Func<TSource, Task<TResult>> taskSelector, Action<TSource, TResult> resultProcessor,
            SemaphoreSlim oneAtATime)
        {
            TResult result = await taskSelector(item);
            await oneAtATime.WaitAsync();
            try
            {
                resultProcessor(item, result);
            }
            finally
            {
                oneAtATime.Release();
            }
        }
    }
    

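    I haven't changed the signature of ForEachAsync to let you choose the level of parallelism; I'll let you adjust it as you wish. Note that the semaphore in ProcessAsync is acquired only after the download has completed, so it limits how many result callbacks run at once, not how many downloads are in flight: all downloads are started immediately.

    Example output: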
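
    If you do want to cap the number of simultaneous downloads, one option is to take the semaphore before starting each operation rather than after it. The sketch below is an assumption of how that could look, not part of the original helper: ForEachThrottledAsync and its maxParallelism parameter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledExtensions
{
    // Hypothetical variant of ForEachAsync: at most maxParallelism operations run at once.
    public static Task ForEachThrottledAsync<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, Task<TResult>> taskSelector,
        Action<TSource, TResult> resultProcessor,
        int maxParallelism)
    {
        if (maxParallelism < 1) throw new ArgumentOutOfRangeException("maxParallelism");

        var throttle = new SemaphoreSlim(maxParallelism, maxParallelism);
        var resultLock = new object();
        return Task.WhenAll(
            source.Select(item => ProcessAsync(item, taskSelector, resultProcessor, throttle, resultLock)));
    }

    private static async Task ProcessAsync<TSource, TResult>(
        TSource item,
        Func<TSource, Task<TResult>> taskSelector,
        Action<TSource, TResult> resultProcessor,
        SemaphoreSlim throttle,
        object resultLock)
    {
        // Take a slot before starting, so the operation itself is throttled.
        await throttle.WaitAsync();
        try
        {
            TResult result = await taskSelector(item);

            // Hand results over one at a time so the callback may use a plain List<T>.
            lock (resultLock)
            {
                resultProcessor(item, result);
            }
        }
        finally
        {
            throttle.Release();
        }
    }
}
```

    It is called the same way as ForEachAsync with one extra argument, for example enumerable.ForEachThrottledAsync(s => DownloadFileTaskAsync(s, null, 1000), (url, t) => results.Add(t), 5). A lower limit is gentler on slow servers, but with a short timeout a higher limit may finish sooner.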

    DownloadFileTaskAsync (null local path): generating a temporary file name for http://cache.thephoenix.com/secure/uploadedImages/The_Phoenix/Music/CD_Review/main_OTR_Britney480.jpg
    DownloadFileTaskAsync (null local path): generating a temporary file name for http://ssimg.soundspike.com/artists/britneyspears_femmefatale_cd.jpg
    DownloadFileTaskAsync (null local path): generating a temporary file name for http://a323.yahoofs.com/ymg/albumreviewsuk__1/albumreviewsuk-526650850-1301400550.jpg?ymm_1xEDE5bu0tMi
    DownloadFileTaskAsync (null remote path): skipping
    DownloadFileTaskAsync (time out due): http://hangout.altsounds.com/geek/gars/images/3/9/8/5/2375.jpg
    DownloadFileTaskAsync (time out due): http://www.beat.com.au/sites/default/files/imagecache/630_315sr/images/article/header/2011/april/britney-spears-femme-fatale.jpg
    DownloadFileTaskAsync (time out due): http://cache.thephoenix.com/secure/uploadedImages/The_Phoenix/Music/CD_Review/main_OTR_Britney480.jpg
    DownloadFileTaskAsync (downloaded): http://newblog.thecmuwebsite.com/wp-content/uploads/2009/12/britneyspears1.jpg
    DownloadFileTaskAsync (downloaded): http://newblog.thecmuwebsite.com/wp-content/uploads/2009/12/britneyspears1.jpg
    DownloadFileTaskAsync (downloaded): http://static.guim.co.uk/sys-images/Music/Pix/site_furniture/2011/3/22/1300816812640/Femme-Fatale.jpg
    DownloadFileTaskAsync (downloaded): http://www.sputnikmusic.com/images/albums/72328.jpg
    

    What used to take up to 1 minute now barely takes 10 seconds for the same result :)

    And big thanks to the author of these two posts:

    http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx

    http://blogs.msdn.com/b/pfxteam/archive/2012/03/04/10277325.aspx
