Throttling asynchronous tasks

心在旅途 2020-11-22 13:13

I would like to run a bunch of async tasks, with a limit on how many tasks may be pending completion at any given time.

Say you have 1000 URLs, and you only want to ...

3 answers
  •  北海茫月
    2020-11-22 13:36

    Say you have 1000 URLs, and you only want to have 50 requests open at a time; but as soon as one request completes, you open up a connection to the next URL in the list. That way, there are always exactly 50 connections open at a time, until the URL list is exhausted.

    The following simple solution has surfaced many times here on SO. It doesn't use blocking code and doesn't create threads explicitly, so it scales very well:

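    using System;
    using System.Linq;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;
    
    // upper bound on the number of requests in flight at once
    const int MAX_DOWNLOADS = 50;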
    
    static async Task DownloadAsync(string[] urls)
    {
        using (var semaphore = new SemaphoreSlim(MAX_DOWNLOADS))
        using (var httpClient = new HttpClient())
        {
            var tasks = urls.Select(async url => 
            {
                await semaphore.WaitAsync();
                try
                {
                    var data = await httpClient.GetStringAsync(url);
                    Console.WriteLine(data);
                }
                finally
                {
                    semaphore.Release();
                }
            });
    
            await Task.WhenAll(tasks);
        }
    }
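
    If you want to convince yourself that the semaphore really caps the number of in-flight operations, here is a minimal sketch that swaps the HTTP call for Task.Delay and records the peak concurrency. The names ThrottleDemo, RunAsync, inFlight and peak are made up for this illustration, and the 20 ms delay is an arbitrary stand-in for a request:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThrottleDemo
{
    // Runs `count` simulated requests with at most `limit` in flight and
    // returns the highest number of requests observed in flight at once.
    public static async Task<int> RunAsync(int count, int limit)
    {
        int inFlight = 0, peak = 0;
        var gate = new object();

        using (var semaphore = new SemaphoreSlim(limit))
        {
            var tasks = Enumerable.Range(0, count).Select(async i =>
            {
                await semaphore.WaitAsync();
                try
                {
                    lock (gate)
                    {
                        inFlight++;
                        if (inFlight > peak) peak = inFlight;
                    }

                    // hypothetical stand-in for httpClient.GetStringAsync(url)
                    await Task.Delay(20);

                    lock (gate) inFlight--;
                }
                finally
                {
                    semaphore.Release();
                }
            });

            await Task.WhenAll(tasks);
        }

        return peak;
    }

    static async Task Main()
    {
        int peak = await RunAsync(count: 200, limit: 50);
        Console.WriteLine($"peak in-flight: {peak}");
    }
}
```

    The reported peak should never exceed the limit passed to SemaphoreSlim, no matter how many items are queued.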
    

    The catch is that processing the downloaded data should happen on a separate pipeline with its own degree of parallelism, especially if the processing is CPU-bound.

    E.g., you'd probably want to have 4 threads concurrently doing the data processing (the number of CPU cores), and up to 50 pending requests for more data (which do not use threads at all). AFAICT, this is not what your code is currently doing.

    That's where TPL Dataflow or Rx may come in handy as a preferred solution. Still, it is certainly possible to implement something like this with plain TPL. Note that the only blocking code below is the actual data processing inside Task.Run:

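    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;
    
    const int MAX_DOWNLOADS = 50;  // concurrent requests (no threads involved)
    const int MAX_PROCESSORS = 4;  // concurrent CPU-bound workers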
    
    // process data
    class Processing
    {
        SemaphoreSlim _semaphore = new SemaphoreSlim(MAX_PROCESSORS);
        HashSet<Task> _pending = new HashSet<Task>();
        object _lock = new Object();
    
        async Task ProcessAsync(string data)
        {
            await _semaphore.WaitAsync();
            try
            {
                await Task.Run(() =>
                {
                    // simulate work
                    Thread.Sleep(1000);
                    Console.WriteLine(data);
                });
            }
            finally
            {
                _semaphore.Release();
            }
        }
    
        public async void QueueItemAsync(string data)
        {
            var task = ProcessAsync(data);
            lock (_lock)
                _pending.Add(task);
            try
            {
                await task;
            }
            catch
            {
                if (!task.IsCanceled && !task.IsFaulted)
                    throw; // not the task's exception, rethrow
                // don't remove faulted/cancelled tasks from the list
                return;
            }
            // remove successfully completed tasks from the list 
            lock (_lock)
                _pending.Remove(task);
        }
    
        public async Task WaitForCompleteAsync()
        {
            Task[] tasks;
            lock (_lock)
                tasks = _pending.ToArray();
            await Task.WhenAll(tasks);
        }
    }
    
    // download data
    static async Task DownloadAsync(string[] urls)
    {
        var processing = new Processing();
    
        using (var semaphore = new SemaphoreSlim(MAX_DOWNLOADS))
        using (var httpClient = new HttpClient())
        {
            var tasks = urls.Select(async (url) =>
            {
                await semaphore.WaitAsync();
                try
                {
                    var data = await httpClient.GetStringAsync(url);
                    // put the result on the processing pipeline
                    processing.QueueItemAsync(data);
                }
                finally
                {
                    semaphore.Release();
                }
            });
    
            await Task.WhenAll(tasks.ToArray());
            await processing.WaitForCompleteAsync();
        }
    }
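
    For comparison, the TPL Dataflow route mentioned above could look roughly like the sketch below. It is only an outline under some assumptions: the names DataflowSketch, RunAsync, fetch and process are invented here, the fetch delegate stands in for httpClient.GetStringAsync so the example runs without a network, and on .NET Framework the System.Threading.Tasks.Dataflow NuGet package has to be referenced:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class DataflowSketch
{
    // Two linked blocks: an async download stage and a CPU-bound processing
    // stage, each with its own degree of parallelism.
    public static async Task RunAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,   // e.g. url => httpClient.GetStringAsync(url)
        Action<string> process,             // the blocking, CPU-bound part
        int maxDownloads = 50,
        int maxProcessors = 4)
    {
        // Stage 1: at most maxDownloads fetches pending at once.
        var download = new TransformBlock<string, string>(
            fetch,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxDownloads });

        // Stage 2: at most maxProcessors items processed concurrently.
        var worker = new ActionBlock<string>(
            process,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxProcessors });

        // Forward results, and completion or faults, from stage 1 to stage 2.
        download.LinkTo(worker, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var url in urls)
            download.Post(url);
        download.Complete();

        // Completes when every item has been processed; throws if a stage faulted.
        await worker.Completion;
    }

    static async Task Main()
    {
        // Placeholder URLs and a fake fetch, purely for illustration.
        var urls = Enumerable.Range(0, 20).Select(i => $"http://example.invalid/{i}");
        int processed = 0;

        await RunAsync(
            urls,
            async url => { await Task.Delay(10); return "body of " + url; },
            data => Interlocked.Increment(ref processed));

        Console.WriteLine($"processed {processed} items");
    }
}
```

    The blocks do the bookkeeping that the Processing class does by hand: the pending-item tracking and both semaphores are replaced by MaxDegreeOfParallelism on each stage.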
    
