How to specify the number of parallel tasks executed in Parallel.ForEach? [duplicate]

Submitted by 你说的曾经没有我的故事 on 2019-12-11 03:09:32

Question


I have ~500 tasks, each of which takes ~5 seconds, and most of that time is spent waiting for a remote resource to reply. I would like to choose the number of threads myself (after some testing) and run the tasks on those threads. When one task finishes, I would like to start another task on the thread that became available.

I found System.Threading.Tasks the easiest way to achieve this, but it seems impossible to specify the number of tasks that should run in parallel. On my machine it's always around 8 (quad-core CPU). Is it possible to control how many tasks run in parallel? If not, what would be the easiest way to achieve what I want? (I tried using threads directly, but the code is much more complex.) I tried increasing the MaxDegreeOfParallelism parameter, but it only sets an upper limit, so no luck there...

This is the code that I have currently:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
    class Program
    {
        private static List<string> _list = new List<string>();
        private static int _toProcess = 0;

        static void Main(string[] args)
        {   
            for (int i = 0; i < 1000; ++i)
            {
                _list.Add("parameter" + i);
            }

            var w = new Worker();
            var w2 = new StringAnalyzer();

            Parallel.ForEach(_list, new ParallelOptions() { MaxDegreeOfParallelism = 32 }, item =>
            {
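                // Atomic increment: this lambda runs on several threads at once
                System.Threading.Interlocked.Increment(ref _toProcess);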
                string data = w.DoWork(item);
                w2.AnalyzeProcessedString(data);
            });

            Console.WriteLine("Finished");           
            Console.ReadKey();
        }

        static void Done(Task<string> t)
        {            
            Console.WriteLine(t.Result);
            System.Threading.Interlocked.Decrement(ref _toProcess);
        }
    }

    class Worker
    {
        public string DoWork(string par)
        {
            // It's a long running but not CPU heavy task (downloading stuff from the internet)
            System.Threading.Thread.Sleep(5000);            
            return par + " processed";
        }
    }

    class StringAnalyzer
    {
        public void AnalyzeProcessedString(string data)
        {
            // Rather short, not CPU heavy
            System.Threading.Thread.Sleep(1000);
            Console.WriteLine(data + " and analyzed");
        }
    }
}

Answer 1:


As L.B mentioned, the .NET Framework has methods that perform I/O operations (requests to databases, web services, etc.) using IOCP (I/O completion ports) internally. They can be recognized by their names, which end with Async by convention. So you can use them to build robust, scalable applications that process multiple requests simultaneously.

EDIT: I've completely rewritten the code example using modern best practices, so it is now more readable, shorter and easier to use.

With .NET 4.5 we can use the async-await approach:

using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        var task = Worker.DoWorkAsync();
        task.Wait(); // block until the async method has completed

        foreach (var item in task.Result)
        {
            Console.WriteLine(item);
        }

        Console.ReadLine();
    }
}

static class Worker
{
    public async static Task<IEnumerable<string>> DoWorkAsync()
    {
        List<string> results = new List<string>();

        // Each request is awaited before the next one starts, so they run one after another
        for (int i = 0; i < 10; i++)
        {
            var request = (HttpWebRequest)WebRequest.Create("http://microsoft.com");
            using (var response = await request.GetResponseAsync())
            {
                results.Add(response.ContentType);
            }
        }

        return results;
    }
}

MSDN has a nice tutorial about async programming using async-await.
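Note that the loop in DoWorkAsync above awaits each request before it starts the next one. To have the requests in flight at the same time, one option is to start them all first and then await them together with Task.WhenAll. The following is only a minimal sketch under stated assumptions: ConcurrentWorker and FetchAsync are hypothetical names, and Task.Delay stands in for a real async I/O call such as GetResponseAsync so that the example runs without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ConcurrentWorker
{
    // Hypothetical stand-in for a real async I/O call (e.g. GetResponseAsync);
    // Task.Delay simulates the latency of the remote resource.
    static async Task<string> FetchAsync(string parameter)
    {
        await Task.Delay(500);
        return parameter + " processed";
    }

    // Starts one operation per parameter, then awaits all of them together.
    public static async Task<string[]> DoWorkAsync(IEnumerable<string> parameters)
    {
        // ToList forces the lazy Select to run, which starts every operation now.
        List<Task<string>> pending = parameters.Select(p => FetchAsync(p)).ToList();

        // The result array is in the same order as the input tasks.
        return await Task.WhenAll(pending);
    }
}

class ConcurrentDemo
{
    static void Main()
    {
        IEnumerable<string> parameters =
            Enumerable.Range(0, 10).Select(i => "parameter" + i);

        string[] results =
            ConcurrentWorker.DoWorkAsync(parameters).GetAwaiter().GetResult();

        foreach (string line in results)
        {
            Console.WriteLine(line);
        }
    }
}
```

This version puts no upper bound on the number of operations in flight, so for hundreds of requests it would probably need some form of throttling on top.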




Answer 2:


Assuming you can use native async methods such as HttpClient.GetStringAsync to fetch your resource, you can cap the number of requests in flight with a SemaphoreSlim:

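// urls: the collection of addresses to download (defined elsewhere)
int numTasks = 20; // maximum number of requests in flight
SemaphoreSlim semaphore = new SemaphoreSlim(numTasks);
HttpClient client = new HttpClient();

List<string> result = new List<string>();
foreach(var url in urls)
{
    // Blocks until one of the numTasks slots is free
    semaphore.Wait();

    client.GetStringAsync(url)
          .ContinueWith(t => {
              // Reading t.Result on a failed task would throw before Release()
              // and leak the slot, so only successful downloads are stored
              if (!t.IsFaulted && !t.IsCanceled)
                  lock (result) result.Add(t.Result);
              semaphore.Release();
          });
}

// Acquire every slot: this returns only after all outstanding requests have finished
for (int i = 0; i < numTasks; i++) semaphore.Wait();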

Since GetStringAsync uses I/O completion ports internally (like most other async I/O methods) instead of creating new threads, this may be the solution you are after.

See also http://blog.stephencleary.com/2013/11/there-is-no-thread.html
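The same throttling idea can also be written with async-await, so that waiting for a free slot does not block the calling thread either. What follows is a minimal sketch, not a definitive implementation: Throttled.RunAsync and ThrottleDemo are hypothetical names introduced for illustration, and Task.Delay is an assumed stand-in for a real call such as client.GetStringAsync(url).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class Throttled
{
    // Runs `work` for every item with at most `maxConcurrency` operations in flight.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxConcurrency, Func<TItem, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            List<Task<TResult>> tasks = items.Select(async item =>
            {
                await gate.WaitAsync();          // wait for a free slot without blocking a thread
                try
                {
                    return await work(item);
                }
                finally
                {
                    gate.Release();              // free the slot even if work throws
                }
            }).ToList();                         // ToList starts every wrapper task now

            // Completes once all wrappers have finished; results keep input order.
            return await Task.WhenAll(tasks);
        }
    }
}

class ThrottleDemo
{
    static void Main()
    {
        IEnumerable<string> parameters =
            Enumerable.Range(0, 500).Select(i => "parameter" + i);

        // At most 20 simulated downloads are in flight at any moment.
        string[] results = Throttled.RunAsync(parameters, 20, async p =>
        {
            await Task.Delay(100);               // stand-in for a real async download
            return p + " processed";
        }).GetAwaiter().GetResult();

        Console.WriteLine(results.Length + " items processed");
    }
}
```

The concurrency limit (20 here) plays the role of numTasks in the snippet above and can be tuned after some testing.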



Source: https://stackoverflow.com/questions/22035915/how-to-specify-the-number-of-parallel-tasks-executed-in-parallel-foreach
