Parallel Linq - Use more threads than processors (for non-CPU bound tasks)

北荒 2021-01-04 13:36

I'm using Parallel LINQ, and I'm trying to download many URLs concurrently using essentially code like this:

int threads = 10;
Dictionary

        
4 answers
  • 2021-01-04 13:39

    Do the URLs refer to the same server? If so, it could be that you are hitting the HTTP connection limit instead of the threading limit. There's an easy way to tell - change your code to:

    int threads = 10;
    Dictionary<string, string> results = urls.AsParallel(threads)
        .ToDictionary(url => url, 
                      url => {
                          Console.WriteLine("On thread {0}",
                                            Thread.CurrentThread.ManagedThreadId);
                          return GetPage(url);
                      });
    

    EDIT: Hmm. I can't get ToDictionary() to parallelise at all with a bit of sample code. It works fine for Select(url => GetPage(url)) but not ToDictionary. Will search around a bit.

    EDIT: Okay, I still can't get ToDictionary to parallelise, but you can work around that. Here's a short but complete program:

    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Linq;
    using System.Linq.Parallel;
    
    public class Test
    {
    
        static void Main()
        {
            var urls = Enumerable.Range(0, 100).Select(i => i.ToString());
    
            int threads = 10;
            Dictionary<string, string> results = urls.AsParallel(threads)
                .Select(url => new { Url=url, Page=GetPage(url) })
                .ToDictionary(x => x.Url, x => x.Page);
        }
    
        static string GetPage(string x)
        {
            Console.WriteLine("On thread {0} getting {1}",
                              Thread.CurrentThread.ManagedThreadId, x);
            Thread.Sleep(2000);
            return x;
        }
    }
    

    So, how many threads does this use? 5. Why? Goodness knows. I've got 2 processors, so that's not it - and we've specified 10 threads, so that's not it. It still uses 5 even if I change GetPage to hammer the CPU.

    If you only need to use this for one particular task - and you don't mind slightly smelly code - you might be best off implementing it yourself, to be honest.
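
    For reference, here is a sketch of the same workaround against the API that shipped in .NET 4 and later, where the thread count is requested with AsParallel().WithDegreeOfParallelism(threads) rather than AsParallel(threads). It assumes the released System.Linq namespace. GetPage is still a stand-in that sleeps instead of downloading, and the names PlinqDownloadSketch and DownloadAll are made up for illustration. The requested degree is an upper bound, so the number of threads actually observed may still be lower.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;

    public static class PlinqDownloadSketch
    {
        // Stand-in for the real download: blocks the calling thread,
        // as a synchronous HTTP request would.
        public static string GetPage(string url)
        {
            Console.WriteLine("On thread {0} getting {1}",
                              Thread.CurrentThread.ManagedThreadId, url);
            Thread.Sleep(200);
            return url;
        }

        // Run the blocking work in Select, then build the dictionary
        // from the already-computed (url, page) pairs.
        public static Dictionary<string, string> DownloadAll(
            IEnumerable<string> urls, int threads)
        {
            return urls.AsParallel()
                       .WithDegreeOfParallelism(threads) // upper bound, not a guarantee
                       .Select(url => new { Url = url, Page = GetPage(url) })
                       .ToDictionary(x => x.Url, x => x.Page);
        }

        public static void Main()
        {
            var urls = Enumerable.Range(0, 20).Select(i => i.ToString());
            Dictionary<string, string> results = DownloadAll(urls, 10);
            Console.WriteLine("Fetched {0} pages", results.Count);
        }
    }
    ```

    Counting the distinct thread IDs in the console output shows how many threads the query really used on a given machine.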

  • 2021-01-04 13:54

    Monitor your network traffic. If the URLs are all on the same domain, the server may be limiting the bandwidth, in which case more connections might not provide any speed-up.

  • 2021-01-04 14:00

    By default, .NET has a limit of 2 concurrent connections to an end service point (IP:port). That's why you would not see a difference if all the URLs point to one and the same server.

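    The limit can be controlled using the ServicePointManager.DefaultConnectionLimit property (ServicePointManager.DefaultPersistentConnectionLimit is the constant that holds the default value of 2).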
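
    A minimal sketch of raising the limit, assuming the .NET Framework stack where HttpWebRequest and WebClient go through ServicePointManager. The value 10 and the class name ConnectionLimitSketch are arbitrary choices for illustration. The new default only applies to service points created afterwards, so it should be set before the first request is made.

    ```csharp
    using System;
    using System.Net;

    public static class ConnectionLimitSketch
    {
        public static void Main()
        {
            // Raise the per-endpoint connection limit before any request is issued.
            ServicePointManager.DefaultConnectionLimit = 10;

            Console.WriteLine("Connection limit per endpoint: {0}",
                              ServicePointManager.DefaultConnectionLimit);
        }
    }
    ```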

  • 2021-01-04 14:01

    I think there are already good answers to the question, but I'd like to make one important point. Using PLINQ for tasks that are not CPU-bound is, in principle, the wrong design. That's not to say it won't work (it will), but using multiple threads when they are unnecessary can cause trouble.

    Unfortunately, there is no good way to solve this problem in C#. In F# you could use asynchronous workflows that run in parallel but don't block the thread while performing asynchronous calls (under the covers, they use the BeginOperation and EndOperation methods). You can find more information here:

    • Concurrency in F# – Part I – The Asynchronous Workflow

    The same idea can, to some extent, be used in C#. It looks a bit weird, but it is more efficient. I wrote an article about that, and there is also a library that should be slightly more evolved than my original idea:

    • Asynchronous Programming in C# using Iterators
    • EasyAsync library
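
    The non-blocking idea described above can also be sketched with the async/await keywords that C# gained in version 5.0, after this answer was written. This is not the iterator technique from the linked article. GetPageAsync is a hypothetical stand-in that waits with Task.Delay instead of downloading, and AsyncDownloadSketch, DownloadAllAsync and maxConcurrent are names invented for the example. A SemaphoreSlim caps the number of requests in flight without tying up a thread per request.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    public static class AsyncDownloadSketch
    {
        // Hypothetical stand-in for an asynchronous download;
        // Task.Delay waits without holding a thread.
        public static async Task<string> GetPageAsync(string url)
        {
            await Task.Delay(200);
            return url;
        }

        public static async Task<Dictionary<string, string>> DownloadAllAsync(
            IEnumerable<string> urls, int maxConcurrent)
        {
            using (var gate = new SemaphoreSlim(maxConcurrent))
            {
                // Start one task per URL; each waits at the gate for a free slot.
                var tasks = urls.Select(async url =>
                {
                    await gate.WaitAsync();
                    try
                    {
                        return new KeyValuePair<string, string>(
                            url, await GetPageAsync(url));
                    }
                    finally
                    {
                        gate.Release();
                    }
                }).ToList();

                KeyValuePair<string, string>[] pairs = await Task.WhenAll(tasks);
                return pairs.ToDictionary(p => p.Key, p => p.Value);
            }
        }

        public static void Main()
        {
            var urls = Enumerable.Range(0, 20).Select(i => i.ToString());
            var results = DownloadAllAsync(urls, 10).GetAwaiter().GetResult();
            Console.WriteLine("Fetched {0} pages", results.Count);
        }
    }
    ```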