Speed up loop using multithreading in C# (Question)

后端 未结 6 720
南笙
南笙 2020-12-13 15:40

Imagine I have an function which goes through one million/billion strings and checks smth in them.

f.ex:

foreach (String item in ListOfStrings)
{
            


        
相关标签:
6条回答
  • 2020-12-13 16:29

    The first question you must answer is whether you should be using threading

    If your function CalculateSmth() is basically CPU-bound, i.e. heavy in CPU-usage and basically no I/O-usage, then I have a hard time seeing the point of using threads, since the threads will be competing over the same resource, in this case the CPU.

    If your CalculateSmth() is using both CPU and I/O, then it might be a point in using threading.

    I totally agree with the comment to my answer. I made a erroneous assumption that we were talking about a single CPU with one core, but these days we have multi-core CPUs, my bad.

    0 讨论(0)
  • 2020-12-13 16:34

    Not that I have any good articles here right now, but what you want to do is something along Producer-Consumer with a Threadpool.

    The Producers loops through and creates tasks (which in this case could be to just queue up the items in a List or Stack). The Consumers are, say, five threads that reads one item off the stack, consumes it by calculating it, and then stores it else where.

    This way the multithreading is limited to just those five threads, and they will all have work to do up until the stack is empty.

    Things to think about:

    • Put protection on the input and output list, such as a mutex.
    • If the order is important, make sure that the output order is maintained. One example could be to store them in a SortedList or something like that.
    • Make sure that the CalculateSmth is thread safe, that it doesn't use any global state.
    0 讨论(0)
  • 2020-12-13 16:36

    You could try the Parallel extensions (part of .NET 4.0)

    These allow you to write something like:

    Parallel.Foreach (ListOfStrings, (item) => 
        result.add(CalculateSmth(item));
    );
    

    Of course result.add would need to be thread safe.

    0 讨论(0)
  • 2020-12-13 16:36

    Note that concurrency doesn't magically give you more resource. You need to establish what is slowing CalculateSmth down.

    For example, if it's CPU-bound (and you're on a single core) then the same number of CPU ticks will go to the code, whether you execute them sequentially or in parallel. Plus you'd get some overhead from managing the threads. Same argument applies to other constraints (e.g. I/O)

    You'll only get performance gains in this if CalculateSmth is leaving resource free during its execution, that could be used by another instance. That's not uncommon. For example, if the task involves IO followed by some CPU stuff, then process 1 could be doing the CPU stuff while process 2 is doing the IO. As mats points out, a chain of producer-consumer units can achieve this, if you have the infrastructure.

    0 讨论(0)
  • 2020-12-13 16:38

    You need to split up the work you want to do in parallel. Here is an example of how you can split the work in two:

    List<string> work = (some list with lots of strings)
    
    // Split the work in two
    List<string> odd = new List<string>();
    List<string> even = new List<string>();
    for (int i = 0; i < work.Count; i++)
    {
        if (i % 2 == 0)
        {
            even.Add(work[i]);
        }
        else
        {
            odd.Add(work[i]);
        }
    }
    
    // Set up to worker delegates
    List<Foo> oddResult = new List<Foo>();
    Action oddWork = delegate { foreach (string item in odd) oddResult.Add(CalculateSmth(item)); };
    
    List<Foo> evenResult = new List<Foo>();
    Action evenWork = delegate { foreach (string item in even) evenResult.Add(CalculateSmth(item)); };
    
    // Run two delegates asynchronously
    IAsyncResult evenHandle = evenWork.BeginInvoke(null, null);
    IAsyncResult oddHandle = oddWork.BeginInvoke(null, null);
    
    // Wait for both to finish
    evenWork.EndInvoke(evenHandle);
    oddWork.EndInvoke(oddHandle);
    
    // Merge the results from the two jobs
    List<Foo> allResults = new List<Foo>();
    allResults.AddRange(oddResult);
    allResults.AddRange(evenResult);
    
    return allResults;
    
    0 讨论(0)
  • 2020-12-13 16:46

    The Parallel extensions is cool, but this can also be done just by using the threadpool like this:

    using System.Collections.Generic;
    using System.Threading;
    
    namespace noocyte.Threading
    {
        class CalcState
        {
            public CalcState(ManualResetEvent reset, string input) {
                Reset = reset;
                Input = input;
            }
            public ManualResetEvent Reset { get; private set; }
            public string Input { get; set; }
        }
    
        class CalculateMT
        {
            List<string> result = new List<string>();
            List<ManualResetEvent> events = new List<ManualResetEvent>();
    
            private void Calc() {
                List<string> aList = new List<string>();
                aList.Add("test");
    
                foreach (var item in aList)
                {
                    CalcState cs = new CalcState(new ManualResetEvent(false), item);
                    events.Add(cs.Reset);
                    ThreadPool.QueueUserWorkItem(new WaitCallback(Calculate), cs);
                }
                WaitHandle.WaitAll(events.ToArray());
            }
    
            private void Calculate(object s)
            {
                CalcState cs = s as CalcState;
                cs.Reset.Set();
                result.Add(cs.Input);
            }
        }
    }
    
    0 讨论(0)
提交回复
热议问题