Multithreading: which would be best to use, a thread pool or threads?

醉梦人生 2021-01-07 02:52

Hopefully this is a better question than my previous. I have a .exe which I will be passing different parameters (file paths) to which it will then take in and parse. So I w

7 answers
  • 2021-01-07 03:06

    See this question for how to find out the number of cores.

    Then use Parallel.ForEach with ParallelOptions with MaxDegreeOfParallelism set to the number of cores.

    Parallel.ForEach(args, new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount }, (element) => Console.WriteLine(element));
    
  • 2021-01-07 03:09

    My first instinct would be to push your file paths into a thread-safe queue and then fire up a number of threads (say, one per core). Each thread would repeatedly pop one item from the queue and process it accordingly. The work is done when the queue is empty.

    Implementation suggestions (to answer some of the questions in comments):


    Queue:

    In C# you could have a look at the Queue Class and the Queue.Synchronized Method for the implementation of the queue:

    "Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe. To guarantee the thread safety of the Queue, all operations must be done through the wrapper returned by the Synchronized method. Enumerating through a collection is intrinsically not a thread-safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads."

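    A minimal sketch of the caveat in the quoted documentation, assuming the queue holds file paths as strings. The class and method names (QueueSketch, TryTakeLocked, TryTakeConcurrent) are made up for illustration. The second method uses ConcurrentQueue<T>, which is not part of this answer; it is an alternative available from .NET 4 that folds "check and pop" into one call.

```csharp
using System;
using System.Collections;
using System.Collections.Concurrent;

public static class QueueSketch
{
    // Queue.Synchronized makes each single call thread safe, but
    // "check Count, then Dequeue" is two calls. Another thread could
    // empty the queue in between, so the pair is guarded with SyncRoot.
    public static bool TryTakeLocked(Queue syncQueue, out string path)
    {
        lock (syncQueue.SyncRoot)
        {
            if (syncQueue.Count > 0)
            {
                path = (string)syncQueue.Dequeue();
                return true;
            }
        }
        path = null;
        return false;
    }

    // Alternative (an assumption, not from the answer): ConcurrentQueue<T>
    // checks and removes in one atomic TryDequeue call.
    public static bool TryTakeConcurrent(ConcurrentQueue<string> queue, out string path)
    {
        return queue.TryDequeue(out path);
    }
}
```

    Either way, a worker simply loops until the method returns false.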

    Threading:

    For the threading part, I suppose any of the examples in the MSDN threading tutorial would do (the tutorial is a bit old, but should still be valid). You should not need to worry about synchronizing the threads, as they can work independently of each other. The queue above is the only shared resource they need to access (hence the importance of the queue being thread safe).


    Start the external process (.exe):

    The following code is borrowed (and tweaked) from How to wait for a shelled application to finish by using Visual C#. You need to edit for your own needs, but as a starter:

    //How to Wait for a Shelled Process to Finish
    //Requires: using System.Diagnostics;
    //Create a new process info structure.
    ProcessStartInfo pInfo = new ProcessStartInfo();
    //Set the file name member of the process info structure.
    //(Verbatim string, so the backslash is not treated as an escape.)
    pInfo.FileName = @"mypath\myfile.exe";
    //Start the process.
    Process p = Process.Start(pInfo);
    //Wait for the process to end.
    p.WaitForExit();
    

    Pseudo code:

    Main thread:
       Create thread safe queue
       Populate the queue with all the file paths
       Create child threads and wait for them to finish
    
          Child threads:
             While queue is not empty  << this section is critical: no more than one
                pop file from queue    << thread can check and pop at a time
    
                start external exe
                    wait for it....
                end external exe 
    
             end while
          Child thread exits
    
       Main thread waits for all child threads to finish
    Program finishes.
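
    The pseudo code above could be sketched in C# roughly as follows. This is only a sketch under some assumptions: it uses ConcurrentQueue<T> (.NET 4 and later) instead of Queue.Synchronized so that the check and the pop happen in one step, it assumes the .exe takes the file path as a single quoted argument, and the names QueueWorkers and ProcessAll are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public static class QueueWorkers
{
    // Starts exePath once per file path, using workerCount threads that
    // drain a shared queue. Returns the number of processes started.
    public static int ProcessAll(IEnumerable<string> filePaths, string exePath, int workerCount)
    {
        var queue = new ConcurrentQueue<string>(filePaths);
        var workers = new List<Thread>();
        int started = 0;

        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(() =>
            {
                string path;
                // TryDequeue checks and pops atomically (the critical section).
                while (queue.TryDequeue(out path))
                {
                    var info = new ProcessStartInfo
                    {
                        FileName = exePath,
                        Arguments = "\"" + path + "\"", // assumed argument format
                        UseShellExecute = false
                    };
                    using (Process p = Process.Start(info))
                    {
                        Interlocked.Increment(ref started);
                        p.WaitForExit(); // this thread blocks until the exe is done
                    }
                }
            });
            workers.Add(worker);
            worker.Start();
        }

        // Main thread waits for all child threads to finish.
        foreach (Thread worker in workers)
        {
            worker.Join();
        }
        return started;
    }
}
```

    A call might look like QueueWorkers.ProcessAll(paths, @"mypath\myfile.exe", Environment.ProcessorCount). Note that an exception inside a worker (for example, the .exe not being found) is not handled here.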
    
  • 2021-01-07 03:12

    Each exe you launch runs in its own process. You don't need to use a thread pool or multiple threads; the OS manages the processes (and since they're processes and not threads, they're very independent: completely separate memory space, etc.).

  • 2021-01-07 03:14

    If you're targeting .NET Framework 4, Parallel.For and Parallel.ForEach are extremely helpful. If those don't meet your requirements, I've found Task.Factory to be useful and straightforward to use as well.
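
    A minimal sketch of the Task.Factory route, assuming each path can be handled by some delegate (the work parameter here is a stand-in, and TaskSketch and RunAll are made-up names). Unlike Parallel.ForEach with MaxDegreeOfParallelism, this does not cap concurrency itself; the thread pool decides how many tasks run at once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TaskSketch
{
    // Starts one task per path and returns the results in input order.
    public static T[] RunAll<T>(IEnumerable<string> paths, Func<string, T> work)
    {
        // ToArray forces all tasks to be started before any result is read.
        Task<T>[] tasks = paths
            .Select(path => Task.Factory.StartNew(() => work(path)))
            .ToArray();

        // Reading Result blocks until that task has finished; a failure
        // in work surfaces here as an AggregateException.
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

    For example, TaskSketch.RunAll(paths, p => p.Length) would return the length of each path.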

  • 2021-01-07 03:16

    As I said in my answer to your previous question, I think you don't understand the difference between processes and threads. Processes are incredibly "heavy" (*); each process can contain many threads. If you are spawning new processes from a parent process, that parent process doesn't need to create new threads; each process will have its own collection of threads.

    Only create threads in the parent process if all the work is being done in the same process.

    Think of a thread as a worker, and a process as a building containing one or more workers.

    One strategy is "build a single building and populate it with ten workers who each do some amount of work". You get the expense of building one process and ten threads.

    If your strategy is "build a building. Then have the one worker in that building order the construction of a thousand more buildings, each of which contains a worker that does their bidding", then you get the expense of building 1001 buildings and hiring 1001 workers.

    The strategy you do not want to pursue is "build a building. Hire 1000 workers in that building. Then instruct each worker to build a building, which then has one worker to go do the real work." There is no point in making a thread whose sole job is creating a process that then creates a thread! You have 1001 buildings and 2001 workers, half of whom are immediately idle but still have to be paid.

    Looking at your specific problem: the key question is "where is the bottleneck?" Spawning off new processes or new threads only helps when the performance problem is that the perf is gated on the processor. If the performance of your parser is gated not on how fast you can parse the file but rather on how fast you can get it off disk, then parallelizing it is going to make things far, far worse. You'll have a huge amount of system resources devoted to all hammering on the same disk controller at the same time, and the disk controller will get slower as more load piles up on it.

    UPDATE:

    I need to limit the number of executions of the .exe to ONE execution PER CORE. This is the most efficient because if I am parsing 100,000 files I can't just fire up 100000 processes. So I am using threads to limit the number of executions at one time to one execution per core. If there is another way (other than threads) to find out if a processor isn't tied up in execution, or if the .exe has finished please explain

    This seems like an awfully complicated way to go about it. Suppose you have n processors. Your proposed strategy, as I understand it, is to fire up n threads, then have each thread fire up one process, and you know that since the operating system will probably schedule one thread per CPU that somehow the processor will magically also schedule the new thread in each new process on a different CPU?

    That seems like a tortuous chain of reasoning that depends on implementation details of the operating system. This is craziness. If you want to set the processor affinity of a particular process, just set the processor affinity on the process! Don't be doing this crazy thing with threads and hope that it works out.

    I say that if you want to have no more than n instances of an executable running, one per processor, don't mess around with threads at all. Rather, just have one thread sit in a loop, constantly monitoring what processes are running. If there are fewer than n copies of the executable running, spawn another and set its processor affinity to be the CPU you like best. If there are n or more copies of the executable running, go to sleep for a second (or a minute, or whatever makes sense), and when you wake up, check again. Keep doing that until you're done. That seems like a much easier approach.
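
    The single-thread loop described above might look something like this. It is a sketch, not a definitive implementation: the names ThrottledLauncher, Run and pollMilliseconds are made up, it assumes the .exe takes the file path as a single quoted argument, and it tracks only the processes it started itself rather than scanning the whole system. Processor affinity is left as a comment because Process.ProcessorAffinity is not supported on every platform.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public static class ThrottledLauncher
{
    // Keeps at most maxRunning copies of exePath alive, one per pending
    // file path, polling from a single thread. Returns the number started.
    public static int Run(IEnumerable<string> filePaths, string exePath,
                          int maxRunning, int pollMilliseconds = 1000)
    {
        if (maxRunning < 1) throw new ArgumentOutOfRangeException("maxRunning");

        var pending = new Queue<string>(filePaths);
        var running = new List<Process>();
        int started = 0;

        while (pending.Count > 0 || running.Count > 0)
        {
            // Drop processes that have finished since the last check.
            for (int i = running.Count - 1; i >= 0; i--)
            {
                if (running[i].HasExited)
                {
                    running[i].Dispose();
                    running.RemoveAt(i);
                }
            }

            // Fewer than maxRunning copies running: spawn more.
            while (running.Count < maxRunning && pending.Count > 0)
            {
                var info = new ProcessStartInfo
                {
                    FileName = exePath,
                    Arguments = "\"" + pending.Dequeue() + "\"", // assumed format
                    UseShellExecute = false
                };
                Process p = Process.Start(info);
                // Optional: pin the process to a core via p.ProcessorAffinity
                // (a bitmask; not supported on all platforms).
                running.Add(p);
                started++;
            }

            // Enough copies running: go to sleep, then check again.
            if (running.Count > 0) Thread.Sleep(pollMilliseconds);
        }
        return started;
    }
}
```

    With maxRunning set to Environment.ProcessorCount this gives roughly one running copy per core, without creating any extra threads in the parent.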


    (*) Threads are also heavy, but they are lighter than processes.

  • 2021-01-07 03:16

    If you're launching a .exe, then you have no choice. You will be running this asynchronously in a separate process. For the program which does the launching, I would recommend that you use a single thread and keep a list of the processes you launched.
