Optimal number of threads per core

前端 未结 13 1877
忘掉有多难
忘掉有多难 2020-11-22 14:27

Let\'s say I have a 4-core CPU, and I want to run some process in the minimum amount of time. The process is ideally parallelizable, so I can run chunks of it on an infinite

相关标签:
13条回答
  • 2020-11-22 15:14

    I thought I'd add another perspective here. The answer depends on whether the question is assuming weak scaling or strong scaling.

    From Wikipedia:

    Weak scaling: how the solution time varies with the number of processors for a fixed problem size per processor.

    Strong scaling: how the solution time varies with the number of processors for a fixed total problem size.

    If the question is assuming weak scaling then @Gonzalo's answer suffices. However if the question is assuming strong scaling, there's something more to add. In strong scaling you're assuming a fixed workload size so if you increase the number of threads, the size of the data that each thread needs to work on decreases. On modern CPUs memory accesses are expensive and would be preferable to maintain locality by keeping the data in caches. Therefore, the likely optimal number of threads can be found when the dataset of each thread fits in each core's cache (I'm not going into the details of discussing whether it's L1/L2/L3 cache(s) of the system).

    This holds true even when the number of threads exceeds the number of cores. For example assume there's 8 arbitrary unit (or AU) of work in the program which will be executed on a 4 core machine.

    Case 1: run with four threads where each thread needs to complete 2AU. Each thread takes 10s to complete (with a lot of cache misses). With four cores the total amount of time will be 10s (10s * 4 threads / 4 cores).

    Case 2: run with eight threads where each thread needs to complete 1AU. Each thread takes only 2s (instead of 5s because of the reduced amount of cache misses). With four cores the total amount of time will be 4s (2s * 8 threads / 4 cores).

    I've simplified the problem and ignored overheads mentioned in other answers (e.g., context switches) but hope you get the point that it might be beneficial to have more number of threads than the available number of cores, depending on the data size you're dealing with.

    0 讨论(0)
  • 2020-11-22 15:16

    Hope this makes sense, Check the CPU and Memory utilization and put some threshold value. If the threshold value is crossed,don't allow to create new thread else allow...

    0 讨论(0)
  • 2020-11-22 15:18

    speaking from computation and memory bound point of view (scientific computing) 4000 threads will make application run really slow. Part of the problem is a very high overhead of context switching and most likely very poor memory locality.

    But it also depends on your architecture. From where I heard Niagara processors are suppose to be able to handle multiple threads on a single core using some kind of advanced pipelining technique. However I have no experience with those processors.

    0 讨论(0)
  • 2020-11-22 15:22

    The actual performance will depend on how much voluntary yielding each thread will do. For example, if the threads do NO I/O at all and use no system services (i.e. they're 100% cpu-bound) then 1 thread per core is the optimal. If the threads do anything that requires waiting, then you'll have to experiment to determine the optimal number of threads. 4000 threads would incur significant scheduling overhead, so that's probably not optimal either.

    0 讨论(0)
  • 2020-11-22 15:26

    The ideal is 1 thread per core, as long as none of the threads will block.

    One case where this may not be true: there are other threads running on the core, in which case more threads may give your program a bigger slice of the execution time.

    0 讨论(0)
  • 2020-11-22 15:28

    If your threads don't do I/O, synchronization, etc., and there's nothing else running, 1 thread per core will get you the best performance. However that very likely not the case. Adding more threads usually helps, but after some point, they cause some performance degradation.

    Not long ago, I was doing performance testing on a 2 quad-core machine running an ASP.NET application on Mono under a pretty decent load. We played with the minimum and maximum number of threads and in the end we found out that for that particular application in that particular configuration the best throughput was somewhere between 36 and 40 threads. Anything outside those boundaries performed worse. Lesson learned? If I were you, I would test with different number of threads until you find the right number for your application.

    One thing for sure: 4k threads will take longer. That's a lot of context switches.

    0 讨论(0)
提交回复
热议问题