I have an application that accepts work on a queue and then spins that work off to be completed on independent threads. The number of threads is not massive, say up to 100, but
If you run too many compute-intensive tasks in parallel threads, you reach the point of diminishing returns very quickly. In fact, if there are N processors (cores), you don't want more than N such threads. If the tasks occasionally pause for I/O or user interaction, the right number can be somewhat larger. But in general, if at any one moment more threads want to do computation than there are cores available, your program is wasting time on context switches; that is, the scheduling itself is costing you.
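To make the sizing rule concrete, here is a minimal sketch in Python. The helper names `suggested_workers` and `run_all` are hypothetical. Scaling the pool by a wait-to-compute ratio is a common rule of thumb that I am assuming here, not something measured for your workload. Note also that in standard CPython the GIL keeps compute-bound threads from running truly in parallel, so treat this as an illustration of the "no more workers than cores" idea; the same shape applies to a process pool or to a thread pool in another language.

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor


def suggested_workers(cores: int, wait_to_compute_ratio: float = 0.0) -> int:
    """Hypothetical helper: one worker per core for pure compute,
    scaled up when tasks spend part of their time blocked on I/O."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if wait_to_compute_ratio < 0:
        raise ValueError("wait_to_compute_ratio must be non-negative")
    # Assumed heuristic: workers = cores * (1 + wait time / compute time).
    return math.ceil(cores * (1 + wait_to_compute_ratio))


def run_all(tasks, workers: int):
    """Run callables on a fixed-size pool and return results in submission order.
    Tasks beyond `workers` wait in the pool's internal queue."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() may return None
    workers = suggested_workers(cores)
    # 100 queued jobs, but at most `workers` of them run at any one moment.
    jobs = [lambda n=n: sum(i * i for i in range(n)) for n in range(100)]
    results = run_all(jobs, workers)
    print(f"{len(results)} jobs finished on {workers} worker threads")
```

The point of the fixed-size pool is that the queue can hold as many pending jobs as you like while the number of threads competing for cores stays bounded. If your tasks block on I/O for roughly as long as they compute, something like `suggested_workers(cores, 1.0)` would be a starting point to measure against, not a final answer.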