I have an application that accepts work on a queue and then spins that work off to be completed on independent threads. The number of threads is not massive, say up to 100, but
To get the most work done in the least time: am I better off just launching more threads when I need to do more work and letting the Java thread scheduler handle distributing the work, or would getting smarter and managing the workload to keep the CPU below 100% get me further, faster?
As you add more and more threads, the overhead incurred in context switching, memory cache flushing, memory cache overflowing, and kernel and JVM thread management increases. As your threads hog the CPU, their kernel priorities drop to some minimum and they will reach the time-slice minimum. As more and more threads crowd memory, they overflow the various internal CPU memory caches, so there is a higher chance the CPU will need to swap the job in from slower memory. Internal to the JVM there is more mutex contention and probably some (maybe small) incremental per-thread and object-bandwidth GC overhead. Depending on how synchronized your user tasks are, more threads will also cause increased memory flushing and lock contention.
With any program and any architecture, there is a sweet spot where threads can optimally utilize the available processor and IO resources while limiting kernel and JVM overhead. Finding that sweet spot repeatedly will require a number of iterations and some guesswork.
I would recommend using Executors.newFixedThreadPool(SOME_NUMBER);
and submitting your jobs to it. Then you can do multiple runs, varying the number of threads up and down, until you find the optimal number of threads running simultaneously for the work and the architecture of the box.
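As a rough illustration of those trial runs, here is a minimal sketch that times the same batch of jobs on fixed pools of different sizes. The class name PoolSizeTrial, the simulatedJob method, the candidate sizes, and the job count are all hypothetical placeholders; you would substitute your real work, and the timings from such a small synthetic loop should not be taken as representative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolSizeTrial {
    // Hypothetical stand-in for a real job: a small CPU-bound loop.
    static long simulatedJob(int seed) {
        long acc = seed;
        for (int i = 0; i < 200_000; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    // Runs jobCount jobs on a fixed pool of the given size and
    // returns the elapsed wall-clock time in nanoseconds.
    static long timeRun(int threads, int jobCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int i = 0; i < jobCount; i++) {
                final int seed = i;
                jobs.add(() -> simulatedJob(seed));
            }
            long start = System.nanoTime();
            // invokeAll blocks until every submitted job has completed.
            for (Future<Long> result : pool.invokeAll(jobs)) {
                result.get(); // rethrows any exception raised inside a job
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed candidate pool sizes; adjust to your hardware and workload.
        int[] candidates = {1, 2, 4, 8, 16, 32, 64};
        for (int threads : candidates) {
            long nanos = timeRun(threads, 500);
            System.out.printf("%d threads: %d ms%n",
                    threads, TimeUnit.NANOSECONDS.toMillis(nanos));
        }
    }
}
```

For numbers you can trust, repeat each configuration several times and discard the first runs, since JIT warm-up will skew early measurements.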
Understand, however, that the optimal number of threads will vary based on the number of processors and on other factors that may be non-trivial to determine. More threads may be needed if they are blocking on disk or network IO resources; fewer threads if the work they are doing is mostly CPU-bound.
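If you want a starting value before measuring, one commonly cited rule of thumb (not something specific to this answer) is threads = cores * (1 + wait time / compute time). The sketch below only encodes that heuristic; the class name PoolSizing, the method suggestedThreads, and the example ratios are hypothetical, and the result is just a first guess to refine with trial runs.

```java
public class PoolSizing {
    // Heuristic starting point: cores * (1 + waitTime / computeTime).
    // waitToComputeRatio is an assumed, workload-specific estimate:
    // 0.0 means purely CPU-bound, larger values mean more time blocked on IO.
    static int suggestedThreads(int cores, double waitToComputeRatio) {
        if (cores < 1 || waitToComputeRatio < 0) {
            throw new IllegalArgumentException("cores must be >= 1 and ratio >= 0");
        }
        return Math.max(1, (int) Math.round(cores * (1.0 + waitToComputeRatio)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Mostly CPU-bound work: roughly one thread per core.
        System.out.println("CPU-bound guess: " + suggestedThreads(cores, 0.0));
        // Work that waits on IO about nine times longer than it computes.
        System.out.println("IO-heavy guess: " + suggestedThreads(cores, 9.0));
    }
}
```

The ratio is usually the hard part to estimate, which is why measuring with different pool sizes remains the more reliable approach.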