I have an application that accepts work on a queue and then spins that work off to be completed on independent threads. The number of threads is not massive, say up to 100, but the tasks are CPU-intensive and the CPU sits at 100% while the work is being processed.

To get the most work done the quickest: am I best off to just launch more threads when I need to do more work and let the Java thread scheduler handle distributing the work, or would getting smarter and managing the work load to keep the CPU below 100% get me further faster?
If you have too many simultaneous compute-intensive tasks in parallel threads, you reach the point of diminishing returns very quickly. In fact, if there are N processors (cores), then you don't want more than N such threads. Now, if the tasks occasionally pause for I/O or user interaction, then the right number can be somewhat larger. But in general, if at any one moment there are more threads that want to do computation than there are cores available, then your program is wasting time on context switches -- i.e., the scheduling is costing you.
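If the tasks do pause like that, one common way to estimate how much larger the pool can be is the sizing heuristic from Java Concurrency in Practice: threads ≈ cores × (1 + wait time / compute time). A minimal sketch, where the wait/compute ratio is an assumed placeholder you would have to measure for your own workload:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        // Assumed ratio of time a task spends blocked (I/O, locks, user input)
        // to time it spends computing; 0.0 for pure CPU work. Measure this for
        // your own workload - the value here is just a placeholder.
        double waitToCompute = 0.0;

        // Heuristic from "Java Concurrency in Practice":
        // threads = cores * (1 + wait/compute); pure CPU work gives threads = cores.
        int poolSize = (int) (cores * (1 + waitToCompute));

        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        System.out.println("Pool size: " + poolSize);
        executor.shutdown();
    }
}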
You should aim to keep CPU usage at 100%, but with the minimum number of threads that achieves it. 100 threads looks like too many.
The fact that your CPUs are running at 100% does not tell you much about how much useful work they are doing. In your case, you are using more threads than cores, so that 100% includes some context switching and unnecessary memory use (a small impact for 100 threads), which is sub-optimal.
For CPU-intensive tasks, I generally use this idiom:
// Size the pool to the number of cores, plus one spare so the CPU stays
// busy even when a thread occasionally stalls (e.g. on a page fault).
private static final int NUM_THREADS = Runtime.getRuntime().availableProcessors() + 1;
private final ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
Using more threads, as others have indicated, only introduces unnecessary context switching.
Obviously, if the tasks perform I/O or other blocking operations, this does not apply and a larger pool would make sense.
EDIT
To reply to @MartinJames' comment, I ran a (simplistic) benchmark. The results show that going from a pool size of NUM_CORES + 1 to 100 degrades performance only slightly (roughly 5%), while going higher (1,000 and 10,000) hits performance significantly.
Results are the average of 10 runs:
Pool size: 9: 238 ms. //(NUM_CORES+1)
Pool size: 100: 245 ms.
Pool size: 1000: 319 ms.
Pool size: 10000: 2482 ms.
Code:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class Test {

    private final static int NUM_CORES = Runtime.getRuntime().availableProcessors();

    // Sink for the tasks' results, so the JIT cannot eliminate the loop as dead
    // code. An AtomicLong avoids the data race a plain static long would have.
    private static final AtomicLong count = new AtomicLong();

    private static Runnable r = new Runnable() {
        @Override
        public void run() {
            int count = 0;
            for (int i = 0; i < 100_000; i++) {
                count += i;
            }
            Test.count.addAndGet(count); // one atomic add per task keeps contention negligible
        }
    };

    public static void main(String[] args) throws Exception {
        //warmup
        runWith(10);
        //test
        runWith(NUM_CORES + 1);
        runWith(100);
        runWith(1000);
        runWith(10000);
    }

    private static void runWith(int poolSize) throws InterruptedException {
        long total = 0;
        for (int run = 0; run < 10; run++) { //run 10 times and take the average
            Test.count.set(0);
            ExecutorService executor = Executors.newFixedThreadPool(poolSize);
            long start = System.nanoTime();
            for (int i = 0; i < 50_000; i++) {
                executor.submit(r);
            }
            executor.shutdown();
            executor.awaitTermination(10, TimeUnit.SECONDS);
            long end = System.nanoTime();
            total += (end - start) / 1_000_000;
            System.gc();
        }
        System.out.println("Pool size: " + poolSize + ": " + total / 10 + " ms. ");
    }
}
'Would getting smarter and managing the work load to keep the CPU below 100% get me further faster?'
Probably not.
As others have posted, 100 threads is too many for a thread pool if most of the tasks are CPU-intensive. It won't make much difference to performance on typical systems, though: with that much of an overload, performance will be bad with 4 threads and bad with 400.
How did you decide on 100 threads? Why not 16, say?
'The number of threads is not massive, say up to 100' - does it vary? If so, just create 16 threads at startup and stop managing them: pass them the queue and forget about them, something like the sketch below.
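For illustration, a minimal sketch of that approach - a fixed set of workers created once at startup, all consuming the same queue (the class and variable names are made up for the example):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FixedWorkers {
    public static void main(String[] args) throws InterruptedException {
        // Shared work queue: producers put tasks on it, workers take them off.
        BlockingQueue<Runnable> workQueue = new LinkedBlockingQueue<>();

        // Create a fixed set of workers once, at startup, then forget about them.
        for (int i = 0; i < 16; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        workQueue.take().run(); // blocks until work arrives
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // exit on interrupt
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        // Producers just enqueue work; no per-task thread creation.
        workQueue.add(() -> System.out.println("ran on " + Thread.currentThread().getName()));
        Thread.sleep(100); // give the daemon workers a moment to pick up the task
    }
}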
Horrible thought - you aren't creating a new thread for each task, are you?
'To get the most work done the quickest: am I best off to just launch more threads when I need to do more work and let the Java thread scheduler handle distributing the work, or would getting smarter and managing the work load to keep the CPU below 100% get me further faster?'
As you add more and more threads, the overhead incurred in context switching, memory-cache flushing, memory-cache overflow, and kernel and JVM thread management increases. As your threads hog the CPU, the kernel drops their priorities toward some minimum and they end up with the minimum time-slice. As more and more threads crowd memory, they overflow the CPU's internal caches, so there is a higher chance the CPU must fetch data from slower main memory. Inside the JVM there is more mutex contention and probably some (perhaps small) incremental per-thread GC overhead. Depending on how heavily your tasks synchronize, more threads also means more memory flushing and lock contention.
With any program and any architecture, there is a sweet spot where threads optimally utilize the available processor and I/O resources while limiting kernel and JVM overhead. Finding that sweet spot will take a number of iterations and some guesswork.
I would recommend using Executors.newFixedThreadPool(SOME_NUMBER) and submitting your jobs to it. Then do multiple runs, varying the number of threads up and down, until you find the optimal pool size for your workload and the architecture of the box.
Understand, however, that the optimal number of threads will vary based on the number of processors and other factors that may be non-trivial to determine. More threads may be needed if they block on disk or network I/O; fewer if the work is mostly CPU-bound.
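One simple way to run those experiments is to make the pool size a runtime parameter instead of a constant, so the same code can be benchmarked at different sizes. A sketch, assuming a hypothetical poolSize system property and a stand-in task:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TunablePool {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical property: run as  java -DpoolSize=8 TunablePool  and vary
        // the value between runs; defaults to the number of cores.
        int poolSize = Integer.getInteger("poolSize",
                Runtime.getRuntime().availableProcessors());

        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        for (int i = 0; i < 10_000; i++) {
            executor.submit(() -> {
                long sum = 0;                    // stand-in for your real task
                for (int j = 0; j < 100_000; j++) {
                    sum += j;
                }
                if (sum == 42) System.out.print(""); // keep the JIT from discarding the loop
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        System.out.printf("poolSize=%d took %d ms%n",
                poolSize, (System.nanoTime() - start) / 1_000_000);
    }
}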