My mac is armed with 16 cores.
System.out.println(Runtime.getRuntime().availableProcessors()); //16
I'm running the code below to see the eff
My hunch is that you have put so much load on disk I/O that you slowed everything down. Check the I/O performance in Activity Monitor (if you are on OS X). On Linux, use the vmstat command to get an idea of what is going on. [If you see lots of swapping or a high rate of reads/s and writes/s, then there you go.]
A few things I noticed:
CountFileLineThread is not in the code. Please include it so we can see exactly what's going on.
Next,
for (Future<Integer> future : futures)
{
    Integer result = future.get();
    total += result;
    System.out.println("result :" + result);
}
Here, note that you are blocked on the result of the first task (future.get()). Meanwhile, the other results may already be available, but you can't see them until the first one completes. Use CompletionService instead to get the results in the order they finish, for better measurement. It doesn't change your total time, though, since you want all threads to be done before stopping the timer.
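To make the CompletionService suggestion concrete, here is a minimal sketch. Since CountFileLineThread isn't shown, the task body is a stand-in that just sleeps; the class name CompletionOrderDemo, the method collectInCompletionOrder and the delays are made up for illustration, and you would swap in your real file-reading Callable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionOrderDemo {

    // Submits one task per delay and returns the results in the order the tasks finish,
    // not in the order they were submitted.
    static List<Integer> collectInCompletionOrder(int[] delaysMillis)
            throws InterruptedException, ExecutionException {
        // One thread per task so that all tasks really run at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(delaysMillis.length);
        CompletionService<Integer> completion = new ExecutorCompletionService<>(pool);
        try {
            for (int delay : delaysMillis) {
                completion.submit(() -> {
                    Thread.sleep(delay); // stand-in for the real file-reading work (assumption)
                    return delay;
                });
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < delaysMillis.length; i++) {
                // take() blocks only until the NEXT task to finish is available.
                results.add(completion.take().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // The slowest task is submitted first, yet it should be reported last.
        System.out.println(collectInCompletionOrder(new int[] {600, 50, 300}));
    }
}
```

With a plain list of futures, the loop would sit on the 600 ms task before reporting anything. With take(), each result is handed over as soon as its task is done, which gives you a truer picture of per-task timing.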
Another point: blocking I/O is the key. It doesn't matter, per se, how many cores you have if the tasks are blocked on I/O, the network, etc. Modern processors have what's called Hyper-Threading, and they can run a thread that is waiting to run if the currently executing thread blocks.
So for example, if I have 16 cores and I spawn 16 Threads asking them to read 1 GB files, I will not get any performance improvements just by having more cores. The bottleneck is the disk and memory.
I added this as a comment, but I'm going to throw it in here as an answer too. Because your test is doing file I/O, you have probably hit a point with that 6th thread where you are now doing too much I/O and thus slowing everything down. If you really want to see the benefit of the 16 cores you have, you should rewrite your file-reading thread to use non-blocking I/O.
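As a rough sketch of what that could look like, here is a line counter built on java.nio's AsynchronousFileChannel. This is only one way to read "non-blocking I/O" for files. The class name AsyncLineCount, the countLines helper and the 64 KB buffer size are my own choices, not anything from your code. Note that this simple version still waits for each read right after issuing it; the point is only that the read is handed off and the calling thread is free to do other work before it calls get(). Whether it actually speeds things up will depend on your disk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class AsyncLineCount {

    // Counts '\n' bytes in a file, reading it chunk by chunk through an asynchronous channel.
    static long countLines(Path file)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // buffer size is an arbitrary choice
            long position = 0;
            long lines = 0;
            while (true) {
                // read() returns immediately; the I/O proceeds in the background.
                Future<Integer> pending = channel.read(buffer, position);
                // ... the calling thread could do other work here ...
                int bytesRead = pending.get(); // wait for this chunk to arrive
                if (bytesRead < 0) {
                    break; // end of file
                }
                buffer.flip();
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        lines++;
                    }
                }
                buffer.clear();
                position += bytesRead;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 1) {
            System.out.println(countLines(Paths.get(args[0])));
        }
    }
}
```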
Adding processors causes all sorts of problems, but mostly they have to do with synchronization between processors. Task-level locking within the file system, etc., can become a problem, but even more of a problem is the synchronization between cores that must occur just to maintain cache coherence, keep track of changed pages, etc. I don't know how many cores per chip you have (I gave up tracking that stuff about 10 years ago), but generally once you begin synchronizing off-chip, performance goes down the tubes.
I'll add that the JVM can make a major difference here. Careful JVM design is required to minimize the number of shared (and frequently updated) cache lines, and incredible effort is required to make GC work efficiently in a multi-core environment.