I am trying to find out about the performance difference between normal multithreading and multithreading using an executor (to maintain a thread pool). The code is below.
To see how something scales, I would keep the cost of monitoring to a minimum and compare a small number of threads to a large number.
import java.io.PrintStream;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class Executor_Demo {
    public static void main(String... arg) throws ExecutionException, InterruptedException {
        int nThreads = 5100; // set to 100 or 5100 for the two runs below
        ExecutorService executor = Executors.newFixedThreadPool(nThreads, new DaemonThreadFactory());

        List<Future<Results>> futures = new ArrayList<Future<Results>>();
        for (int i = 0; i < nThreads; i++) {
            futures.add(executor.submit(new BackgroundCallable()));
        }

        Results result = new Results();
        for (Future<Results> future : futures) {
            result.merge(future.get());
        }
        executor.shutdown();
        result.print(System.out);
    }

    static class Results {
        private long cpuTime;
        private long userTime;

        Results() {
            // captures the CPU/user time of whichever thread constructs it
            final ThreadMXBean tb = ManagementFactory.getThreadMXBean();
            cpuTime = tb.getCurrentThreadCpuTime();
            userTime = tb.getCurrentThreadUserTime();
        }

        public void merge(Results results) {
            cpuTime += results.cpuTime;
            userTime += results.userTime;
        }

        public void print(PrintStream out) {
            ThreadMXBean tb = ManagementFactory.getThreadMXBean();

            List<MemoryPoolMXBean> pools = ManagementFactory.getMemoryPoolMXBeans();
            for (int i = 0, poolsSize = pools.size(); i < poolsSize; i++) {
                MemoryPoolMXBean pool = pools.get(i);
                MemoryUsage peak = pool.getPeakUsage();
                out.format("Peak %s memory used:\t%,d%n", pool.getName(), peak.getUsed());
                out.format("Peak %s memory reserved:\t%,d%n", pool.getName(), peak.getCommitted());
            }

            out.println("Total thread CPU time\t" + cpuTime);
            out.println("Total thread user time\t" + userTime);
            out.println("Total started thread count\t" + tb.getTotalStartedThreadCount());
            out.println("Current Thread Count\t" + tb.getThreadCount());
            out.println("Peak Thread Count\t" + tb.getPeakThreadCount());
            out.println("Daemon Thread Count\t" + tb.getDaemonThreadCount());
        }
    }

    static class DaemonThreadFactory implements ThreadFactory {
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        }
    }

    static class BackgroundCallable implements Callable<Results> {
        @Override
        public Results call() throws Exception {
            Thread.sleep(100);
            return new Results();
        }
    }
}
When tested with -XX:MaxNewSize=64m (this limits how large the temporary memory space, the young generation, can grow):
100 threads
Peak Code Cache memory used: 386,880
Peak Code Cache memory reserved: 2,555,904
Peak PS Eden Space memory used: 41,280,984
Peak PS Eden Space memory reserved: 50,331,648
Peak PS Survivor Space memory used: 0
Peak PS Survivor Space memory reserved: 8,388,608
Peak PS Old Gen memory used: 0
Peak PS Old Gen memory reserved: 192,675,840
Peak PS Perm Gen memory used: 3,719,616
Peak PS Perm Gen memory reserved: 21,757,952
Total thread CPU time 20000000
Total thread user time 20000000
Total started thread count 105
Current Thread Count 93
Peak Thread Count 105
Daemon Thread Count 92
5100 threads
Peak Code Cache memory used: 425,728
Peak Code Cache memory reserved: 2,555,904
Peak PS Eden Space memory used: 59,244,544
Peak PS Eden Space memory reserved: 59,244,544
Peak PS Survivor Space memory used: 2,949,152
Peak PS Survivor Space memory reserved: 8,388,608
Peak PS Old Gen memory used: 3,076,400
Peak PS Old Gen memory reserved: 192,675,840
Peak PS Perm Gen memory used: 3,787,096
Peak PS Perm Gen memory reserved: 21,757,952
Total thread CPU time 810000000
Total thread user time 150000000
Total started thread count 5105
Current Thread Count 5105
Peak Thread Count 5105
Daemon Thread Count 5104
The main increases are in the old gen used, about 3 MB (roughly 0.6 KB per extra thread), and in CPU time, which rose by roughly 790 ms (about 0.16 ms per thread).
In your first example you are creating one thread; in the second you are creating 1000. The output you are producing appears to be most of the work, and you produce much more output in the second case than in the first. You need to be sure your testing and monitoring are far more lightweight than what you are trying to monitor/measure.
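As an illustration of keeping the measurement outside the measured work, a self-contained sketch (the pool size and the empty task body are arbitrary placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchTimingDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        int tasks = 10_000;

        // All timing and printing happens outside the measured region,
        // so the monitoring stays far cheaper than the work being measured.
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    // stand-in for real work; no printing or logging in here
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.printf("Ran %,d tasks in %,d ms%n", tasks, elapsedMs);
    }
}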
Each thread consumes memory for its stack, somewhere from 256K to 1M. You can set the stack size manually, but it is dangerous to set it below 128K. So if you have 2G of memory and can afford to spend half of it on threads, you can have no more than about 8K threads. If this is OK for you, use normal multithreading (each Runnable has its own stack).

If you are not willing or not able to spend that much memory on each Runnable, use an Executor. Set the thread pool size to the number of processors (Runtime.getRuntime().availableProcessors()), or several times that.

The main problem that arises is that you cannot call Thread.sleep() or otherwise block a thread in your Runnable (say, to wait for a user response), because such blocking effectively excludes the thread from servicing. As a result, if you use a thread pool of limited size, so-called "thread starvation" occurs, which is effectively a deadlock. If your thread pool is of unlimited size, then you fall back to normal multithreading and soon run out of memory.
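A minimal sketch of that thread-pool approach (the busy-loop task is a made-up stand-in; the points it illustrates are the pool size and that the tasks never block):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CpuSizedPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // A handful of threads sized to the hardware, instead of one 256K-1M stack per task.
        int poolSize = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        for (int i = 0; i < 10_000; i++) {
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    // Purely computational work; no sleep() or other blocking here,
                    // otherwise the few pooled threads would starve the queued tasks.
                    long sum = 0;
                    for (int j = 0; j < 1_000_000; j++) {
                        sum += j;
                    }
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}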
The cure is to use asynchronous operations: that is, set up some request with your callback and exit the run() method. The callback should then start execution of some Runnable object (maybe the same one) with Executor.execute(Runnable), or it can invoke runnable.run() itself.
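A rough sketch of that style, where the EventSource interface is made up for illustration and stands in for whatever asynchronous facility delivers the result:

import java.util.concurrent.Executor;

// Hypothetical event source: invokes the supplied callback once the awaited
// event (user response, I/O completion, ...) occurs, on some internal thread.
interface EventSource {
    void whenReady(Runnable callback);
}

class NonBlockingTask implements Runnable {
    private final Executor executor;
    private final EventSource source;

    NonBlockingTask(Executor executor, EventSource source) {
        this.executor = executor;
        this.source = source;
    }

    @Override
    public void run() {
        // Set up the request with a callback and return immediately,
        // so the pooled thread is free to run other tasks.
        source.whenReady(new Runnable() {
            @Override
            public void run() {
                // Hand the continuation back to the executor rather than
                // doing the work on the event source's thread.
                executor.execute(new Runnable() {
                    @Override
                    public void run() {
                        // ... process the result of the request here ...
                    }
                });
            }
        });
    }
}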
Asynchronous input-output operations are now present in Java 7 (NIO.2), but I failed to make it serve more than a few hundred network connections. For servicing network connections, asynchronous network libraries can be used (e.g. Netty).
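For reference, the NIO.2 style looks roughly like this; a sketch only, with an arbitrary port, buffer size and echo behaviour:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

public class Nio2EchoSketch {
    public static void main(String[] args) throws Exception {
        final AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(8080));

        // accept() returns immediately; the handler runs when a connection arrives.
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel client, Void att) {
                server.accept(null, this); // keep accepting further connections
                final ByteBuffer buf = ByteBuffer.allocate(4096);
                client.read(buf, client, new CompletionHandler<Integer, AsynchronousSocketChannel>() {
                    @Override
                    public void completed(Integer bytes, AsynchronousSocketChannel ch) {
                        buf.flip();
                        ch.write(buf); // echo back whatever was read
                    }

                    @Override
                    public void failed(Throwable exc, AsynchronousSocketChannel ch) {
                    }
                });
            }

            @Override
            public void failed(Throwable exc, Void att) {
            }
        });

        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive for the demo
    }
}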
Organizing callbacks and the execution of Runnables may require sophisticated synchronization. To make life easier, consider using the Actor model (http://en.wikipedia.org/wiki/Actor_model), where an Actor is a Runnable that executes each time an input message arrives. Numerous Java Actor libraries exist (e.g. https://github.com/rfqu/df4j); a toy sketch of the idea follows.
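This is not the df4j API, just an illustration of the shape: each actor owns a mailbox, and sending a message schedules the actor on an Executor only when it is not already scheduled, so no pooled thread is ever blocked waiting for input.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal actor: processes queued messages in run() and reschedules itself
// if new messages arrived while it was finishing.
abstract class TinyActor<M> implements Runnable {
    private final Executor executor;
    private final Queue<M> mailbox = new ConcurrentLinkedQueue<M>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);

    TinyActor(Executor executor) {
        this.executor = executor;
    }

    public void send(M message) {
        mailbox.add(message);
        if (scheduled.compareAndSet(false, true)) {
            executor.execute(this); // wake the actor only when it is idle
        }
    }

    @Override
    public void run() {
        M message;
        while ((message = mailbox.poll()) != null) {
            onMessage(message);
        }
        scheduled.set(false);
        // Re-check: a message may have arrived after poll() returned null.
        if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
            executor.execute(this);
        }
    }

    protected abstract void onMessage(M message);
}

A concrete actor only needs to implement onMessage(), and all of them can share one small, CPU-sized thread pool.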