I'm trying to figure out how to correctly use Java's Executors. I realize that submitting tasks to an ExecutorService has its own overhead. However, I'm surprised by just how large that overhead turns out to be in my benchmark.
You need to group execution somehow, in order to submit larger portions of computation to each thread (e.g. build groups based on stock symbol). In similar scenarios I got the best results using the Disruptor, which has a very low per-job overhead. Even so, it's important to group jobs: naive round-robin distribution usually creates many cache misses.
See http://java-is-the-new-c.blogspot.de/2014/01/comparision-of-different-concurrency.html
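A minimal sketch of that grouping idea, using a plain ExecutorService rather than the Disruptor (the jobs-by-symbol map is a hypothetical input): one task is submitted per symbol, so all work for a symbol runs on the same thread and the per-submit overhead is paid once per group instead of once per job.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;

class GroupedSubmit {
    // Submit one task per symbol instead of one task per job.
    static void submitGrouped(Map<String, List<Runnable>> jobsBySymbol, ExecutorService pool) {
        for (List<Runnable> bucket : jobsBySymbol.values()) {
            // All jobs for this symbol run sequentially on one worker thread,
            // which keeps related data warm in that thread's cache.
            pool.submit(() -> bucket.forEach(Runnable::run));
        }
    }
}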
The fixed thread pool's ultimate purpose is to reuse already created threads, so the performance gain comes from not having to create a new thread every time a task is submitted. Hence the stop time must be taken inside the submitted task, i.e. as the last statement of the run method.
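For illustration, a rough sketch (the names are hypothetical) of taking the stop time inside the task, as the last statement of run(), so it is recorded on the worker thread after the work has actually finished rather than on the submitting thread:

import java.util.concurrent.atomic.AtomicLong;

class TimedTask implements Runnable {
    // Hypothetical sink for the stop time, written by the worker thread.
    static final AtomicLong stopNanos = new AtomicLong();

    @Override
    public void run() {
        doWork();
        stopNanos.set(System.nanoTime()); // stop time taken as the last statement of run()
    }

    private void doWork() {
        Math.sin(42.0); // stand-in for the real task
    }
}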
This is not a fair test of the thread pool. Besides creating the task object and running the job, the pool also has to queue the submitted task and hand it over to a worker thread, and that extra work is pure overhead when the job itself is trivial. When you have a real job and multiple threads, the benefit of the thread pool will be apparent.
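To see that effect, one option is a sketch along these lines (the workload and sizes are arbitrary): a non-trivial job is submitted many times to a single, reused pool, so the per-submission overhead is amortised over real work.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RealJobDemo {
    // A deliberately heavier unit of work, so per-submission overhead becomes negligible.
    static double heavyJob(int n) {
        double acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += Math.sqrt(i * (double) n);
        }
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // threads are created once and reused
        for (int n = 0; n < 100; n++) {
            final int job = n;
            pool.submit(() -> heavyJob(job));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}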
Firstly, there are a few issues with the microbenchmark. You do a warm-up, which is good. However, it is better to run the test multiple times, which should give a feel for whether it has really warmed up, and for the variance of the results. It also tends to be better to test each algorithm in a separate run, otherwise you might cause deoptimisation when an algorithm changes.
The task is very small, although I'm not entirely sure how small, so the "number of times faster" figure is pretty meaningless. In the multithreaded case the tasks touch the same volatile locations inside the shared random generator, so the threads can cause really bad performance (use a Random instance per thread instead). Also, a run of 47 milliseconds is a bit short.
Certainly going to another thread for a tiny operation is not going to be fast. Split tasks up into bigger chunks if possible. JDK 7 looks as if it will have a fork/join framework, which attempts to support fine-grained tasks from divide-and-conquer algorithms by preferring to execute tasks on the same thread in order, with larger tasks pulled out by idle threads.
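As an illustration of that divide-and-conquer style, here is a sketch using the fork/join API that did ship in JDK 7 (summing an array is just an arbitrary example): small ranges are computed directly on the current thread, and only larger ranges are forked so that idle workers can steal them.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this, just compute on the current thread
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                          // make the left half available for work stealing
        return right.compute() + left.join(); // compute the right half here, then combine
    }
}
// Usage: long total = new ForkJoinPool().invoke(new SumTask(array, 0, array.length));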
Math.random() actually synchronizes on a single Random number generator. Calling Math.random() results in significant contention for the number generator. In fact the more threads you have, the slower it's going to be.
From the Math.random() javadoc:
This method is properly synchronized to allow correct use by more than one thread. However, if many threads need to generate pseudorandom numbers at a great rate, it may reduce contention for each thread to have its own pseudorandom-number generator.
The 'overhead' you mention has nothing to do with ExecutorService; it is caused by multiple threads synchronizing on Math.random(), creating lock contention.
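One way to remove that contention, as the javadoc suggests, is to give each thread its own generator; a minimal sketch using ThreadLocalRandom (part of java.util.concurrent since Java 7), with an arbitrary amount of work per thread:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

class NoContentionRandom {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> {
                double sum = 0;
                for (int j = 0; j < 1_000_000; j++) {
                    // Each worker thread uses its own generator: no shared lock, no contention.
                    sum += ThreadLocalRandom.current().nextDouble();
                }
                return sum;
            });
        }
        pool.shutdown();
    }
}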
So yes, you are missing something (and the 'correct' answer below is not actually correct).
Here is some Java 8 code to demonstrate 8 threads running a simple function in which there is no lock contention:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleFunction;

import com.google.common.base.Stopwatch;

public class ExecServicePerformance {

    private static final int repetitions = 120;
    private static int totalOperations = 250000;
    private static final int cpus = 8;
    private static final List<Batch> batches = batches(cpus);

    // A CPU-bound function with no shared state, so there is no lock contention between threads.
    private static DoubleFunction<Double> performanceFunc = (double i) -> Math.sin(i * 100000 / Math.PI);

    public static void main(String[] args) throws InterruptedException {
        printExecutionTime("Synchronous", ExecServicePerformance::synchronous);
        printExecutionTime("Synchronous batches", ExecServicePerformance::synchronousBatches);
        printExecutionTime("Thread per batch", ExecServicePerformance::asynchronousBatches);
        printExecutionTime("Executor pool", ExecServicePerformance::executorPool);
    }

    private static void printExecutionTime(String msg, Runnable f) throws InterruptedException {
        long time = 0;
        for (int i = 0; i < repetitions; i++) {
            Stopwatch stopwatch = Stopwatch.createStarted();
            f.run(); // blocks the calling thread; the strategy itself may fan work out to other threads
            time += stopwatch.elapsed(TimeUnit.MILLISECONDS);
        }
        System.out.println(msg + " exec time: " + time);
    }

    private static void synchronous() {
        for (int i = 0; i < totalOperations; i++) {
            performanceFunc.apply(i);
        }
    }

    private static void synchronousBatches() {
        for (Batch batch : batches) {
            batch.synchronously();
        }
    }

    private static void asynchronousBatches() {
        // One new Thread per batch, created on every repetition.
        CountDownLatch cb = new CountDownLatch(cpus);
        for (Batch batch : batches) {
            Runnable r = () -> { batch.synchronously(); cb.countDown(); };
            Thread t = new Thread(r);
            t.start();
        }
        try {
            cb.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    private static void executorPool() {
        // A new pool is created and shut down on every repetition, so pool start-up cost is included.
        final ExecutorService es = Executors.newFixedThreadPool(cpus);
        for (Batch batch : batches) {
            Runnable r = () -> batch.synchronously();
            es.submit(r);
        }
        es.shutdown();
        try {
            es.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    private static List<Batch> batches(final int cpus) {
        // Split the total work evenly into one batch per CPU.
        List<Batch> list = new ArrayList<Batch>();
        for (int i = 0; i < cpus; i++) {
            list.add(new Batch(totalOperations / cpus));
        }
        System.out.println("Batches: " + list.size());
        return list;
    }

    private static class Batch {

        private final int operationsInBatch;

        public Batch(final int ops) {
            this.operationsInBatch = ops;
        }

        public void synchronously() {
            for (int i = 0; i < operationsInBatch; i++) {
                performanceFunc.apply(i);
            }
        }
    }
}
Result timings for 120 tests of 250k operations (ms):
Winner: Executor Service.