I'm trying to figure out how to correctly use Java's Executors. I realize that submitting tasks to an ExecutorService has its own overhead. However, I'm surprised by just how large that overhead turns out to be in my benchmark.
You need to group execution somehow, in order to submit larger portions of computation to each thread (e.g. build groups based on stock symbol). In similar scenarios I got the best results using the Disruptor, which has a very low per-job overhead. Even so, it's important to group jobs: naive round-robin distribution usually creates many cache misses.
See http://java-is-the-new-c.blogspot.de/2014/01/comparision-of-different-concurrency.html
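A minimal sketch of that grouping idea, using a plain ExecutorService rather than the Disruptor (the jobs-by-symbol map is a hypothetical input): one task is submitted per symbol, so all work for a symbol runs on the same thread and the per-submit overhead is paid once per group instead of once per job.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;

class GroupedSubmit {
    // Submit one task per symbol instead of one task per job.
    static void submitGrouped(Map<String, List<Runnable>> jobsBySymbol, ExecutorService pool) {
        for (List<Runnable> bucket : jobsBySymbol.values()) {
            // All jobs for this symbol run sequentially on one worker thread,
            // which keeps related data warm in that thread's cache.
            pool.submit(() -> bucket.forEach(Runnable::run));
        }
    }
}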
The fixed thread pool's ultimate purpose is to reuse already created threads, so the performance gain comes from not having to create a new thread every time a task is submitted. Hence the stop time must be taken inside the submitted task, i.e. as the last statement of the run method.
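For illustration, a rough sketch (the names are hypothetical) of taking the stop time inside the task, as the last statement of run(), so it is recorded on the worker thread after the work has actually finished rather than on the submitting thread:

import java.util.concurrent.atomic.AtomicLong;

class TimedTask implements Runnable {
    // Hypothetical sink for the stop time, written by the worker thread.
    static final AtomicLong stopNanos = new AtomicLong();

    @Override
    public void run() {
        doWork();
        stopNanos.set(System.nanoTime()); // stop time taken as the last statement of run()
    }

    private void doWork() {
        Math.sin(42.0); // stand-in for the real task
    }
}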
This is not a fair test of the thread pool. Besides creating the task object and running the job, the pool also has to queue the submitted task and hand it over to a worker thread, and that extra work is pure overhead when the job itself is trivial. When you have a real job and multiple threads, the benefit of the thread pool will be apparent.
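To see that effect, one option is a sketch along these lines (the workload and sizes are arbitrary): a non-trivial job is submitted many times to a single, reused pool, so the per-submission overhead is amortised over real work.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RealJobDemo {
    // A deliberately heavier unit of work, so per-submission overhead becomes negligible.
    static double heavyJob(int n) {
        double acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += Math.sqrt(i * (double) n);
        }
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // threads are created once and reused
        for (int n = 0; n < 100; n++) {
            final int job = n;
            pool.submit(() -> heavyJob(job));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}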
Firstly, there are a few issues with the microbenchmark. You do a warm-up, which is good. However, it is better to run the test multiple times, which should give a feel for whether it has really warmed up, and for the variance of the results. It also tends to be better to test each algorithm in a separate run, otherwise you might cause deoptimisation when an algorithm changes.
The task is very small, although I'm not entirely sure how small, so the "number of times faster" figure is pretty meaningless. In the multithreaded case the tasks touch the same volatile locations inside the shared random generator, so the threads can cause really bad performance (use a Random instance per thread instead). Also, a run of 47 milliseconds is a bit short.
Certainly going to another thread for a tiny operation is not going to be fast. Split tasks up into bigger chunks if possible. JDK 7 looks as if it will have a fork/join framework, which attempts to support fine-grained tasks from divide-and-conquer algorithms by preferring to execute tasks on the same thread in order, with larger tasks pulled out by idle threads.
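As an illustration of that divide-and-conquer style, here is a sketch using the fork/join API that did ship in JDK 7 (summing an array is just an arbitrary example): small ranges are computed directly on the current thread, and only larger ranges are forked so that idle workers can steal them.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this, just compute on the current thread
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                          // make the left half available for work stealing
        return right.compute() + left.join(); // compute the right half here, then combine
    }
}
// Usage: long total = new ForkJoinPool().invoke(new SumTask(array, 0, array.length));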
Math.random() actually synchronizes on a single Random number generator. Calling Math.random() results in significant contention for the number generator. In fact the more threads you have, the slower it's going to be.
From the Math.random() javadoc:
This method is properly synchronized to allow correct use by more than one thread. However, if many threads need to generate pseudorandom numbers at a great rate, it may reduce contention for each thread to have its own pseudorandom-number generator.
The 'overhead' you mention has nothing to do with ExecutorService; it is caused by multiple threads synchronizing on Math.random(), creating lock contention.
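One way to remove that contention, as the javadoc suggests, is to give each thread its own generator; a minimal sketch using ThreadLocalRandom (part of java.util.concurrent since Java 7), with an arbitrary amount of work per thread:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

class NoContentionRandom {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> {
                double sum = 0;
                for (int j = 0; j < 1_000_000; j++) {
                    // Each worker thread uses its own generator: no shared lock, no contention.
                    sum += ThreadLocalRandom.current().nextDouble();
                }
                return sum;
            });
        }
        pool.shutdown();
    }
}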
So yes, you are missing something (and the 'correct' answer below is not actually correct).
Here is some Java 8 code to demonstrate 8 threads running a simple function in which there is no lock contention:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleFunction;

import com.google.common.base.Stopwatch;

public class ExecServicePerformance {

    private static final int repetitions = 120;
    private static int totalOperations = 250000;
    private static final int cpus = 8;
    private static final List<Batch> batches = batches(cpus);

    // A CPU-bound function with no shared state, so there is no lock contention between threads.
    private static DoubleFunction<Double> performanceFunc = (double i) -> Math.sin(i * 100000 / Math.PI);

    public static void main(String[] args) throws InterruptedException {
        printExecutionTime("Synchronous", ExecServicePerformance::synchronous);
        printExecutionTime("Synchronous batches", ExecServicePerformance::synchronousBatches);
        printExecutionTime("Thread per batch", ExecServicePerformance::asynchronousBatches);
        printExecutionTime("Executor pool", ExecServicePerformance::executorPool);
    }

    private static void printExecutionTime(String msg, Runnable f) throws InterruptedException {
        long time = 0;
        for (int i = 0; i < repetitions; i++) {
            Stopwatch stopwatch = Stopwatch.createStarted();
            f.run(); // blocks the calling thread; the strategy itself may fan work out to other threads
            time += stopwatch.elapsed(TimeUnit.MILLISECONDS);
        }
        System.out.println(msg + " exec time: " + time);
    }

    private static void synchronous() {
        for (int i = 0; i < totalOperations; i++) {
            performanceFunc.apply(i);
        }
    }

    private static void synchronousBatches() {
        for (Batch batch : batches) {
            batch.synchronously();
        }
    }

    private static void asynchronousBatches() {
        // One new Thread per batch, created on every repetition.
        CountDownLatch cb = new CountDownLatch(cpus);
        for (Batch batch : batches) {
            Runnable r = () -> { batch.synchronously(); cb.countDown(); };
            Thread t = new Thread(r);
            t.start();
        }
        try {
            cb.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    private static void executorPool() {
        // A new pool is created and shut down on every repetition, so pool start-up cost is included.
        final ExecutorService es = Executors.newFixedThreadPool(cpus);
        for (Batch batch : batches) {
            Runnable r = () -> batch.synchronously();
            es.submit(r);
        }
        es.shutdown();
        try {
            es.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    private static List<Batch> batches(final int cpus) {
        // Split the total work evenly into one batch per CPU.
        List<Batch> list = new ArrayList<Batch>();
        for (int i = 0; i < cpus; i++) {
            list.add(new Batch(totalOperations / cpus));
        }
        System.out.println("Batches: " + list.size());
        return list;
    }

    private static class Batch {

        private final int operationsInBatch;

        public Batch(final int ops) {
            this.operationsInBatch = ops;
        }

        public void synchronously() {
            for (int i = 0; i < operationsInBatch; i++) {
                performanceFunc.apply(i);
            }
        }
    }
}
Result timings for 120 tests of 250k operations (ms):
Winner: Executor Service.