I want to launch a lot of tasks to run on a database of roughly 42 million records. I want to run this in batches of 5000 records at a time (results in 850 tasks). I also want to limit the
Changed from your code:
ExecutorService executorService = Executors.newFixedThreadPool(16);
for (int j = 1; j < 900 + 1; j++) {
    int start = (j - 1) * 5000;
    int stop = j * 5000 - 1;
    FetcherRunner runner = new FetcherRunner(routes, start, stop);
    executorService.submit(runner);
}
Is this the correct way of working?
The first part is correct. But you shouldn't be creating and starting new Thread objects. When you submit the Runnable, the ExecutorService puts it on its queue, and then runs it when a worker thread becomes available.
.... I use the threadlist to detect when all my threads are finished so I can continue processing results.
Well, if you do what you are currently doing, you are running each task twice: once on the executor and once on the thread you start yourself. Worse still, the swarm of manually created threads will all try to run in parallel.
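A simple way to make sure that all of the tasks have completed is to call shutdown() followed by awaitTermination(...) on the ExecutorService. awaitTermination(...) on its own only waits for termination after a shutdown has been requested, and shutdown() by itself does not block. (This only works if you don't intend to use the executor service again.)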
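As a minimal sketch of the shutdown-then-wait approach: the class name, the counter, and the one-hour timeout below are assumptions for illustration, and the empty placeholder task stands in for a real FetcherRunner.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitAllTasks {
    // Runs taskCount placeholder tasks on a 16-thread pool and
    // returns how many of them completed.
    static int runAll(int taskCount) {
        ExecutorService executorService = Executors.newFixedThreadPool(16);
        AtomicInteger completed = new AtomicInteger();
        for (int j = 0; j < taskCount; j++) {
            // Placeholder for a real task such as FetcherRunner.
            executorService.submit(() -> { completed.incrementAndGet(); });
        }
        // Stop accepting new tasks; tasks already queued still run.
        executorService.shutdown();
        try {
            // Block until every task has finished or the (assumed) timeout expires.
            if (!executorService.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("Timed out waiting for tasks");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("Completed tasks: " + runAll(900));
    }
}
```

After awaitTermination returns true, no task is still running, so results can be processed from the calling thread.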
The other approach is to create a Future for each FetcherRunner's results, and attempt to get the result after all of the tasks have been submitted. That has the advantage that you can start processing early results before later ones have been produced. (However, if you don't need to ... or can't ... do that, using Futures won't achieve anything.)
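The best way would be to use a CountDownLatch as follows:
ExecutorService executorService = Executors.newFixedThreadPool(16);
CountDownLatch latch = new CountDownLatch(900);
for (int j = 1; j < 900 + 1; j++) {
    int start = (j - 1) * 5000;
    int stop = j * 5000 - 1;
    executorService.submit(new FetcherRunner(routes, start, stop, latch));
}
latch.await(); // blocks until countDown() has been called 900 times
In FetcherRunner, call latch.countDown() in a finally block so that the count drops even if the task throws. The code after await() will be executed only when all the tasks are completed.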
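For reference, a self-contained sketch of the latch approach might look like the following. RangeTask is a hypothetical stand-in for FetcherRunner (it only counts the records in its range), and the class and method names are assumptions, not part of the original code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class LatchExample {
    // Hypothetical stand-in for FetcherRunner.
    static class RangeTask implements Runnable {
        private final int start;
        private final int stop;
        private final CountDownLatch latch;
        private final AtomicLong processed;

        RangeTask(int start, int stop, CountDownLatch latch, AtomicLong processed) {
            this.start = start;
            this.stop = stop;
            this.latch = latch;
            this.processed = processed;
        }

        @Override
        public void run() {
            try {
                // Placeholder work: count the records in [start, stop].
                processed.addAndGet(stop - start + 1);
            } finally {
                // Always count down, even if the work above throws.
                latch.countDown();
            }
        }
    }

    static long runBatches(int batches, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        CountDownLatch latch = new CountDownLatch(batches);
        AtomicLong processed = new AtomicLong();
        try {
            for (int j = 1; j <= batches; j++) {
                int start = (j - 1) * batchSize;
                int stop = j * batchSize - 1;
                pool.submit(new RangeTask(start, stop, latch, processed));
            }
            latch.await(); // returns once every task has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("Records processed: " + runBatches(900, 5000));
    }
}
```

Unlike shutting the executor down, the latch leaves the pool usable for further work after await() returns; the sketch shuts it down only so the program can exit.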
You don't need the part after the call to submit. The code you have that creates a Thread will result in 900 threads being created! Yowza. The ExecutorService has a pool of 16 threads, so you can run 16 jobs at once. Any jobs submitted when all 16 threads are busy will be queued. From the docs:
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly shutdown.
So there is no need for yet another thread. If you need to be notified after a task has finished, you can have the task invoke a callback. Another option is to cache all of the Futures returned from submit, and each time a task finishes, check whether all of the Futures are done. Once they are all done, you can dispatch another function to run. Note that it will run on one of the threads in the ExecutorService.
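One way to sketch the "notify when everything is done" idea is shown below. It swaps the list of Futures for a shared AtomicInteger counter, which is a simplification of the approach described above: the task that brings the counter to zero runs the callback on its own pool thread. All names here are hypothetical, and the sketch assumes taskCount is greater than zero.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class LastTaskCallback {
    // Submits taskCount placeholder tasks and returns the name of the
    // thread that ran the "all done" callback. Assumes taskCount > 0.
    static String runWithCallback(int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        AtomicInteger remaining = new AtomicInteger(taskCount);
        CompletableFuture<String> callbackThread = new CompletableFuture<>();
        for (int j = 0; j < taskCount; j++) {
            pool.submit(() -> {
                try {
                    // Placeholder for the real work of one batch.
                } finally {
                    // The last task to finish triggers the callback,
                    // which therefore runs on a pool worker thread.
                    if (remaining.decrementAndGet() == 0) {
                        callbackThread.complete(Thread.currentThread().getName());
                    }
                }
            });
        }
        try {
            // The caller waits here only so the sketch can report the result.
            return callbackThread.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Callback ran on: " + runWithCallback(900));
    }
}
```

Because the callback occupies a worker thread, long-running follow-up work is probably better handed off elsewhere if the pool is needed for other tasks.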
The first part using ExecutorService looks good:
...
FetcherRunner runner = new FetcherRunner(routes, start, stop);
executorService.submit(runner);
The part with Thread should not be there. I am assuming you have it there just to show how you had it before?
Update:
Yes, you don't require the code after executorService.submit(runner); that is going to end up spawning a huge number of threads. If your objective is to wait for all submitted tasks to complete after the loop, then you can get a reference to a Future when submitting each task and wait on the Futures, something like this:
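ExecutorService executorService = Executors.newFixedThreadPool(16);
// Assumes FetcherRunner implements Callable<Result>; needs java.util and java.util.concurrent imports.
List<Future<Result>> futures = new ArrayList<>();
for (int j = 1; j < 900 + 1; j++) {
    int start = (j - 1) * 5000;
    int stop = j * 5000 - 1;
    FetcherRunner runner = new FetcherRunner(routes, start, stop);
    futures.add(executorService.submit(runner));
}
for (Future<Result> future : futures) {
    // get() blocks until the task finishes; it throws InterruptedException and ExecutionException.
    Result result = future.get(); // Do something with the result...
}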
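A runnable version of the Future approach could look roughly like this. RangeCounter is a hypothetical Callable standing in for FetcherRunner, with an Integer result in place of Result; both are assumptions made so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureResults {
    // Hypothetical task: returns the number of records in [start, stop].
    static class RangeCounter implements Callable<Integer> {
        private final int start;
        private final int stop;

        RangeCounter(int start, int stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public Integer call() {
            return stop - start + 1;
        }
    }

    static long sumResults(int batches, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        List<Future<Integer>> futures = new ArrayList<>();
        try {
            for (int j = 1; j <= batches; j++) {
                int start = (j - 1) * batchSize;
                int stop = j * batchSize - 1;
                futures.add(pool.submit(new RangeCounter(start, stop)));
            }
            long total = 0;
            for (Future<Integer> future : futures) {
                // get() blocks until this particular task is done, so early
                // results are consumed while later tasks are still running.
                total += future.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // The task itself threw; surface its cause.
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Total records: " + sumResults(900, 5000));
    }
}
```

Results are read in submission order here; if a task fails, get() reports the failure as an ExecutionException rather than losing it silently.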