Thread pool that binds tasks for a given ID to the same thread


Are there any implementations of a thread pool (in Java) that ensures all tasks for the same logical ID are executed on the same thread?

The logic I'm after is if t


6 answers

鱼传尺愫

2020-12-28 17:55

I had to deal with a similar situation recently.

I ended up with a design similar to yours. The only difference was that "current" was a map rather than a set: a map from ID to a queue of Runnables. When the wrapper around a task's runnable sees that its ID is already present in the map, it adds the task's runnable to that ID's queue and returns immediately. Otherwise the ID is added to the map with an empty queue and the task is executed.

When the task is done, the wrapper checks the ID's queue again. If the queue is not empty, the next runnable is taken from it and run. Otherwise the ID is removed from the map and we're done.

I'll leave shutdown and cancelation as an exercise to the reader :)
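
A minimal sketch of the map-of-queues wrapper described above, assuming a plain `Executor` as the underlying pool. The class name `KeyedSerialExecutor` and the helper `runThenNext` are made up for illustration. Note that this guarantees tasks for one ID never overlap and run in submission order; it does not pin them to one physical thread, and rejected submissions and shutdown are not handled.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Executor;

/** Runs tasks that share an ID one at a time, in submission order, on a shared pool. */
final class KeyedSerialExecutor {
    private final Executor pool;
    // An ID is present here exactly while one of its tasks is running;
    // its queue holds the tasks waiting behind that one.
    private final Map<Object, Queue<Runnable>> current = new HashMap<>();

    KeyedSerialExecutor(Executor pool) {
        this.pool = pool;
    }

    void execute(Object id, Runnable task) {
        synchronized (current) {
            Queue<Runnable> waiting = current.get(id);
            if (waiting != null) {
                waiting.add(task); // ID is busy: park the task and return
                return;
            }
            current.put(id, new ArrayDeque<>()); // ID is idle: claim it
        }
        pool.execute(() -> runThenNext(id, task));
    }

    private void runThenNext(Object id, Runnable task) {
        try {
            task.run();
        } finally {
            final Runnable next;
            synchronized (current) {
                next = current.get(id).poll();
                if (next == null) {
                    current.remove(id); // nothing waiting: release the ID
                }
            }
            if (next != null) {
                // Hand the next task back to the pool instead of looping here,
                // so one busy ID cannot monopolise a pool thread.
                pool.execute(() -> runThenNext(id, next));
            }
        }
    }
}
```

Resubmitting the next task to the pool is a design choice of this sketch; running it inline in a loop would also match the description and keeps a burst of tasks on one thread.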

庸人自扰

2020-12-28 17:55
Our approach is similar to what is in the update of the original question. We have a wrapper class that is a runnable that contains a queue (LinkedTransferQueue) which we call a RunnableQueue. The runnable queue has the basic API of:
```
public class RunnableQueue implements Runnable
{
  public RunnableQueue(String name, Executor executor);
  public void run();

  public void execute(Runnable runnable);
}
```
When the user submits the first Runnable via the execute call, the RunnableQueue enqueues itself on the executor. Subsequent calls to execute get queued up on the queue inside the RunnableQueue. When the runnable queue gets executed by the ThreadPool (via its run method), it starts to "drain" the internal queue by serially executing the runnables one by one. If execute is called on the RunnableQueue while it is executing, the new runnables simply get appended to the internal queue. Once the queue is drained, the run method of the runnable queue completes and it "leaves" the executor pool. Rinse and repeat.

We have other optimizations that do things like only let some number of runnables run (e.g. four) before the RunnableQueue re-posts itself to the executor pool.

The only really tricky bit inside (and it isn't that hard) is to synchronize around whether it is currently posted to the executor, so that it doesn't post itself twice or miss a moment when it should post.

Overall we find this to work pretty well. The "ID" (semantic context) for us is the runnable queue. The code that needs this (a plugin, in our case) has a reference to the RunnableQueue and not to the executor pool, so it is forced to work exclusively through the RunnableQueue. This not only guarantees that all accesses are serially sequenced (thread confinement) but also lets the RunnableQueue "moderate" the plugin's job loading. Additionally, it requires no centralized management structure or other points of contention.
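
One way the RunnableQueue described above might be implemented is sketched below. This is not the poster's code: the `scheduled` flag, the `MAX_BATCH` constant (standing in for the "four runnables" optimization) and the error handling are assumptions made for illustration.

```java
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.atomic.AtomicBoolean;

final class RunnableQueue implements Runnable {
    private static final int MAX_BATCH = 4; // assumed cap on tasks per run() pass

    private final String name;
    private final Executor executor;
    private final Queue<Runnable> tasks = new LinkedTransferQueue<>();
    // true while this queue is posted to, or running on, the executor
    private final AtomicBoolean scheduled = new AtomicBoolean(false);

    RunnableQueue(String name, Executor executor) {
        this.name = name;
        this.executor = executor;
    }

    public void execute(Runnable runnable) {
        tasks.add(runnable);
        schedule();
    }

    private void schedule() {
        // Only the caller that flips false -> true posts, so the queue is
        // never on the executor twice at the same time.
        if (scheduled.compareAndSet(false, true)) {
            executor.execute(this);
        }
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < MAX_BATCH; i++) {
                Runnable next = tasks.poll();
                if (next == null) {
                    break;
                }
                try {
                    next.run();
                } catch (RuntimeException e) {
                    // Keep draining even if one task fails.
                    System.err.println(name + ": task failed: " + e);
                }
            }
        } finally {
            scheduled.set(false);
            // Re-check after clearing the flag: a task added while we were
            // still marked as scheduled would otherwise be missed.
            if (!tasks.isEmpty()) {
                schedule();
            }
        }
    }
}
```

The re-check after `scheduled.set(false)` is the "tricky bit": it covers both the batch limit being reached and a producer whose own attempt to post failed because the flag was still set.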

我在风中等你

2020-12-28 17:56

I have to implement a similar solution, and h22's suggestion of creating an array of executor services seems the best approach to me, with one caveat: I take the ID modulo some desired maximum size (using either the raw ID, if it is a long/int, or its hash code) and use the result as the index. That avoids ending up with far too many executor service objects while still getting a good amount of concurrency in the processing.

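```
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorServiceRouter {

    // One single-threaded executor per slot; tasks routed to the same slot run serially.
    private final List<ExecutorService> services;
    private final int size;

    public ExecutorServiceRouter(int size) {
        services = new ArrayList<ExecutorService>(size);
        this.size = size;
        for (int i = 0; i < size; i++) {
            services.add(Executors.newSingleThreadExecutor());
        }
    }

    public void route(long id, Runnable r) {
        // floorMod keeps the index in [0, size) even for negative ids,
        // where the % operator would yield a negative index.
        services.get((int) Math.floorMod(id, (long) size)).execute(r);
    }

    // Stops accepting new tasks; tasks already queued still run.
    public void shutdown() {
        for (ExecutorService service : services) {
            service.shutdown();
        }
    }

}
```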


春和景丽

2020-12-28 17:59

Create an array of executor services running one thread each, and assign your queue entries to them by the hash code of your item ID. The array can be of any size, depending on the maximum number of threads you want to use.

This restricts what we can use from the executor service, but it still lets us use its ability to shut down its only thread when it is no longer needed (with allowCoreThreadTimeOut(true)) and restart it as required. Also, all the queuing will work without rewriting it.
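
A sketch of this idea, assuming each "lane" is built directly as a `ThreadPoolExecutor` so that `allowCoreThreadTimeOut` is reachable (the executor returned by `Executors.newSingleThreadExecutor()` does not expose it). The class name `HashedExecutors` and the method `laneFor` are illustrative.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class HashedExecutors {
    private final ThreadPoolExecutor[] lanes;

    // idleTimeout must be greater than zero, otherwise
    // allowCoreThreadTimeOut(true) throws IllegalArgumentException.
    HashedExecutors(int laneCount, long idleTimeout, TimeUnit unit) {
        lanes = new ThreadPoolExecutor[laneCount];
        for (int i = 0; i < laneCount; i++) {
            ThreadPoolExecutor lane = new ThreadPoolExecutor(
                    1, 1, idleTimeout, unit, new LinkedBlockingQueue<>());
            lane.allowCoreThreadTimeOut(true); // let the single thread die when idle
            lanes[i] = lane;
        }
    }

    // Same ID, same lane. floorMod avoids negative indexes for negative hash codes.
    int laneFor(Object id) {
        return Math.floorMod(id.hashCode(), lanes.length);
    }

    void execute(Object id, Runnable task) {
        lanes[laneFor(id)].execute(task);
    }

    void shutdown() {
        for (ThreadPoolExecutor lane : lanes) {
            lane.shutdown();
        }
    }
}
```

With the idle timeout enabled, a lane's thread may be replaced by a fresh one after a quiet period, so "same thread" here means "same single-threaded lane": tasks for one ID still never run concurrently.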

遥遥无期

2020-12-28 18:08
The simplest idea could be this:

Have a fixed map of BlockingQueues. Use a hash mechanism to pick a queue based on the task ID. The hash algorithm should pick the same queue for the same IDs. Start a single thread for every queue. Every thread will pick one task at a time from its own dedicated queue and execute it.

P.S. The appropriate solution strongly depends on the type of work you assign to the threads.
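
The simple version above could look roughly like this. It is a sketch under assumptions: the class name `QueuePerThreadPool`, the daemon worker threads and the interrupt-based `shutdownNow` are choices made here, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueuePerThreadPool {
    private final List<BlockingQueue<Runnable>> queues = new ArrayList<>();
    private final List<Thread> workers = new ArrayList<>();

    QueuePerThreadPool(int threads) {
        for (int i = 0; i < threads; i++) {
            BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
            // Each worker thread owns exactly one queue for its whole life.
            Thread worker = new Thread(() -> drain(queue), "keyed-worker-" + i);
            worker.setDaemon(true);
            queues.add(queue);
            workers.add(worker);
            worker.start();
        }
    }

    private static void drain(BlockingQueue<Runnable> queue) {
        try {
            while (true) {
                Runnable task = queue.take(); // blocks until a task arrives
                try {
                    task.run();
                } catch (RuntimeException e) {
                    e.printStackTrace(); // keep the worker alive
                }
            }
        } catch (InterruptedException e) {
            // Interrupted by shutdownNow(): let the thread exit.
        }
    }

    // The same ID always hashes to the same queue, hence the same thread.
    void submit(Object id, Runnable task) {
        queues.get(Math.floorMod(id.hashCode(), queues.size())).add(task);
    }

    void shutdownNow() {
        workers.forEach(Thread::interrupt);
    }
}
```

Unlike the executor-based variants, every task for a given ID here really does run on one fixed thread, at the cost of that thread sitting idle when its IDs are quiet.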

UPDATE

Ok, how about this crazy idea, please bear with me :)

Say, we have a ConcurrentHashMap which holds references id -> OrderQueue
```
ID1->Q1, ID2->Q2, ID3->Q3, ...
```
Meaning that now every id is associated with its own queue. OrderQueue is a custom blocking queue with an additional boolean flag, isAssociatedWithWorkingThread.

There is also a regular BlockingQueue, which we will call amortizationQueue for now; you'll see its use later.

Next, we have N working threads. Every working thread has its own working queue, which is a BlockingQueue containing the ids associated with this thread.

When a new id comes, we do the following:
```
create a new OrderQueue(isAssociatedWithWorkingThread=false)
put the task to the queue
put id->OrderQueue to the map
put this OrderQueue to amortizationQueue
```
When an update for existing id comes we do the following:
```
pick OrderQueue from the map
put the task to the queue
if isAssociatedWithWorkingThread == false
    put this OrderQueue to amortizationQueue
```
Every working thread does the following:
```
take next id from the working queue
take the OrderQueue associated with this id from the map
take all tasks from this queue
execute them
mark isAssociatedWithWorkingThread=false for this OrderQueue
put this OrderQueue to amortizationQueue
```
Pretty straightforward. Now to the fun part - work stealing :)

If at some point a working thread finds itself with an empty working queue, it does the following:
```
go to the pool of all working threads
pick one (say, one with the longest working queue)
steal id from *the tail* of that thread's working queue
put this id to its own working queue
continue with regular execution
```
There is also one additional thread, which does the amortization work:
```
while (true)
    take next OrderQueue from amortizationQueue
    if queue is not empty and isAssociatedWithWorkingThread == false
         set isAssociatedWithWorkingThread=true
         pick any working thread and add the id to its working queue
```
I will have to spend more time thinking about whether you can get away with an AtomicBoolean for the isAssociatedWithWorkingThread flag, or whether checking and changing this flag needs to be a blocking operation.
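
As a starting point for that question, here is a sketch of how the flag might be handled with an AtomicBoolean and compare-and-set. It does not settle whether this is sufficient for the full work-stealing design. The method names `tryAssociate`, `drainTasks` and `release` are hypothetical, and the queue is a simplified non-blocking stand-in for the custom blocking OrderQueue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

final class OrderQueue {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean isAssociatedWithWorkingThread = new AtomicBoolean(false);

    void put(Runnable task) {
        tasks.add(task);
    }

    // The amortization thread's "check and set" as one atomic step:
    // succeeds only if there is work and nobody else holds the queue.
    boolean tryAssociate() {
        return !tasks.isEmpty()
                && isAssociatedWithWorkingThread.compareAndSet(false, true);
    }

    // Called by the working thread that currently holds the queue.
    List<Runnable> drainTasks() {
        List<Runnable> batch = new ArrayList<>();
        Runnable task;
        while ((task = tasks.poll()) != null) {
            batch.add(task);
        }
        return batch;
    }

    // Clears the flag; returns true if tasks arrived in the meantime, in which
    // case the caller should put this queue back on the amortizationQueue.
    boolean release() {
        isAssociatedWithWorkingThread.set(false);
        return !tasks.isEmpty();
    }
}
```

Because `compareAndSet` lets only one caller move the flag from false to true, two threads cannot both believe they own the same OrderQueue, which is the property the pseudocode relies on.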
栀梦

2020-12-28 18:19
Extending ThreadPoolExecutor would be quite difficult. I would suggest going for a producer-consumer system instead. Here is what I am suggesting.
1. Create typical producer-consumer systems. Check out the code mentioned in this question.
2. Each of these systems has a queue and a single consumer thread, which processes the tasks in the queue serially.
3. Create a pool of such individual systems.
4. When you submit a task for a related ID, see whether there is already a system marked for that related ID and currently processing its tasks. If so, submit the task to that system.
5. If no system is processing tasks for that related ID, mark an idle system with the new related ID and submit the task to it.
6. This way a single system caters to only one logical related ID at a time.
Here I am assuming that a related ID is a logical bunch of individual IDs, and that the producer-consumer systems are created for related IDs and NOT for individual IDs.
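
The steps above can be sketched as follows, with single-thread executors standing in for the producer-consumer systems. The names `StickyLanePool` and `Binding` are invented for this example. One assumption goes beyond the answer: when no system is idle, the sketch shares the least-loaded one rather than blocking or growing the pool.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class StickyLanePool {
    // Which lane a related ID is marked on, and how many of its tasks are unfinished.
    private static final class Binding {
        final int lane;
        int pending;
        Binding(int lane) { this.lane = lane; }
    }

    private final ExecutorService[] lanes; // each lane: one queue + one consumer thread
    private final int[] load;              // queued + running tasks per lane
    private final Map<Object, Binding> bindings = new HashMap<>();

    StickyLanePool(int laneCount) {
        lanes = new ExecutorService[laneCount];
        load = new int[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Submits the task and returns the index of the lane it was sent to. */
    synchronized int submit(Object relatedId, Runnable task) {
        Binding binding = bindings.get(relatedId);
        if (binding == null) {
            // No lane is working on this related ID: mark the least-loaded one.
            binding = new Binding(leastLoadedLane());
            bindings.put(relatedId, binding);
        }
        binding.pending++;
        load[binding.lane]++;
        final Binding bound = binding;
        lanes[bound.lane].execute(() -> {
            try {
                task.run();
            } finally {
                release(relatedId, bound);
            }
        });
        return bound.lane;
    }

    private synchronized void release(Object relatedId, Binding binding) {
        load[binding.lane]--;
        if (--binding.pending == 0) {
            bindings.remove(relatedId); // lane is free to take another related ID
        }
    }

    private int leastLoadedLane() {
        int best = 0;
        for (int i = 1; i < load.length; i++) {
            if (load[i] < load[best]) {
                best = i;
            }
        }
        return best;
    }

    synchronized void shutdown() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
    }
}
```

A related ID stays on its lane only while it has unfinished tasks. Once they are all done, its next task may land on a different lane, which is safe because nothing earlier for that ID is still pending.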