How to decide on the ThreadPoolTaskExecutor pools and queue sizes?

后端未结

关注

 1  1232

时光说笑 2021-02-01 07:15

This is may be more general question, on how to decide on the thread pool size, but let\'s use the Spring ThreadPoolTaskExecutor for this case. I have the following

1条回答

一向 (楼主)

2021-02-01 07:44
Update : This answer got a few votes over the years so I'm adding a shortened version for people who don't have the time to read my weird metaphor :

TL;DR answer :

The actual constraint is that a (logical) CPU core can only run a single thread at the same time. Thus :
- Number of core : Number of logical core of your CPUs * 1/(ratio_of_time_your_thread_is_runnable_when_doing_your_task)
So, if you have 8 logical cores on your machine, you can safely put 8 threads in your threadPool (well, remember to exclude the other threads that may be used). Then you need to ask yourself if you can put more : you need to benchmark the kind of task you intend to run on your threadpool : if you notice the thread are, on average running only 50% of the time, that means your CPU is free to go work on another thread 50% of its time and you can add more threads.
- Queue size : as many as you can wait on.
The queue size is the number of items your threadPool will accept before rejecting them. It is business logic. It depends on what behavior you expect : is there a point accepting a billion tasks ? When do you throw the towel ? If one task takes one second to complete, and you have 10 threads, that means that the 10,000th task in queue will hopefully be done in 1000 seconds. Is that acceptable ? The worst thing to happen is having clients timeout and re-submit the same tasks before you could complete the firsts.

Original ELI12 answer :

It may not be the most accurate answer, but I'll try :

A simple approach is to be aware that your 2-core CPU will only work on two threads at the same time.

If you have relatively modern Intel CPU, and you have Hyper Threading (aka. HT(TM), HTT(TM), SMT) turned on (via setting in BIOS), your operating system will see the number of available cores as double the number of the physical cores within your CPU.

Either way, from Java to detect how many cores (or simultaneous not-preempting each other threads) you can work with, just call int cores = Runtime.getRuntime().availableProcessors();

If you try to see your application as a Workshop (an actual one) :
- A processor would be represented by an employee. It is the physical unit that will add value to a product.
- A task would be a lump of raw material (plus some instructions list)
- Your thread is a desk on which the employee can put the task on and work.
- The queue size is the length of the conveyor belt that brings the raw materials to the desk.
Thus, your question becomes "How can I choose how many desks and how long can my conveyor belt be inside my factory, given an unchanging number of employees ?".

For the how many desks (Threads) part :

An employee can only work at one desk at a time, and you can only have a single employee per desk. Thus, the basic setup would be to have at least as many desks as you have employees (to avoid having any employee (Processor) left out without any possibility to work.

But, depending on your activity, you may afford more desks per employee :

If your employees are expected to put mail inside enveloppes constantly, an operation that require their full attention (in programing : sorting collections, creating objects, incrementing counters), having more desks wouldn't help, and may even be detrimental because your employee would have to sometime change desk (switching context, which takes some time), thus leaving the one they were working on, to make work progress on the other.

But, if your task is making pottery, and relies on your employee waiting for the clay to cook in an oven (understand getting access to external resource, such as a file system, a web service etc), your employee can afford to go model clay on another desk and get back to the first one later.

Thus, you can afford more desks per employee as long as your task have a active work/waiting ratio (running/waiting) big enough. And the number of desks being how many tasks can your employee make progress on during the waiting time.

For the conveyor belt (queue) size part :

The queue size represents how many item you are allowing to be queued before starting to reject any more task (by throwing an exception), thus being the threshold at which you start to tell "ok, I'm already overbooked and won't ever be able to comply"

First, I'd say your conveyer belt needs to fit inside the workshop. Meaning that the collection should be small enough to prevent out of memory errors (obviously).

After that, it is based on your company policy. Let's assume a task is added to the belt every time a client makes an order (another service call your API). If the caller doesn't care how much time you take to comply and trust you enough with the execution, there's no point in limiting the size of the belt.

But if you can expect that your client gets annoyed after waiting for their pottery for a month, and leaves you for a concurrent or reorder another pottery, assuming the first order was lost and won't be bothered to ever check if the first order was completed... That first order was done for nothing, you won't get payed, and if your client makes another order whenever you're too slow to comply, you'll enter in a feedback loop because every new order will slow down the whole process.

Thus, in that case, you should put up a sign telling your client "sorry, we're overbooked, you shouldn't make any new order now, as we won't be able to comply within an acceptable time range".

Then, the queue size would be : acceptable time range / time to complete a task.

Concrete Example : if your client service expects that the task it submits would have to be completed in less than 100 seconds, and knowing that every task takes 1-2 seconds, you should limit the queue to 50-100 tasks because once you have 100 tasks waiting in the queue, you're pretty sure that the next one won't be completed in less than 100 seconds, thus rejecting the task to prevent the service from waiting for nothing.
0 讨论(0)
发布评论:

提交评论
- 加载中...