We are working on an application where a set of objects can be affected by receiving messages from 3 different sources. Each message (from any of the sources) has a single objec
In general, approaches like this are a bad idea. They fall under the "don't optimize early" mantra.
Further, if implemented, your idea may harm your performance, not help it. One simple example of where it wouldn't work well is if you suddenly got a lot of requests of one type: the other worker threads would be idle.
The best approach is to use a standard producer-consumer pattern and tune the number of consumer threads by system testing under various loads - ideally by feeding in a recording of real-life transactions.
The "go to" framework for these situations is the set of classes in the java.util.concurrent package. I recommend using a BlockingQueue (probably an ArrayBlockingQueue) with an ExecutorService created from one of the Executors factory methods, probably newCachedThreadPool().
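As a rough illustration of that recommendation, here is a minimal sketch of a producer-consumer setup built on an ArrayBlockingQueue and a cached thread pool. The class name, the queue capacity of 64, the String message type and the "poison pill" shutdown marker are all assumptions made for the example, not part of the original question.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ProducerConsumerSketch {
    // Hypothetical marker telling a consumer to stop; not a real message.
    private static final String POISON = "__STOP__";

    /** Pushes messageCount messages through a bounded queue and returns how many were consumed. */
    static int run(int messageCount, int consumerCount) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64); // capacity is an assumption
        ExecutorService consumers = Executors.newCachedThreadPool();
        AtomicInteger processed = new AtomicInteger();

        // Each consumer is one long-running loop; consumerCount is the knob to tune under load.
        for (int i = 0; i < consumerCount; i++) {
            consumers.execute(() -> {
                try {
                    while (true) {
                        String message = queue.take();       // blocks while the queue is empty
                        if (POISON.equals(message)) return;  // one pill stops one consumer
                        processed.incrementAndGet();         // stand-in for real processing
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            for (int i = 0; i < messageCount; i++) {
                queue.put("message-" + i);                   // blocks while the queue is full
            }
            for (int i = 0; i < consumerCount; i++) {
                queue.put(POISON);
            }
            consumers.shutdown();
            consumers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("processed: " + run(100, 4));
    }
}
```

Note that this sketch does nothing to keep two messages for the same target object from being processed at the same time; it only shows the baseline whose consumer count you would tune during system testing.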
Once you have implemented and system tested that, if you find proven performance problems, then analyse your system, find the bottleneck and fix it.
The reason you shouldn't optimize early is that most times the problems are not where you expect them to be.
As an alternative approach: I would recommend using an existing framework, such as RabbitMQ or ActiveMQ for this. Trying to invent your own messaging framework can be a challenge. If you are trying to add value with your own framework, that's one thing. If you simply need one to accomplish your goals, that's another. These frameworks have come up with many options for optimal message delivery and would be worth considering.
My answers are:
Some explanations:
I don't think that these premises contradict the purpose of the ThreadPool, which is just about associating tasks to threads. In this model though, the ThreadPool would associate threads to tasks only once, and then the threads would keep running to poll their input message queues.
The friction spots of the threads should be the intermediate message queues, and perhaps other resources related to the processing of these messages. Following your explanations, I suppose that you plan to reduce the second kind to a minimum by cleverly partitioning the message processing among the tasks. Each queue should only be accessed by the partitioning task and the processing task associated with the queue, so contention on it should be minimal.
[Is] dedicating worker threads to a particular set of objects a better/faster approach?
I assume the overall goal is to maximize the concurrent processing of these inbound messages. You have receivers for the 3 sources that need to put the messages in a pool where they will be handled optimally. Because messages from any of the 3 sources may deal with the same target object, which cannot be processed simultaneously, you want some way to divide up your messages so they can be processed concurrently, but only if they are guaranteed not to refer to the same target object.
I would implement the hashCode() method on your target object (maybe just name.hashCode()) and then use the value to put the messages into an array of BlockingQueues, each with a single thread consuming them. Using an array of Executors.newSingleThreadExecutor() would be fine. Mod the hash value by the number of queues and put the message in that queue. You will need to pre-define the number of queues; the right maximum depends on how CPU intensive the processing is.
So something like the following code should work:
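import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
...
private static final int NUM_PROCESSING_QUEUES = 6;
...
// one single-threaded executor per partition
ExecutorService[] pools = new ExecutorService[NUM_PROCESSING_QUEUES];
for (int i = 0; i < pools.length; i++) {
    pools[i] = Executors.newSingleThreadExecutor();
}
...
// receiver loop:
while (true) {
    Message message = receiveMessage();
    // this assumes message.hashCode() is derived from the target object
    // floorMod never returns a negative index; Math.abs(Integer.MIN_VALUE) is still negative
    int index = Math.floorMod(message.hashCode(), pools.length);
    // put each message in the appropriate pool based on its hash
    // this assumes message is runnable
    pools[index].submit(message);
}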
One of the benefits of this mechanism is that you may be able to limit the synchronization around the target objects. You know that the same target object will only ever be updated by a single thread.
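To make the single-thread guarantee concrete, here is a self-contained sketch of the same hash-partitioning idea that also records which thread touched which target. The class name KeyedDispatchSketch, the use of a String target name as the key, and the demo targets are assumptions made for illustration only.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class KeyedDispatchSketch {
    private final ExecutorService[] pools;

    KeyedDispatchSketch(int queues) {
        pools = new ExecutorService[queues];
        for (int i = 0; i < queues; i++) {
            pools[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Maps a target name to a partition; floorMod keeps the index non-negative. */
    int indexFor(String targetName) {
        return Math.floorMod(targetName.hashCode(), pools.length);
    }

    /** Work for the same target always lands on the same single-threaded executor. */
    void dispatch(String targetName, Runnable work) {
        pools[indexFor(targetName)].execute(work);
    }

    /** Stops accepting work and waits for the queued tasks to finish. */
    void shutdownAndWait() {
        for (ExecutorService pool : pools) pool.shutdown();
        try {
            for (ExecutorService pool : pools) pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if every target was only ever handled by one thread. */
    static boolean demo() {
        KeyedDispatchSketch dispatcher = new KeyedDispatchSketch(4);
        Map<String, Set<String>> threadsByTarget = new ConcurrentHashMap<>();
        String[] targets = {"alpha", "beta", "gamma", "delta", "epsilon"}; // hypothetical targets
        for (int i = 0; i < 1000; i++) {
            String target = targets[i % targets.length];
            dispatcher.dispatch(target, () ->
                threadsByTarget
                    .computeIfAbsent(target, k -> ConcurrentHashMap.newKeySet())
                    .add(Thread.currentThread().getName()));
        }
        dispatcher.shutdownAndWait();
        return threadsByTarget.size() == targets.length
            && threadsByTarget.values().stream().allMatch(names -> names.size() == 1);
    }

    public static void main(String[] args) {
        System.out.println("one thread per target: " + demo());
    }
}
```

Several targets can share a partition, so a slow target delays the others that hash to the same executor; that is the trade-off for not needing locks on the target objects.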
Do people agree with the assumption that dedicating worker threads to a particular set of objects is a better/faster approach?
Yes. That seems the right way to get optimal concurrency.
Assuming this is a better approach, do the existing Java ThreadPool classes have a way to support this? Or does it require us coding our own ThreadPool implementation?
I don't know of any thread pool that accomplishes this. I would not write your own implementation, however. Just use the existing executors as the code above outlines.
You should be able to provide a special BlockingQueue for ThreadPoolExecutor. The queue remembers which type of message is being processed by which thread, so that it can withhold all messages of the same type from the other threads.
MyQueue
    ownership relation of thread - msgType
    take/poll()
        if current thread owns msg type X
            if there is a message of type X
                return that message
            else
                give up ownership
        // current thread does not own any message type
        if there is a message of type Y, and Y is not owned by any thread
            current thread owns Y
            return that message
        // there's no message belonging to an unowned type
        wait then retry
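The pseudocode above could be sketched in Java roughly as follows. This is a standalone class with only put() and take(), not a full BlockingQueue that ThreadPoolExecutor would accept; the class name TypeOwningQueue, the String message type key, and the demo are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TypeOwningQueue<T> {
    // Pending messages per type, in first-seen order of the types.
    private final Map<String, Deque<T>> byType = new LinkedHashMap<>();
    // Ownership relation: which type each consumer thread currently owns.
    private final Map<Thread, String> ownedBy = new HashMap<>();

    synchronized void put(String type, T message) {
        byType.computeIfAbsent(type, k -> new ArrayDeque<>()).addLast(message);
        notifyAll(); // wake waiting consumers so they retry
    }

    synchronized T take() throws InterruptedException {
        Thread me = Thread.currentThread();
        while (true) {
            String mine = ownedBy.get(me);
            if (mine != null) {
                Deque<T> pending = byType.get(mine);
                if (pending != null && !pending.isEmpty()) {
                    return pending.pollFirst();   // keep serving the owned type
                }
                ownedBy.remove(me);               // nothing left: give up ownership
            }
            // Current thread owns nothing: claim the first type with work that nobody owns.
            for (Map.Entry<String, Deque<T>> entry : byType.entrySet()) {
                if (!entry.getValue().isEmpty() && !ownedBy.containsValue(entry.getKey())) {
                    ownedBy.put(me, entry.getKey());
                    return entry.getValue().pollFirst();
                }
            }
            wait(); // no message of an unowned type: wait, then retry
        }
    }

    /** Small demo: a second thread must skip type X while the first thread owns it. */
    static List<Integer> demo() {
        TypeOwningQueue<Integer> queue = new TypeOwningQueue<>();
        queue.put("X", 1);
        queue.put("X", 2);
        queue.put("Y", 3);
        List<Integer> seen = new ArrayList<>();
        try {
            seen.add(queue.take());               // this thread now owns X
            Thread other = new Thread(() -> {
                try {
                    seen.add(queue.take());       // X is owned elsewhere, so Y is handed out
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            other.start();
            other.join();
            seen.add(queue.take());               // owner continues with the next X message
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

One limitation of this sketch: a thread only releases its type on its next call to take(), so a consumer that exits without calling take() again leaves that type owned forever. A production version would need an explicit release or a timeout.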