I need to spawn N consumer threads, which process same InputStream concurrently, e.g - transform it somehow, calculate checksum or digital signature etc. These consumers do not
The way I see it, you should have at least some kind of buffering so different consumers can move through the stream at different pace without everything being constantly bogged down by the currently slowest consumer. That basically ensures worst-case performance and very little benefit of concurrency.
You could, say, tag each chunk with the consumers that have used it so far and then delete those that are completely used up. Maybe this could be achieved by each consumer holding a reference to each chunk it hasn't yet used, which would allow GC to automatically take care of used chunks. The producer might keep a list of WeakReference
s to the chunks so it has a handle on the number of chunks yet to be used and base its throttling on that.
I am also thinking about having a separate InputStream
instance per thread, which internally communicates with the producer InputStream
. This way you have an easy solution for your livelock hazard: try ... finally { is.close(); }
-- the dying consumer closes its own inputstream. This is communicated to the producer.
I have some ideas with using an ArrayBlockingQueue
per consumer. There would be some difficulty in ensuring that all consumers are properly fed, without making the producer either block or busy-wait.
Have you considered using pipe streams? Your producer can have a one or more PipedOuputStream in which it throws whatever it reads from the file. At the other side of the pipes, you have different consumer threads reading on a corresponding PipedInputstream (which is an InputStream that you can share with your libraries).
Your producer thread can decide through which of the pipes data should be sent, by means of this, providing data to be processed for a given consumer thread reading on the other side of the pipe.
If you need to get data back from your consumer threads, then you can create another pipe, in the opposite direction, to send the data back to you.
You can try some Java Messaging Service (JMS) implementation like Apache ActiveMQ.
In your case you'd need to create a so called Topic (see Topics vs. Queues). A topic is created by the producer, and is published to N consumers, which may run concurrently, with each consumer receiving exactly the same data.
Since you want to use InputStream
s there is a chapter on how to send messages are streams.
I suppose, typically, producers and consumers would be separate processes, probably running on different machines on the network. I think you can configure it to run completely in a single JVM, though. This would depend on the implementation of JMS. These are also quite famous: HornetQ by JBoss, RabbitMQ, and a whole bunch of others.