How can I go about "selecting" on multiple queue.Queue's simultaneously?
Golang has the desired feature with its channels:
select {
case i1 = <-c1:
If you use queue.PriorityQueue you can get a similar behaviour using the channel objects as priorities:
import threading, logging
import random, string, time
from queue import PriorityQueue, Empty
from contextlib import contextmanager

logging.basicConfig(level=logging.NOTSET,
                    format="%(threadName)s - %(message)s")


class ChannelManager(object):
    next_priority = 0

    def __init__(self):
        self.queue = PriorityQueue()
        self.channels = []

    def put(self, channel, item, *args, **kwargs):
        self.queue.put((channel, item), *args, **kwargs)

    def get(self, *args, **kwargs):
        return self.queue.get(*args, **kwargs)

    @contextmanager
    def select(self, ordering=None, default=False):
        if default:
            try:
                channel, item = self.get(block=False)
            except Empty:
                channel = 'default'
                item = None
        else:
            channel, item = self.get()
        yield channel, item

    def new_channel(self, name):
        channel = Channel(name, self.next_priority, self)
        self.channels.append(channel)
        self.next_priority += 1
        return channel


class Channel(object):
    def __init__(self, name, priority, manager):
        self.name = name
        self.priority = priority
        self.manager = manager

    def __str__(self):
        return self.name

    def __lt__(self, other):
        return self.priority < other.priority

    def put(self, item):
        self.manager.put(self, item)
if __name__ == '__main__':
    num_channels = 3
    num_producers = 4
    num_items_per_producer = 2
    num_consumers = 3
    num_items_per_consumer = 3

    manager = ChannelManager()
    channels = [manager.new_channel('Channel#{0}'.format(i))
                for i in range(num_channels)]

    def producer_target():
        for i in range(num_items_per_producer):
            time.sleep(random.random())
            channel = random.choice(channels)
            message = random.choice(string.ascii_letters)
            logging.info('Putting {0} in {1}'.format(message, channel))
            channel.put(message)

    producers = [threading.Thread(target=producer_target,
                                  name='Producer#{0}'.format(i))
                 for i in range(num_producers)]
    for producer in producers:
        producer.start()
    for producer in producers:
        producer.join()
    logging.info('Producers finished')

    def consumer_target():
        for i in range(num_items_per_consumer):
            time.sleep(random.random())
            with manager.select(default=True) as (channel, item):
                # Note: on an empty queue, channel is the string 'default',
                # which is truthy, so this branch runs in that case too.
                # Test channel != 'default' instead to reach the else branch.
                if channel:
                    logging.info('Received {0} from {1}'.format(item, channel))
                else:
                    logging.info('No data received')

    consumers = [threading.Thread(target=consumer_target,
                                  name='Consumer#{0}'.format(i))
                 for i in range(num_consumers)]
    for consumer in consumers:
        consumer.start()
    for consumer in consumers:
        consumer.join()
    logging.info('Consumers finished')
Example output:
Producer#0 - Putting x in Channel#2
Producer#2 - Putting l in Channel#0
Producer#2 - Putting A in Channel#2
Producer#3 - Putting c in Channel#0
Producer#3 - Putting z in Channel#1
Producer#1 - Putting I in Channel#1
Producer#1 - Putting L in Channel#1
Producer#0 - Putting g in Channel#1
MainThread - Producers finished
Consumer#1 - Received c from Channel#0
Consumer#2 - Received l from Channel#0
Consumer#0 - Received I from Channel#1
Consumer#0 - Received L from Channel#1
Consumer#2 - Received g from Channel#1
Consumer#1 - Received z from Channel#1
Consumer#0 - Received A from Channel#2
Consumer#1 - Received x from Channel#2
Consumer#2 - Received None from default
MainThread - Consumers finished
In this example, ChannelManager is just a wrapper around queue.PriorityQueue that implements the select method as a contextmanager to make it look similar to the select statement in Go.
A few things to note:
Ordering
In Go, when more than one case of a select statement is ready, one of them is chosen pseudo-randomly; the order in which the cases are written does not matter.
In the Python example, the order is determined by the priority assigned to each channel. However, the priority is assigned dynamically when a channel is created (as seen in the example), so the ordering could be changed by a more complex select method that assigns new priorities based on an argument to the method. The old ordering could also be re-established once the context manager exits.
Blocking
In Go, the select statement blocks unless a default case exists; with a default case it never blocks.
In the Python example, a boolean argument has to be passed to the select method to make it clear whether blocking or non-blocking behaviour is desired. In the non-blocking case, the channel returned by the context manager is just the string 'default', so this is easy to detect in the code inside the with statement.
Threading
Objects in the queue module are already suited to multi-producer, multi-consumer scenarios, as seen in the example.
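The ordering and blocking notes above can be illustrated with the standard library alone. This is only a minimal sketch, not the ChannelManager from the answer: the select helper, the DEFAULT sentinel, and the (priority, name, item) tuple layout are assumptions made for the illustration.

```python
from queue import PriorityQueue, Empty

DEFAULT = "default"  # hypothetical sentinel, mirrors the 'default' string above


def select(pq, default=False):
    """Return (priority, name, item); never block when default=True."""
    try:
        return pq.get(block=not default)
    except Empty:
        # Only reachable in the non-blocking case.
        return (None, DEFAULT, None)


pq = PriorityQueue()
# Entries are (priority, channel_name, item); lower numbers come out first,
# regardless of the order in which they were put.
pq.put((2, "c2", "x"))
pq.put((0, "c0", "l"))
pq.put((1, "c1", "z"))

received = []
while True:
    _, name, item = select(pq, default=True)
    if name == DEFAULT:  # queue drained: the "default" case
        break
    received.append((name, item))

print(received)
```

Comparing against the sentinel explicitly, rather than relying on truthiness, is what makes the empty-queue case detectable.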
There are many different implementations of producer-consumer queues available, queue.Queue being one of them. They normally differ in many properties, such as those listed in this excellent article by Dmitry Vyukov. As you can see, more than 10k different combinations are possible. The algorithms used for such queues also differ widely depending on the requirements. It's not possible to just extend an existing queue algorithm to guarantee additional properties, since that normally requires different internal data structures and different algorithms.
Go's channels offer a relatively high number of guaranteed properties, so those channels might be suitable for a lot of programs. One of the hardest requirements there is support for reading from / blocking on multiple channels at once (the select statement) and for choosing a channel fairly if more than one branch of a select statement is able to proceed, so that no messages are left behind. Python's queue.Queue doesn't offer these features, so it's simply not possible to achieve the same behavior with it.
So, if you want to continue using queue.Queue, you need to find workarounds for that problem. The workarounds, however, have their own drawbacks and are harder to maintain. Looking for another producer-consumer queue which offers the features you need might be a better idea! Anyway, here are two possible workarounds:
Polling
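import queue
import time

# c1 and c2 are existing queue.Queue instances
while True:
    try:
        i1 = c1.get_nowait()
        print("received %s from c1" % i1)
    except queue.Empty:
        pass
    try:
        i2 = c2.get_nowait()
        print("received %s from c2" % i2)
    except queue.Empty:
        pass
    time.sleep(0.1)  # pause before polling both queues again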
This might use a lot of CPU cycles while polling the channels and might be slow when there are a lot of messages. Using time.sleep() with an exponential back-off time (instead of the constant 0.1 secs shown here) might improve this version drastically.
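The back-off idea could look roughly like the sketch below. The function name poll_select and its parameters (initial_delay, max_delay, timeout) are made up for this illustration; only queue.Queue.get_nowait, queue.Empty, time.sleep and time.monotonic come from the standard library.

```python
import queue
import time


def poll_select(queues, initial_delay=0.001, max_delay=0.1, timeout=None):
    """Return (index, item) from the first queue that has data.

    Sleeps between passes, doubling the delay up to max_delay.
    Returns None if the optional timeout (in seconds) elapses first.
    """
    delay = initial_delay
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        for index, q in enumerate(queues):
            try:
                return index, q.get_nowait()
            except queue.Empty:
                continue
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential back-off, capped


c1, c2 = queue.Queue(), queue.Queue()
c2.put("hello")
print(poll_select([c1, c2]))                # item comes from index 1
print(poll_select([c1, c2], timeout=0.05))  # nothing left, so None
```

Note that this always scans the queues in list order, so earlier queues are favoured when several have data; it does not provide the fairness of Go's select.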
A single notify-queue
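# notify, c1 and c2 are existing queue.Queue instances; producers put the
# id of the queue into notify after putting an item into c1 or c2
queue_id = notify.get()
if queue_id == 1:
    i1 = c1.get()
    print("received %s from c1" % i1)
elif queue_id == 2:
    i2 = c2.get()
    print("received %s from c2" % i2)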
With this setup, you must send something to the notify queue after sending to c1 or c2. This might work for you, as long as only one such notify-queue is enough for you (i.e. you do not have multiple "selects", each blocking on a different subset of your channels).
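To make the producer side of this workaround concrete, here is a small sketch. The send helper and the integer queue ids are assumptions for the illustration; the important point is that the item is put into its queue before the notification is sent, so the consumer's get() on that queue finds it.

```python
import queue
import threading

notify = queue.Queue()
c1, c2 = queue.Queue(), queue.Queue()


def send(channel, queue_id, item):
    """Put the item first, then announce which queue holds it."""
    channel.put(item)
    notify.put(queue_id)


def producer():
    send(c1, 1, "a")
    send(c2, 2, "b")


t = threading.Thread(target=producer)
t.start()

results = []
for _ in range(2):
    queue_id = notify.get()  # blocks until some queue has data
    channel = c1 if queue_id == 1 else c2
    results.append((queue_id, channel.get()))
t.join()

print(results)
```

With a single producer the notifications arrive in the order the items were sent; with several producers the interleaving depends on thread scheduling.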
Alternatively, you can also consider using Go. Go's goroutines and concurrency support are much more powerful than Python's limited threading capabilities anyway.
The pychan project duplicates Go channels in Python, including multiplexing. It implements the same algorithm as Go, so it meets all of your desired properties:
Here's what your example would look like:
c1 = Chan(); c2 = Chan(); c3 = Chan()

try:
    chan, value = chanselect([c1, c3], [(c2, i2)])
    if chan == c1:
        print("Received %r from c1" % value)
    elif chan == c2:
        print("Sent %r to c2" % i2)
    else:  # c3
        print("Received %r from c3" % value)
except ChanClosed as ex:
    if ex.which == c3:
        print("c3 is closed")
    else:
        raise
(Full disclosure: I wrote this library)
from queue import Queue

# these imports needed for example code
from threading import Thread
from time import sleep
from random import randint


class MultiQueue(Queue):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.queues = []

    def addQueue(self, queue):
        queue.put = self._put_notify(queue, queue.put)
        queue.put_nowait = self._put_notify(queue, queue.put_nowait)
        self.queues.append(queue)

    def _put_notify(self, queue, old_put):
        def wrapper(*args, **kwargs):
            result = old_put(*args, **kwargs)
            self.put(queue)
            return result
        return wrapper


if __name__ == '__main__':
    # an example of MultiQueue usage
    q1 = Queue()
    q1.name = 'q1'
    q2 = Queue()
    q2.name = 'q2'
    q3 = Queue()
    q3.name = 'q3'

    mq = MultiQueue()
    mq.addQueue(q1)
    mq.addQueue(q2)
    mq.addQueue(q3)

    queues = [q1, q2, q3]
    for i in range(9):
        def message(i=i):
            print("thread-%d starting..." % i)
            sleep(randint(1, 9))
            q = queues[i%3]
            q.put('thread-%d ending...' % i)
        Thread(target=message).start()

    print('awaiting results...')
    for _ in range(9):
        result = mq.get()
        print(result.name)
        print(result.get())
Rather than trying to use the .get() method of several queues, the idea here is to have the queues notify the MultiQueue when they have data ready, sort of a select in reverse. This is achieved by having MultiQueue wrap the various Queue's put() and put_nowait() methods so that when something is added to those queues, that queue is then put() into the MultiQueue, and a corresponding MultiQueue.get() will retrieve the Queue that has data ready.
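This MultiQueue is based on the FIFO Queue, but you could also use the LIFO or Priority queues as the base depending on your needs (with a Priority queue as the base, the entries put into it must be orderable, which plain Queue objects are not).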