I have a question about understanding the queue in the multiprocessing module in Python 3.
This is what they say in the programming guidelines:
The queue implementation in multiprocessing that allows data to be transferred between processes relies on standard OS pipes. OS pipes are not infinitely long, so the process which queues data could be blocked in the OS during the put() operation until some other process uses get() to retrieve data from the queue.
For small amounts of data, such as the one in your example, the main process can join() all the spawned subprocesses and then pick up the data. This often works well, but it does not scale, and it is not clear when it will break. But it will certainly break with large amounts of data: the subprocess will be blocked in put(), waiting for the main process to remove some data from the queue with get(), while the main process is blocked in join(), waiting for the subprocess to finish. This results in a deadlock.
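To make the failure concrete, here is a minimal sketch of that deadlock, assuming a made-up worker and a payload chosen to be larger than a typical OS pipe buffer:

import multiprocessing as mp

def worker(q):
    # A payload much larger than the pipe buffer cannot be flushed
    # until the other end reads from the queue.
    q.put('x' * 10_000_000)

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    p.join()        # deadlock: the child cannot exit until its queued
                    # data is flushed to the pipe, and the pipe stays
                    # full because the parent never reaches get()
    print(q.get())  # never reached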
Here is an example where a user had this exact issue. I posted some code in an answer there that helped him solve his problem.
Don't call join() on a process object before you have retrieved all messages from the shared queue.
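A minimal sketch of the safe ordering, using the same made-up worker as above:

import multiprocessing as mp

def worker(q):
    q.put('x' * 10_000_000)  # large payload, as before

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    data = q.get()  # drain the queue first, unblocking the child
    p.join()        # now the child can flush its data and exit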
I used the following workaround to allow the processes to exit before all of their results have been processed:
import queue  # for the queue.Empty exception

results = []
while True:
    # Grab anything currently in the queue; wait briefly so we
    # don't busy-spin while workers are still producing.
    try:
        result = resultQueue.get(timeout=0.01)
        results.append(result)
    except queue.Empty:
        pass
    # Check whether every worker process has exited.
    allExited = True
    for t in processes:
        if t.exitcode is None:
            allExited = False
            break
    # Stop only when all workers are done and the queue is drained.
    if allExited and resultQueue.empty():
        break
It could be shortened, but I left it longer so it is clearer for newcomers.
Here resultQueue is the multiprocessing.Queue that was shared with the multiprocessing.Process objects. After this block of code runs, the results list contains all the messages from the queue.
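For context, here is a rough end-to-end sketch of how that loop might be wired up; the worker function and the process count are assumptions made for the example:

import multiprocessing
import queue

def worker(n, resultQueue):
    resultQueue.put(n * n)  # hypothetical work: square the input

if __name__ == '__main__':
    resultQueue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=worker, args=(i, resultQueue))
        for i in range(4)
    ]
    for p in processes:
        p.start()

    results = []
    while True:
        try:
            results.append(resultQueue.get(timeout=0.01))
        except queue.Empty:
            pass
        if all(p.exitcode is not None for p in processes) and resultQueue.empty():
            break

    for p in processes:
        p.join()  # safe now: the queue has been fully drained
    print(results)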
The problem is that the input buffer of the queue's pipe that receives messages may become full, blocking the writer(s) indefinitely until there is enough space to receive the next message. So you have three ways to avoid blocking:
Increase the multiprocessing.connection.BUFFER size (not so good)