multiprocessing queue full

悲哀的现实 2021-02-13 15:51

I'm using concurrent.futures to implement multiprocessing. I am getting a queue.Full error, which is odd because I am only assigning 10 jobs.

A_list = [np.rando
2 answers
  •  孤独总比滥情好
    2021-02-13 16:20

    I've recently stumbled upon this, while debugging a python3.6 program which sends various GBs of data over the pipes. This is what I found (hoping it could save someone else's time!).

    Like skrrgwasme said, if the queue manager is unable to acquire a semaphore while sending a poison pill, it raises a queue.Full error. The acquire call on the semaphore is non-blocking, and its failure causes the manager to fail (it's unable to send a 'control' command because data and control flow share the same Queue). Note that the links above refer to Python 3.6.0.

    Now I was wondering why my queue manager would send the poison pill. There must have been some other failure! Apparently some exception had happened (in some other subprocess? in the parent?), and the queue manager was trying to clean up and shut down all the subprocesses. At this point I was interested in finding this root cause.

    Debugging the root cause

    I initially tried logging all exceptions in the subprocesses but apparently no explicit error happened there. From issue 3895:

    Note that multiprocessing.Pool is also broken when a result fails at unpickle.

    it seems that the multiprocessing module is broken in Python 3.6, in that it won't catch and handle a serialization error correctly.

    Unfortunately, due to time constraints I didn't manage to replicate and verify the problem myself, preferring to jump to the action points and better programming practices (don't send all that data through pipes :). Here are a couple of ideas:

    1. Try to pickle the data that is supposed to run through the pipes. Due to the sheer size of my data (hundreds of GBs) and time constraints, I didn't manage to find which records were unserializable.
    2. Put a debugger into python3.6 and print the original exception.
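    Idea 1 can be sketched as a pre-flight check: before submitting work, try to pickle each record and collect the ones that fail (the sample records below are purely illustrative):

    ```python
    import pickle


    def find_unpicklable(records):
        """Return (index, error) pairs for records that fail to pickle."""
        failures = []
        for i, rec in enumerate(records):
            try:
                pickle.dumps(rec)
            except Exception as exc:  # pickle raises several error types
                failures.append((i, exc))
        return failures


    # A lambda is a classic unpicklable object; ints, strings, dicts are fine.
    records = [1, "ok", lambda x: x, {"key": [1, 2]}]
    bad = find_unpicklable(records)
    print([i for i, _ in bad])  # indices of unpicklable records
    ```

    For hundreds of GBs this check is expensive, but running it on a sample can often surface the offending record type quickly.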

    Action points

    1. Remodel your program to send less data through the pipes if possible.

    2. After reading issue 3895 it appears the problem arises with pickling errors. An alternative (and good programming practice) could be to transfer the data using different means. For example one could have the subprocesses write to files and return the paths to the parent process (this would be just a small string, probably a few bytes).

    3. Wait for future Python versions. Apparently this was fixed in Python version tag v3.7.0b3 in the context of issue 3895: the Full exception will be handled inside shutdown_worker. The current maintenance version of Python at the time of writing is 3.6.5.
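    Action point 2 can be sketched as follows: each worker writes its (possibly large) result to a temporary file and returns only the path, so just a short string crosses the result pipe (the worker logic and payload here are illustrative):

    ```python
    import concurrent.futures
    import os
    import tempfile


    def heavy_work(n):
        """Write a large result to disk and return only its path."""
        data = "x" * n  # stand-in for a big payload
        fd, path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w") as f:
            f.write(data)
        return path  # a few bytes cross the pipe, not the payload


    if __name__ == "__main__":
        with concurrent.futures.ProcessPoolExecutor() as executor:
            paths = list(executor.map(heavy_work, [10, 100, 1000]))
        sizes = [os.path.getsize(p) for p in paths]
        print(sizes)
        for p in paths:  # the parent owns cleanup of the temp files
            os.remove(p)
    ```

    This sidesteps result pickling almost entirely (only the path string is serialized), at the cost of disk I/O and of having to manage the files' lifetime yourself.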
