Sharing many queues among processes in Python

青春惊慌失措 2020-12-09 18:21

I am aware of multiprocessing.Manager() and how it can be used to create shared objects, in particular queues which can be shared between workers. There is this

1 Answer
  • 2020-12-09 18:35

    It sounds like your issues started when you tried to share a multiprocessing.Queue() by passing it as an argument. You can get around this by creating a managed queue instead:

    import multiprocessing
    manager = multiprocessing.Manager()
    passable_queue = manager.Queue()
    

    When you use a manager to create it, you are storing and passing around a proxy to the queue, rather than the queue itself, so even when the object you pass to your worker processes is copied, it still points at the same underlying data structure: your queue. It's very similar (in concept) to pointers in C/C++. If you create your queues this way, you will be able to pass them when you launch a worker process.
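
    For example, here's a minimal sketch of launching a worker process with a managed queue passed as an argument (the worker function and the message are just placeholders):

    import multiprocessing

    def worker(q):
        # the argument arrives as a proxy, but it still refers to the one
        # queue held by the manager process
        q.put("hello from the worker")

    if __name__ == "__main__":
        manager = multiprocessing.Manager()
        passable_queue = manager.Queue()

        # the proxy can be handed over as an ordinary argument at launch time
        p = multiprocessing.Process(target=worker, args=(passable_queue,))
        p.start()
        p.join()

        print(passable_queue.get())  # prints "hello from the worker"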

    Since you can pass queues around now, you no longer need your dictionary to be managed. Keep a normal dictionary in main that will store all the mappings, and only give your worker processes the queues they need, so they won't need access to any mappings.

    I've written an example of this here. It looks like you are passing objects between your workers, so that's what's done here. Imagine we have two stages of processing, and the data both starts and ends in the control of main. Look at how we can create the queues that connect the workers like a pipeline: by giving them only the queues they need, there's no need for them to know about any mappings:

    import multiprocessing as mp
    
    def stage1(q_in, q_out):
        # take the message from main, append a note, and forward it to stage 2
        q_out.put(q_in.get() + "Stage 1 did some work.\n")
        return

    def stage2(q_in, q_out):
        # take stage 1's message, append a note, and send it back to main
        q_out.put(q_in.get() + "Stage 2 did some work.\n")
        return
    
    def main():
        pool = mp.Pool()
        manager = mp.Manager()
    
        # create managed queues
        q_main_to_s1 = manager.Queue()
        q_s1_to_s2 = manager.Queue()
        q_s2_to_main = manager.Queue()
    
        # launch workers, passing them the queues they need
        results_s1 = pool.apply_async(stage1, (q_main_to_s1, q_s1_to_s2))
        results_s2 = pool.apply_async(stage2, (q_s1_to_s2, q_s2_to_main))
    
        # Send a message into the pipeline
        q_main_to_s1.put("Main started the job.\n")
    
        # Wait for work to complete
        print(q_s2_to_main.get()+"Main finished the job.")
    
        pool.close()
        pool.join()
    
        return
    
    if __name__ == "__main__":
        main()
    

    The code produces this output:

    Main started the job.
    Stage 1 did some work.
    Stage 2 did some work.
    Main finished the job.

    I didn't include an example of storing the queues or AsyncResult objects in dictionaries, because I still don't quite understand how your program is supposed to work. But now that you can pass your queues freely, you can build your dictionary to store the queue/process mappings as needed.
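
    If it helps, here's a rough, self-contained sketch of that idea; the single "echo" stage and the dict keys are made up purely for illustration:

    import multiprocessing as mp

    def echo(q_in, q_out):
        # trivial stage: forward one message unchanged
        q_out.put(q_in.get())

    if __name__ == "__main__":
        pool = mp.Pool()
        manager = mp.Manager()

        # ordinary dicts kept in main: stage name -> (input queue, output queue)
        queues = {"echo": (manager.Queue(), manager.Queue())}
        # stage name -> the AsyncResult returned by apply_async
        results = {"echo": pool.apply_async(echo, queues["echo"])}

        q_in, q_out = queues["echo"]
        q_in.put("routed by name")
        print(q_out.get())

        results["echo"].wait()  # main can check on each stage by name
        pool.close()
        pool.join()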

    In fact, if you really do build a pipeline between multiple workers, you don't even need to keep a reference to the "inter-worker" queues in main. Create the queues, pass them to your workers, then only retain references to queues that main will use. I would definitely recommend trying to let old queues be garbage collected as quickly as possible if you really do have "an arbitrary number" of queues.
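
    As a sketch of that pattern (build_pipeline and relay are made-up names, not anything from your code), the helper below creates the intermediate queues, hands them straight to the workers, and returns only the two endpoint queues that main actually uses:

    import multiprocessing as mp

    def relay(q_in, q_out):
        # each stage reads one item, tags it, and passes it along
        q_out.put(q_in.get() + " -> relayed")

    def build_pipeline(pool, manager, stages):
        # create the inter-worker queues, give each stage the pair it needs,
        # and return only the two endpoints that main will actually touch
        first = prev = manager.Queue()
        for stage in stages:
            nxt = manager.Queue()
            pool.apply_async(stage, (prev, nxt))
            prev = nxt  # main keeps no reference to the intermediate queues
        return first, prev

    if __name__ == "__main__":
        pool = mp.Pool()
        manager = mp.Manager()

        q_in, q_out = build_pipeline(pool, manager, [relay, relay, relay])
        q_in.put("Main started the job")
        print(q_out.get())

        pool.close()
        pool.join()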
