Generator function of child processes runs in the Parent process

后悔当初 2021-01-27 04:24

I am trying to run a generator function in parallel in child processes, but when I do, I see that the generator function is executed by the parent process!

2 Answers
  • 2021-01-27 05:15

    I expect that, when it comes to processing with generators, you really want the following:

    1. The main process generates tasks lazily through a generator; each task is represented by some data (arg).
    2. The generator may produce tasks very slowly, e.g. by fetching chunks of data from the Internet, so each task should be processed as soon as it is ready.
    3. The main process sends these tasks to several child processes for processing.
    4. Processing in the children may also take a slow and random amount of time.
    5. The children report results (the successfully processed data, or an encoded error in case of failure).
    6. The main process gathers all results lazily as well, i.e. reports each one as soon as it is ready.
    7. Results can be gathered in the main process either in exactly the order the tasks were generated (strict order True) or in arbitrary order as soon as they are processed (strict order False); the second variant may be considerably faster.
    8. All CPU cores should be used for efficiency, one process per core.
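    Underlying all of this is the point that explains what you observed: Pool.imap and Pool.imap_unordered consume the iterable you pass them inside the parent process (in a background thread of the pool), so the body of a generator always runs in the parent; only the mapped function runs in the children. This minimal sketch records PIDs to show it; work() and gen_items() are hypothetical names used only for illustration.

```python
import multiprocessing as mp
import os

def work(x):
    # Runs in a worker (child) process; report which process handled the item.
    return x * x, os.getpid()

def gen_items(n, seen_pids):
    # The generator body runs wherever the iterable is consumed.
    # Pool.imap consumes it in the parent, so this records the parent's PID.
    for i in range(n):
        seen_pids.append(os.getpid())
        yield i

def main():
    gen_pids = []
    with mp.Pool(2) as pool:
        results = list(pool.imap(work, gen_items(6, gen_pids)))
    return os.getpid(), gen_pids, results

if __name__ == "__main__":
    parent_pid, gen_pids, results = main()
    print("parent:", parent_pid)
    print("generator ran in:", set(gen_pids))
    print("work() ran in:", {pid for _, pid in results})
```

    So the generator running in the parent is expected behaviour; what matters is that the expensive work happens in the function passed to the pool.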

    For all these purposes I created an example template that you can adapt to your specific problem:


    import multiprocessing as mp
    import os
    import random
    import time
    
    def ProcessTask(arg):
        # Runs in a worker (child) process: simulate slow work lasting arg[1] seconds.
        print('Started task', arg[0], arg[1], 'by', os.getpid())
        time.sleep(arg[1])
        print('Finished task', arg[0], arg[1], 'by', os.getpid())
        return (arg[0], arg[1] * 2)
    
    def Main():
        def GenTasks(n):
            # Runs in the main process; the pool pulls tasks from it lazily.
            for i in range(n):
                t = round(random.random() * 2., 3)
                print('Created task', i, t, 'by', os.getpid())
                yield (i, t)
                time.sleep(random.random())  # simulate slow task generation
                
        num_tasks = 4
    
        for strict_order in [True, False]:
            print('\nIs strict order', strict_order)
            with mp.Pool() as pool:  # one worker per CPU core by default
                # imap keeps input order; imap_unordered yields results as they finish
                imap_fn = pool.imap if strict_order else pool.imap_unordered
                for res in imap_fn(ProcessTask, GenTasks(num_tasks)):
                    print('Result from task', res)
                
    if __name__ == '__main__':
        Main()
    

    Output (PIDs and timings vary between runs):

    Is strict order True
    Created task 0 0.394 by 10536
    Created task 1 0.357 by 10536
    Started task 0 0.394 by 8740
    Started task 1 0.357 by 5344
    Finished task 1 0.357 by 5344
    Finished task 0 0.394 by 8740
    Result from task (0, 0.788)
    Result from task (1, 0.714)
    Created task 2 0.208 by 10536
    Started task 2 0.208 by 5344
    Finished task 2 0.208 by 5344
    Result from task (2, 0.416)
    Created task 3 0.937 by 10536
    Started task 3 0.937 by 8740
    Finished task 3 0.937 by 8740
    Result from task (3, 1.874)
    
    Is strict order False
    Created task 0 1.078 by 10536
    Started task 0 1.078 by 7256
    Created task 1 0.029 by 10536
    Started task 1 0.029 by 5440
    Finished task 1 0.029 by 5440
    Result from task (1, 0.058)
    Finished task 0 1.078 by 7256
    Result from task (0, 2.156)
    Created task 2 1.742 by 10536
    Started task 2 1.742 by 5440
    Created task 3 0.158 by 10536
    Started task 3 0.158 by 7256
    Finished task 3 0.158 by 7256
    Result from task (3, 0.316)
    Finished task 2 1.742 by 5440
    Result from task (2, 3.484)
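    To isolate requirement 7, here is a minimal sketch that gives later tasks shorter sleeps so they tend to finish first: pool.imap always returns results in input order, while pool.imap_unordered returns them in completion order. slow_echo() and tasks() are hypothetical helpers. chunksize=1 is already the default for both methods; it is spelled out because a larger chunk size would make the pool wait until the generator has yielded a whole chunk before dispatching it, which works against a slow generator.

```python
import multiprocessing as mp
import time

def slow_echo(arg):
    # Later tasks sleep less, so they tend to finish first.
    i, delay = arg
    time.sleep(delay)
    return i

def tasks():
    for i in range(4):
        yield i, 0.4 - 0.1 * i

def collect(strict_order):
    with mp.Pool(4) as pool:
        imap_fn = pool.imap if strict_order else pool.imap_unordered
        # chunksize=1: each item goes to a worker as soon as the generator yields it
        return list(imap_fn(slow_echo, tasks(), chunksize=1))

if __name__ == "__main__":
    ordered = collect(True)
    unordered = collect(False)
    print("imap:          ", ordered)    # [0, 1, 2, 3]
    print("imap_unordered:", unordered)  # completion order, not guaranteed
```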
    

    PS:

    1. In the code above, and with multiprocessing in general, the main process and the child processes use the same module-script. Under the spawn start method (the default on Windows and macOS), every child starts by importing the whole script: the if __name__ == '__main__': block runs only in the main process, while the rest of the module's code runs in the main process and in every child. With the fork start method, children instead inherit a copy of the parent's memory and do not re-run the module, but the guard is still needed for portability.
    2. Good practice is to put everything the main process needs to execute into one function (Main() in my code), everything the children execute into another function (ProcessTask() in my code), and any functions and variables used by both into the module's global scope (my code has nothing shared).
    3. The processing function (ProcessTask() in my code) must be defined at the module's global scope, because it is sent to the workers by pickling, which stores only the function's module and name.
    4. For more details, see the official multiprocessing documentation.
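    Point 3 follows from how tasks reach the workers: the function is pickled, and pickle stores a function only as a reference to its module and qualified name. A minimal check with the standard pickle module (process_task, make_local_task and is_picklable are hypothetical names) shows that a module-level function can be pickled while a nested function or a lambda cannot:

```python
import pickle

def process_task(x):
    # Module-level function: pickled as a reference (module + qualified name).
    return x * 2

def make_local_task():
    def local_task(x):
        # Nested function: its qualified name contains '<locals>', so pickle cannot look it up.
        return x * 2
    return local_task

def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

module_level_ok = is_picklable(process_task)
local_ok = is_picklable(make_local_task())
lambda_ok = is_picklable(lambda x: x * 2)
print(module_level_ok, local_ok, lambda_ok)  # True False False
```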
  • 2021-01-27 05:18

    One way to communicate between processes is through a Queue instance. In the following example, instead of creating two individual processes, I create a pool of two processes:

    from multiprocessing import Pool, Manager
    import os
    
    def p(q):
        pid = os.getpid()
        q.put(pid)
        for i in range(5):
            q.put(i)
        q.put(None) # signify "end of file"
    
    
    def main():
        with Manager() as manager: # shuts down the manager process when done
            q1 = manager.Queue()
            q2 = manager.Queue()
            with Pool(2) as pool: # create a pool of 2 processes
                r1 = pool.apply_async(p, args=(q1,))
                r2 = pool.apply_async(p, args=(q2,))
                q1_eof = False
                q2_eof = False
                # alternate between the queues until both have sent their None sentinel
                while not q1_eof or not q2_eof:
                    if not q1_eof:
                        obj = q1.get() # blocking get
                        if obj is None:
                            q1_eof = True
                        else:
                            print(obj)
                    if not q2_eof:
                        obj = q2.get() # blocking get
                        if obj is None:
                            q2_eof = True
                        else:
                            print(obj)
                # re-raise any exception that occurred in a worker
                r1.get()
                r2.get()
    
    if __name__ == '__main__':
        main()
    

    Prints:

    5588
    24104
    0
    0
    1
    1
    2
    2
    3
    3
    4
    4
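    If the producers may send different numbers of items, alternating blocking get() calls on two queues can leave data waiting in one queue while the consumer blocks on the other. Here is a sketch of an alternative, assuming it is acceptable to share a single manager queue: every message is tagged with a producer id, and the consumer counts per-producer sentinels. producer() and the (id, kind, value) message layout are hypothetical.

```python
from multiprocessing import Manager, Pool
import os

def producer(worker_id, q, n):
    # Tag every message with the producer's id so the consumer can tell them apart.
    q.put((worker_id, "pid", os.getpid()))
    for i in range(n):
        q.put((worker_id, "item", i))
    q.put((worker_id, "done", None))  # per-producer end-of-stream sentinel

def main(num_producers=2, n=5):
    received = {w: [] for w in range(num_producers)}
    with Manager() as manager:
        q = manager.Queue()  # one shared queue instead of one queue per producer
        with Pool(num_producers) as pool:
            async_results = [pool.apply_async(producer, (w, q, n))
                             for w in range(num_producers)]
            finished = 0
            while finished < num_producers:
                worker_id, kind, value = q.get()  # blocking get
                if kind == "done":
                    finished += 1
                elif kind == "item":
                    received[worker_id].append(value)
                else:
                    print(f"producer {worker_id} runs in pid {value}")
            for r in async_results:
                r.get()  # re-raises any exception from a producer
    return received

if __name__ == "__main__":
    received = main()
    print(received)
```

    Items from one producer keep their relative order because each producer writes to the FIFO queue sequentially. Note that in this sketch a producer that crashes before sending its sentinel would leave the loop waiting forever; a get() timeout would guard against that.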
    

    Here is the equivalent code using explicit Process instances instead of a pool (I tend not to subclass Process, since that requires more code):

    from multiprocessing import Process, Queue
    import os
    
    def p(q):
        pid = os.getpid()
        q.put(pid)
        for i in range(5):
            q.put(i)
        q.put(None) # signify "end of file"
    
    
    def main():
        q1 = Queue()
        q2 = Queue()
        p1 = Process(target=p, args=(q1,))
        p1.start()
        p2 = Process(target=p, args=(q2,))
        p2.start()
        q1_eof = False
        q2_eof = False
        while not q1_eof or not q2_eof:
            if not q1_eof:
                obj = q1.get() # blocking get
                if obj is None:
                    q1_eof = True
                else:
                    print(obj)
            if not q2_eof:
                obj = q2.get() # blocking get
                if obj is None:
                    q2_eof = True
                else:
                    print(obj)
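        # join only after both queues are drained: a process that still has
        # buffered items in a Queue may not terminate until they are consumed
        p1.join()
        p2.join()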
    
    if __name__ == '__main__':
        main()
    

    Important Note

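    The two examples (one with a process pool, one without) use two different types of Queue. A plain multiprocessing.Queue can only be shared with child processes through inheritance (for example, as an argument to Process), so it cannot be passed as an argument to pool tasks; that is why the pool version uses a manager queue.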

    See Python multiprocessing.Queue vs multiprocessing.manager().Queue()

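    You can always use multiprocessing.Manager().Queue() (I generally do), at the cost of some efficiency: a manager queue is a proxy for a queue that lives in a separate server process, so every put() and get() is a round trip to that process.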
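    A quick way to see the difference described in this note (put_one() and try_queue() are hypothetical helpers): passing a plain multiprocessing.Queue to a pool task fails when the task is pickled for the worker, and the pool re-raises the error from get(), while a manager queue proxy is pickled without problems.

```python
import multiprocessing as mp

def put_one(q):
    q.put("hello")

def try_queue(make_queue):
    with mp.Pool(1) as pool:
        q = make_queue()
        try:
            # The queue is pickled together with the task sent to the worker.
            pool.apply_async(put_one, (q,)).get(timeout=10)
        except RuntimeError as exc:
            return f"RuntimeError: {exc}"
        return q.get(timeout=10)

if __name__ == "__main__":
    plain = try_queue(mp.Queue)
    with mp.Manager() as manager:
        managed = try_queue(manager.Queue)
    print(plain)
    print(managed)  # hello
```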
