I am trying to run a generator in parallel across child processes. But when I tried to do this, I saw that the function containing the generator was executed by the parent process!
Speaking about processing with generators, I expect that you really want the next things to be accomplished:

- A generator function lazily produces task data (arg), one task at a time.
- Tasks are processed in parallel by child processes.
- Results are gathered either in the same order in which the tasks were generated (strict order True) or in arbitrary order as soon as each task finishes (strict order False); the second variant may be considerably faster.

For all these purposes I created example template code that you can use for your specific problem:

Try it online!
def ProcessTask(arg):
    import time, os
    print('Started task', arg[0], arg[1], 'by', os.getpid())
    time.sleep(arg[1])
    print('Finished task', arg[0], arg[1], 'by', os.getpid())
    return (arg[0], arg[1] * 2)

def Main():
    import multiprocessing as mp

    def GenTasks(n):
        import random, os, time
        for i in range(n):
            t = round(random.random() * 2., 3)
            print('Created task', i, t, 'by', os.getpid())
            yield (i, t)
            time.sleep(random.random())

    num_tasks = 4
    for strict_order in [True, False]:
        print('\nIs strict order', strict_order)
        with mp.Pool() as pool:
            for res in (pool.imap_unordered, pool.imap)[strict_order](
                ProcessTask, GenTasks(num_tasks)
            ):
                print('Result from task', res)

if __name__ == '__main__':
    Main()
Outputs:
Is strict order True
Created task 0 0.394 by 10536
Created task 1 0.357 by 10536
Started task 0 0.394 by 8740
Started task 1 0.357 by 5344
Finished task 1 0.357 by 5344
Finished task 0 0.394 by 8740
Result from task (0, 0.788)
Result from task (1, 0.714)
Created task 2 0.208 by 10536
Started task 2 0.208 by 5344
Finished task 2 0.208 by 5344
Result from task (2, 0.416)
Created task 3 0.937 by 10536
Started task 3 0.937 by 8740
Finished task 3 0.937 by 8740
Result from task (3, 1.874)
Is strict order False
Created task 0 1.078 by 10536
Started task 0 1.078 by 7256
Created task 1 0.029 by 10536
Started task 1 0.029 by 5440
Finished task 1 0.029 by 5440
Result from task (1, 0.058)
Finished task 0 1.078 by 7256
Result from task (0, 2.156)
Created task 2 1.742 by 10536
Started task 2 1.742 by 5440
Created task 3 0.158 by 10536
Started task 3 0.158 by 7256
Finished task 3 0.158 by 7256
Result from task (3, 0.316)
Finished task 2 1.742 by 5440
Result from task (2, 3.484)
PS:

- With multiprocessing, in general the same single module-script is used by both the main and the child processes: main and children all start by executing the whole script. The if __name__ == '__main__': block is run only by the main process; the rest of the module's code is executed by both main and children.
- It is therefore usual to wrap the code run by the main process into one function (Main() in my case), the code run by children into another function (ProcessTask() in my case), and to put any shared functions and variables into the module's global scope, which is executed by both main and children (I don't have anything shared in this code).
- The task-processing function (ProcessTask() in my code) should be in the global scope of the module so that child processes can locate it by name.
- More details about multiprocessing are available here.

One way of achieving communication between two processes is by using a Queue
instance. In the following example, instead of creating two individual processes, I have opted to create a process pool of two processes:
from multiprocessing import Pool, Manager
import os

def p(q):
    pid = os.getpid()
    q.put(pid)
    for i in range(5):
        q.put(i)
    q.put(None)  # signify "end of file"

def main():
    manager = Manager()
    q1 = manager.Queue()
    q2 = manager.Queue()
    with Pool(2) as pool:  # create a pool of 2 processes
        pool.apply_async(p, args=(q1,))
        pool.apply_async(p, args=(q2,))
        q1_eof = False
        q2_eof = False
        while not q1_eof or not q2_eof:
            if not q1_eof:
                obj = q1.get()  # blocking get
                if obj is None:
                    q1_eof = True
                else:
                    print(obj)
            if not q2_eof:
                obj = q2.get()  # blocking get
                if obj is None:
                    q2_eof = True
                else:
                    print(obj)

if __name__ == '__main__':
    main()
Prints:
5588
24104
0
0
1
1
2
2
3
3
4
4
The code that uses explicit Process instances rather than creating a pool follows (I don't tend to subclass the Process class, as that requires more coding):
from multiprocessing import Process, Queue
import os

def p(q):
    pid = os.getpid()
    q.put(pid)
    for i in range(5):
        q.put(i)
    q.put(None)  # signify "end of file"

def main():
    q1 = Queue()
    q2 = Queue()
    p1 = Process(target=p, args=(q1,))
    p1.start()
    p2 = Process(target=p, args=(q2,))
    p2.start()
    q1_eof = False
    q2_eof = False
    while not q1_eof or not q2_eof:
        if not q1_eof:
            obj = q1.get()  # blocking get
            if obj is None:
                q1_eof = True
            else:
                print(obj)
        if not q2_eof:
            obj = q2.get()  # blocking get
            if obj is None:
                q2_eof = True
            else:
                print(obj)
    p1.join()
    p2.join()

if __name__ == '__main__':
    main()
Important Note

The two coding examples (one that uses a process pool and one that doesn't) use two different types of Queue instances.
See Python multiprocessing.Queue vs multiprocessing.manager().Queue()
You can always use multiprocessing.Manager().Queue() in all cases (I generally do), but at a possible loss of some efficiency.
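A short sketch of why the pool version above needs the managed queue (my own illustration, not part of the answer): Pool.apply_async must pickle its arguments to ship them to a worker. A plain multiprocessing.Queue refuses to be pickled that way, since it may only be shared through inheritance at process creation, whereas Manager().Queue() returns a proxy object that pickles fine:

```python
import multiprocessing as mp
import pickle

def main():
    plain_q = mp.Queue()
    try:
        pickle.dumps(plain_q)  # what Pool.apply_async would have to do
    except RuntimeError as e:
        print('plain Queue:', e)  # "...only be shared...through inheritance"

    with mp.Manager() as manager:
        proxy_q = manager.Queue()
        pickle.dumps(proxy_q)  # the proxy serializes without complaint
        print('managed Queue proxy is picklable')

if __name__ == '__main__':
    main()
```

This is also why the second example, which passes each Queue directly to Process(target=..., args=...), can use a plain Queue: there the queue is inherited by the child at start() rather than pickled into a pool task.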