Why is multiprocessing running things in the same process?

旧城冷巷雨未停 submitted on 2019-12-22 09:49:44

Question


I run the following solution from How can I recover the return value of a function passed to multiprocessing.Process?:

import multiprocessing
from os import getpid

def worker(procnum):
    print('I am number %d in process %d' % (procnum, getpid()))
    return getpid()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes = 3)
    print(pool.map(worker, range(5)))

which is supposed to output something like:

I am number 0 in process 19139
I am number 1 in process 19138
I am number 2 in process 19140
I am number 3 in process 19139
I am number 4 in process 19140
[19139, 19138, 19140, 19139, 19140]

but instead I only get

[4212, 4212, 4212, 4212, 4212]

If I feed pool.map a range of 1,000,000 using more than 10 processes I see at most two different pids.

Why is my copy of multiprocessing seemingly running everything in the same process?


Answer 1:


TL;DR: tasks are not distributed in any particular way; most likely your tasks are so short that one process finishes them all before the other processes get started.

From looking at the source of multiprocessing, it appears that tasks are simply put in a Queue, which the worker processes read from (the worker function reads from Pool._inqueue). There's no calculated distribution going on; the workers just race to pull tasks as fast as they can.

The most likely explanation, then, is that your tasks are simply very short, so one process finishes all of them before the others have a chance to look, or even to start. You can easily check whether this is the case by adding a two-second sleep to the task.

I'll note that on my machine, the tasks all get spread over the processes pretty homogeneously (also for #processes > #cores). So there seems to be some system-dependence, even though all processes should have .start()ed before work is queued.


Here's some trimmed source from worker, which shows that tasks are simply read off the queue by each process, so they run in an effectively unpredictable order:

def worker(inqueue, outqueue, ...):
    ...
    get = inqueue.get
    ...
    while maxtasks is None or (maxtasks and completed < maxtasks):
        try:
            task = get()
        ...

SimpleQueue communicates between processes using a Pipe, from the SimpleQueue constructor:

self._reader, self._writer = Pipe(duplex=False)

EDIT: the part about processes starting too slowly may have been wrong, so I removed it. All processes are .start()ed before any work is queued (which may be platform-dependent). I can't find whether a process is actually ready at the moment .start() returns.



Source: https://stackoverflow.com/questions/40432878/why-is-multiprocessing-running-things-in-the-same-process
