Python multiprocessing.Pool.map dying silently

Submitted by 半腔热情 on 2021-02-08 08:42:31

Question


I have tried to parallelize a for loop to speed up some code. Consider this:

from multiprocessing import Pool

results = []

def do_stuff(str):
    print str
    results.append(str)

p = Pool(4)
p.map(do_stuff, ['str1','str2','str3',...]) # many strings here ~ 2000
p.close()

print results

I have some debug messages printed from do_stuff to keep track of how far the program gets before dying. It seems to die at a different point each run. For example, it will print 'str297' and then just stop running: all the CPUs go idle and the program simply sits there. Some error should be occurring, but no error message is shown. Does anyone know how to debug this problem?

UPDATE

I tried re-working the code a little bit. Instead of using the map function I tried the apply_async function like this:

        pool = Pool(5)
        results = []
        for sym in underlyings[0::10]:
            r = pool.apply_async(do_sym, [sym])
            results.append(r)

        pool.close()
        pool.join()

        for result in results:
            print result.get(timeout=1000)

This worked just as well as the map function, but ended up hanging in the same way. It never reached the for loop that prints the results.
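One useful property of apply_async, worth knowing here, is that result.get() re-raises in the parent process any exception raised inside the worker, so worker failures need not be silent. A minimal sketch of that behavior, with a hypothetical work function standing in for do_sym:

```python
import multiprocessing as mp

def work(x):
    # Stand-in for do_sym: fail deliberately on one input.
    if x == 2:
        raise ValueError('bad input: %r' % (x,))
    return x * 10

def run_all(xs):
    pool = mp.Pool(2)
    async_results = [pool.apply_async(work, (x,)) for x in xs]
    pool.close()
    pool.join()
    out = []
    for r in async_results:
        try:
            # get() re-raises the worker's exception in the parent.
            out.append(r.get(timeout=5))
        except ValueError as exc:
            out.append(str(exc))
    return out

if __name__ == '__main__':
    print(run_all(range(4)))
```

If the workers hang instead of raising, the timeout passed to get() at least turns the silent hang into a multiprocessing.TimeoutError in the parent.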

After working on this a little more, and trying some debug logging as suggested in unutbu's answer, I will give some more info here. The problem is very strange. The pool seems to just hang there, unable to close and let the program continue. I use the PyDev environment for testing my programs, but I thought I would try running python directly in the console. There I get the same behavior, but when I press Ctrl+C to kill the program, I get some output which might explain where the problem is:

> KeyboardInterrupt ^CProcess PoolWorker-47: Traceback (most recent call
> last):   File "/usr/lib/python2.7/multiprocessing/process.py", line
> 258, in _bootstrap Process PoolWorker-48: Traceback (most recent call
> last):   File "/usr/lib/python2.7/multiprocessing/process.py", line
> 258, in _bootstrap Process PoolWorker-45: Process PoolWorker-46:
> Process PoolWorker-44:
>     self.run()   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
>     self._target(*self._args, **self._kwargs)   File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
> Traceback (most recent call last): Traceback (most recent call last):
> Traceback (most recent call last):   File
> "/usr/lib/python2.7/multiprocessing/process.py", line 258, in
> _bootstrap   File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap   File
> "/usr/lib/python2.7/multiprocessing/process.py", line 258, in
> _bootstrap
>     task = get()   File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
>     self.run()   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
>     racquire()
>     self._target(*self._args, **self._kwargs)   File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
> KeyboardInterrupt
>     task = get()   File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
>     self.run()
>     self.run()   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
>     self.run()   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run   File
> "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
>     self._target(*self._args, **self._kwargs)   File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
>     self._target(*self._args, **self._kwargs)
>     self._target(*self._args, **self._kwargs)   File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
>     racquire()   File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker KeyboardInterrupt
>     task = get()   File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
>     task = get()
>     task = get()   File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get  
> File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
>     racquire()
>     return recv()
>     racquire() KeyboardInterrupt KeyboardInterrupt KeyboardInterrupt

Even then the program never actually dies; I end up having to close the terminal window to kill it.

UPDATE 2

I narrowed the problem down to the function running in the pool: a MySQL database transaction was causing it. I was using the MySQLdb package before. I switched to a pandas.read_sql call for the transaction, and it is working now.
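For anyone hitting the same hang: a common cause is a database connection created in the parent process and inherited by the forked workers, so several processes end up sharing one socket and deadlock. The usual fix is to open one connection per worker via the Pool's initializer. A sketch under those assumptions, using stdlib sqlite3 as a stand-in for MySQLdb (init_worker and parallel_fetch are hypothetical names):

```python
import multiprocessing as mp
import sqlite3

_conn = None  # one private connection per worker process

def init_worker(db_path):
    # Runs once inside each worker after it starts; a connection
    # inherited from the parent across fork() can deadlock.
    global _conn
    _conn = sqlite3.connect(db_path)

def do_sym(sym):
    # Each worker queries over its own connection.
    row = _conn.execute('SELECT ?', (sym,)).fetchone()
    return row[0]

def parallel_fetch(syms):
    pool = mp.Pool(4, initializer=init_worker, initargs=(':memory:',))
    try:
        return pool.map(do_sym, syms)
    finally:
        pool.close()
        pool.join()

if __name__ == '__main__':
    print(parallel_fetch(['AAPL', 'MSFT', 'GOOG']))
```

The same initializer pattern applies to MySQLdb.connect: the key point is that the connect call happens in the child, not the parent.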


Answer 1:


pool.map returns the results in a list. So instead of calling results.append in the concurrent processes (which will not work since each process will have its own independent copy of results), assign results to the value returned by pool.map in the main process:

import multiprocessing as mp

def do_stuff(text):
    return text

if __name__ == '__main__':
    p = mp.Pool(4)
    tasks = ['str{}'.format(i) for i in range(2000)]
    results = p.map(do_stuff, tasks)
    p.close()

    print(results)

yields

['str0', 'str1', 'str2', 'str3', ...]

One method of debugging scripts that use multiprocessing is to add logging statements. The multiprocessing module provides a helper function, mp.log_to_stderr, for this purpose. For example,

import multiprocessing as mp
import logging

logger = mp.log_to_stderr(logging.DEBUG)

def do_stuff(text):
    logger.info('Received {}'.format(text))
    return text

if __name__ == '__main__':
    p = mp.Pool(4)
    tasks = ['str{}'.format(i) for i in range(2000)]
    results = p.map(do_stuff, tasks)
    p.close()

    logger.info(results)

which yields logging output like:

[DEBUG/MainProcess] created semlock with handle 139824443588608
[DEBUG/MainProcess] created semlock with handle 139824443584512
[DEBUG/MainProcess] created semlock with handle 139824443580416
[DEBUG/MainProcess] created semlock with handle 139824443576320
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-1] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-2] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-3] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-4] child process calling self.run()
[INFO/PoolWorker-1] Received str0
[INFO/PoolWorker-2] Received str125
[INFO/PoolWorker-3] Received str250
[INFO/PoolWorker-4] Received str375
[INFO/PoolWorker-3] Received str251
...
[INFO/PoolWorker-4] Received str1997
[INFO/PoolWorker-4] Received str1998
[INFO/PoolWorker-4] Received str1999
[DEBUG/MainProcess] closing pool
[INFO/MainProcess] ['str0', 'str1', 'str2', 'str3', ...]
[DEBUG/MainProcess] worker handler exiting
[DEBUG/MainProcess] task handler got sentinel
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] task handler sending sentinel to result handler
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] finalizing pool
[DEBUG/MainProcess] task handler sending sentinel to workers
[DEBUG/MainProcess] helping task handler/workers to finish
[DEBUG/MainProcess] result handler got sentinel
[DEBUG/PoolWorker-3] worker got sentinel -- exiting
[DEBUG/MainProcess] removing tasks from inqueue until task handler finished
[DEBUG/MainProcess] ensuring that outqueue is not full
[DEBUG/MainProcess] task handler exiting
[DEBUG/PoolWorker-3] worker exiting after 2 tasks
[INFO/PoolWorker-3] process shutting down
[DEBUG/MainProcess] result handler exiting: len(cache)=0, thread._state=0
[DEBUG/PoolWorker-3] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] joining worker handler
[DEBUG/MainProcess] terminating workers
[DEBUG/PoolWorker-3] running the remaining "atexit" finalizers
[DEBUG/MainProcess] joining task handler
[DEBUG/MainProcess] joining result handler
[DEBUG/MainProcess] joining pool workers
[DEBUG/MainProcess] cleaning up worker 4811
[DEBUG/MainProcess] running the remaining "atexit" finalizers

Notice that each line indicates which process emitted the logging record. So the output to some extent serializes the order of events from amongst your concurrent processes.

By judicious placement of logging.info calls you should be able to narrow down where and maybe why your script is "dying silently" (or, at least it won't be quite so silent as it dies).
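Another way to make a dying worker less silent is to log the full traceback from inside the worker before the exception propagates back to the parent. A sketch of that idea, where log_errors is a hypothetical decorator wrapping the pool function:

```python
import functools
import logging
import multiprocessing as mp
import traceback

logger = mp.log_to_stderr(logging.INFO)

def log_errors(func):
    # Log the complete traceback inside the worker process itself,
    # since the parent only sees the exception after it has been
    # pickled back through the result queue.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.error(traceback.format_exc())
            raise
    return wrapper

@log_errors
def do_stuff(text):
    return text.upper()

if __name__ == '__main__':
    p = mp.Pool(2)
    print(p.map(do_stuff, ['str1', 'str2']))
    p.close()
```

This way a crash shows up tagged with the PoolWorker-N name in the stderr log even if the main process never gets as far as calling get() on the result.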



来源:https://stackoverflow.com/questions/33303020/python-multiprocessing-pool-map-dying-silently
