问题
>>> import concurrent.futures
>>> from collections import namedtuple
>>> #1. Initialise namedtuple here
>>> # tm = namedtuple("tm", ["pk"])
>>> class T:
... #2. Initialise named tuple here
... #tm = namedtuple("tm", ["pk"])
... def __init__(self):
... #3: Initialise named tuple here
... tm = namedtuple("tm", ["pk"])
... self.x = {'key': [tm('value')]}
... def test1(self):
... with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
... results = executor.map(self.test, ["key"])
... return results
... def test(self, s):
... print(self.x[s])
...
>>> t = T().test1()
This gets stuck here.
^CTraceback (most recent call last):
File "<stdin>", line 1, in <module>
Process ForkProcess-1:
File "<stdin>", line 10, in test1
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 623, in __exit__
self.shutdown(wait=True)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/process.py", line 681, in shutdown
self._queue_management_thread.join()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 1044, in join
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/process.py", line 233, in _process_worker
call_item = call_queue.get(block=True)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/queues.py", line 94, in get
res = self._recv_bytes()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
self._wait_for_tstate_lock()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 1060, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt
If I initialise the named tuple outside of the class (in #1), in that case, this works fine. Could someone please let me know what is the issue if I initialise as per #2 or #3 ?
回答1:
You're not changing where you initialize the namedtuple. You're changing where you create the namedtuple class.
When you create a namedtuple class named "x" in module "y" with collections.namedtuple
, its __module__
is set to 'y'
and its __qualname__
is set to 'x'
. Pickling and unpickling relies on this class actually being available in the y.x
location indicated by these attributes, but in cases 2 and 3 of your example, it's not.
Python can't pickle the namedtuple, which breaks inter-process communication with the workers. Executing self.test
in a worker process relies on pickling self.test
and unpickling a copy of it in the worker process, and that can't happen if self.x
is an instance of a class that can't be pickled.
来源:https://stackoverflow.com/questions/63609985/using-collections-namedtuple-with-processpoolexecutor-gets-stuck-in-a-few-cases