Question
I have a Python 3.7 script on a Linux machine where I am trying to run a function in multiple threads, but when I try I receive the following error:
Traceback (most recent call last):
  File "./test2.py", line 43, in <module>
    pt.ping_scanx()
  File "./test2.py", line 39, in ping_scanx
    par = Parallel(function=self.pingx, parameter_list=list, thread_limit=10)
  File "./test2.py", line 19, in __init__
    self._x = self._pool.starmap(function, parameter_list, chunksize=1)
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/local/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.TextIOWrapper' object
This is the sample code that I am using to demonstrate the issue:
#!/usr/local/bin/python3.7
from multiprocessing import Pool
import pexpect  # Used to run SSH for sessions

class Parallel:
    def __init__(self, function, parameter_list, thread_limit=4):
        # Create new thread to hold our jobs
        self._pool = Pool(processes=thread_limit)
        self._x = self._pool.starmap(function, parameter_list, chunksize=1)

class PingTest():
    def __init__(self):
        self._pex = None

    def connect(self):
        self._pex = pexpect.spawn("ssh snorton@127.0.0.1")

    def pingx(self, target_ip, source_ip):
        print("PING {} {}".format(target_ip, source_ip))

    def ping_scanx(self):
        self.connect()
        list = [['8.8.8.8', '96.53.16.93'],
                ['8.8.8.8', '96.53.16.93']]
        par = Parallel(function=self.pingx, parameter_list=list, thread_limit=10)

pt = PingTest()
pt.ping_scanx()
If I don't include the line with pexpect.spawn, the error doesn't happen. Can someone explain why I am getting the error, and suggest a way to fix it?
Answer 1:
With multiprocessing.Pool you're actually calling the function in separate processes, not threads. Processes cannot share Python objects unless they are serialized first before being transmitted to each other over inter-process communication channels, which is what multiprocessing.Pool does for you behind the scenes, using pickle as the serializer. Since pexpect.spawn opens a terminal device as a file-like TextIOWrapper object, and you're storing the returned object in the PingTest instance and then passing the bound method self.pingx to Pool.starmap, it will try to serialize self, which contains the pexpect.spawn object in the _pex attribute. That object unfortunately cannot be serialized, because TextIOWrapper does not support serialization.
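The failure is easy to reproduce in isolation. Here an ordinary text-mode file handle stands in for the terminal device that pexpect.spawn keeps internally; both are _io.TextIOWrapper objects, and pickle rejects either one with the same TypeError shown in the traceback:

```python
import io
import pickle

handle = open(__file__, "r")  # text mode -> an _io.TextIOWrapper
assert isinstance(handle, io.TextIOWrapper)

try:
    # The same serialization multiprocessing performs on every task
    # it sends to a worker process.
    pickle.dumps(handle)
except TypeError as exc:
    print("pickle failed:", exc)
finally:
    handle.close()
```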
Since your function is I/O-bound, you should use threading instead via the multiprocessing.dummy module for more efficient parallelization and, more importantly in this case, to allow the pexpect.spawn object to be shared across the threads, with no need for serialization.
Change:
from multiprocessing import Pool
to:
from multiprocessing.dummy import Pool
Demo: https://repl.it/@blhsing/WiseYoungExperiments
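Applied to the question's code, that one-line change is enough, because a thread-backed Pool never pickles its arguments. The sketch below keeps the same structure but, so that it runs anywhere, substitutes an open file handle for the pexpect.spawn session (both are unpicklable TextIOWrapper-backed objects) and returns the ping strings instead of printing them:

```python
from multiprocessing.dummy import Pool  # thread-backed Pool, same API as multiprocessing.Pool

class Parallel:
    def __init__(self, function, parameter_list, thread_limit=4):
        # Threads share the interpreter's memory, so self.pingx (and the
        # unpicklable object hanging off self) is passed by reference.
        self._pool = Pool(processes=thread_limit)
        self._x = self._pool.starmap(function, parameter_list, chunksize=1)
        self._pool.close()
        self._pool.join()

class PingTest:
    def __init__(self):
        # Stand-in for pexpect.spawn: an open TextIOWrapper, which would
        # break a process-backed Pool exactly as in the question.
        self._pex = open(__file__, "r")

    def pingx(self, target_ip, source_ip):
        return "PING {} {}".format(target_ip, source_ip)

    def ping_scanx(self):
        targets = [['8.8.8.8', '96.53.16.93'],
                   ['8.8.8.8', '96.53.16.93']]
        par = Parallel(function=self.pingx, parameter_list=targets,
                       thread_limit=10)
        return par._x

pt = PingTest()
print(pt.ping_scanx())
# -> ['PING 8.8.8.8 96.53.16.93', 'PING 8.8.8.8 96.53.16.93']
```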
Source: https://stackoverflow.com/questions/58772451/error-using-pexpect-and-multiprocessing-error-typerror-cannot-serialize-io