Error using pexpect and multiprocessing? error “TypeError: cannot serialize '_io.TextIOWrapper' object”

Submitted by 人盡茶涼 on 2021-02-10 20:00:14

Question


I have a Python 3.7 script on a Linux machine where I am trying to run a function in multiple threads, but when I try I receive the following error:

Traceback (most recent call last):
  File "./test2.py", line 43, in <module>
    pt.ping_scanx()
  File "./test2.py", line 39, in ping_scanx
    par = Parallel(function=self.pingx, parameter_list=list, thread_limit=10)
  File "./test2.py", line 19, in __init__
    self._x = self._pool.starmap(function, parameter_list, chunksize=1)
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/local/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.TextIOWrapper' object

This is the sample code that I am using to demonstrate the issue:

#!/usr/local/bin/python3.7
from multiprocessing import Pool
import pexpect   # Used to run SSH for sessions

class Parallel:

    def __init__(self, function, parameter_list, thread_limit=4):

        # Create new thread to hold our jobs
        self._pool = Pool(processes=thread_limit)

        self._x = self._pool.starmap(function, parameter_list, chunksize=1)

class PingTest():

    def __init__(self):
        self._pex = None

    def connect(self):
        self._pex = pexpect.spawn("ssh snorton@127.0.0.1")

    def pingx(self, target_ip, source_ip):
        print("PING {} {}".format(target_ip, source_ip))

    def ping_scanx(self):

        self.connect()

        list = [['8.8.8.8', '96.53.16.93'],
                ['8.8.8.8', '96.53.16.93']]

        par = Parallel(function=self.pingx, parameter_list=list, thread_limit=10)


pt = PingTest()
pt.ping_scanx()

If I don't include the line with pexpect.spawn, the error doesn't happen. Can someone explain why I am getting the error, and suggest a way to fix it?


Answer 1:


With multiprocessing.Pool you're actually running the function in separate processes, not threads. Processes cannot share Python objects directly: every object must be serialized before it can be sent over an inter-process communication channel, which multiprocessing.Pool does for you behind the scenes using pickle as the serializer. pexpect.spawn opens a pseudo-terminal as a file-like TextIOWrapper object, and you store the returned object in the PingTest instance. So when you pass the bound method self.pingx to Pool.starmap, pickle has to serialize self, which includes the pexpect.spawn object in the _pex attribute, and that fails because TextIOWrapper does not support serialization.
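You can reproduce the root cause in isolation: pickling any open file-like TextIOWrapper raises this same TypeError. A minimal sketch (using os.devnull as a stand-in for the pty that pexpect.spawn opens; the exact message wording varies across Python versions):

```python
import os
import pickle

# open() returns an _io.TextIOWrapper in text mode, just like the
# file-like object backing a pexpect.spawn session.
f = open(os.devnull, "r")
try:
    pickle.dumps(f)  # this is what Pool.starmap does to self behind the scenes
except TypeError as e:
    print("pickling failed:", e)
finally:
    f.close()
```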

Since your function is I/O-bound, you should use threads instead, via the multiprocessing.dummy module. That gives more efficient parallelization here and, more importantly in this case, allows the pexpect.spawn object to be shared across the workers with no serialization at all.

Change:

from multiprocessing import Pool

to:

from multiprocessing.dummy import Pool

Demo: https://repl.it/@blhsing/WiseYoungExperiments
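For reference, here is a minimal self-contained sketch of the corrected script. It keeps the structure of the question's code but swaps in the thread-backed Pool and closes the pool when done; the pexpect.spawn call is left out so the sketch runs without an SSH server (the _pex attribute is just a placeholder here):

```python
#!/usr/bin/env python3
from multiprocessing.dummy import Pool  # thread-backed Pool; no pickling needed


class Parallel:
    def __init__(self, function, parameter_list, thread_limit=4):
        # Threads share the interpreter's memory, so the bound method
        # (and the object it is bound to) is passed by reference,
        # never serialized.
        self._pool = Pool(processes=thread_limit)
        self._x = self._pool.starmap(function, parameter_list, chunksize=1)
        self._pool.close()
        self._pool.join()


class PingTest:
    def __init__(self):
        self._pex = None  # would hold the pexpect.spawn object

    def pingx(self, target_ip, source_ip):
        print("PING {} {}".format(target_ip, source_ip))

    def ping_scanx(self):
        targets = [['8.8.8.8', '96.53.16.93'],
                   ['8.8.8.8', '96.53.16.93']]
        Parallel(function=self.pingx, parameter_list=targets, thread_limit=10)


pt = PingTest()
pt.ping_scanx()
```

Even though self._pex would hold an unpicklable TextIOWrapper once connect() runs, this version works, because threads never cross a process boundary.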



Source: https://stackoverflow.com/questions/58772451/error-using-pexpect-and-multiprocessing-error-typerror-cannot-serialize-io
