Question
This is the exception I get. All I did was increase the pool count.
Code
def parse(url):
    r = request.get(url)

POOL_COUNT = 75
with Pool(POOL_COUNT) as p:
    result = p.map(parse, links)
File "/usr/lib64/python3.5/multiprocessing/pool.py", line 130, in worker
put((job, i, (False, wrapped)))
File "/usr/lib64/python3.5/multiprocessing/queues.py", line 355, in put
self._writer.send_bytes(obj)
File "/usr/lib64/python3.5/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/usr/lib64/python3.5/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/usr/lib64/python3.5/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkPoolWorker-26:
Traceback (most recent call last):
File "/usr/lib64/python3.5/multiprocessing/pool.py", line 125, in worker
put((job, i, result))
File "/usr/lib64/python3.5/multiprocessing/queues.py", line 355, in put
self._writer.send_bytes(obj)
File "/usr/lib64/python3.5/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/usr/lib64/python3.5/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/usr/lib64/python3.5/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Answer 1:
I was seeing the Broken Pipe exception too, but my case was more complicated.
One reason that increasing the pool size alone leads to this exception is that you are requesting too many things at once, which can exhaust memory; the process then segfaults, especially if you have a small swap.
Edit 1: I believe it's caused by memory usage. Too many pool workers use up too much memory and the pool eventually breaks. It's very hard to debug; I limited my own pool size to 4, since I have little RAM and big packages to handle.
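As a rough sketch of that advice (my own addition, not from the answer): cap the pool size at the machine's CPU count, or a small constant on a low-RAM box, instead of hard-coding a large number. The cap of 4 here is just an assumed value.
from multiprocessing import Pool
import os

def parse(url):
    # placeholder worker; the real one would fetch and parse the page
    return url

if __name__ == '__main__':
    # Hypothetical cap: no more workers than CPU cores, and never more than 4,
    # so a small-RAM machine is not overwhelmed.
    pool_count = min(os.cpu_count() or 1, 4)
    links = ['http://example.org/' for _ in range(100)]
    with Pool(pool_count) as p:
        result = p.map(parse, links)
    print(len(result))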
Answer 2:
This simplified version of your code works perfectly here with any value of POOL_COUNT:
from multiprocessing import Pool

def parse(url):
    r = url
    print(r)

POOL_COUNT = 90
with Pool(processes=POOL_COUNT) as p:
    links = [str(i) for i in range(POOL_COUNT)]
    result = p.map(parse, links)
Doesn't it? So the problem should be in the requests part; maybe it needs a sleep, as sketched below.
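A minimal sketch of that suggestion (my own addition, not part of the answer): add a short delay and a basic retry around the request, assuming the requests library is used; the retry count and delay values are arbitrary.
import time
import requests

def parse(url, retries=3, delay=0.5):
    # Hypothetical throttled fetch: sleep between attempts so the remote
    # server and local sockets are not hammered by 75+ workers at once.
    for attempt in range(retries):
        try:
            r = requests.get(url, timeout=10)
            return r.text
        except requests.RequestException:
            time.sleep(delay * (attempt + 1))
    return None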
Answer 3:
I tried to reproduce this on an AWS t2.small instance (2 GB RAM, as you described) with the following script (note that you missed an s in requests.get(), assuming you are using the requests library, and the return was also missing):
from multiprocessing import Pool
import requests

def parse(url):
    a = requests.get(url)
    if a.status_code != 200:
        print(a)
    return a.text

POOL_COUNT = 120
links = ['http://example.org/' for i in range(1000)]
with Pool(POOL_COUNT) as p:
    result = p.map(parse, links)
print(result)
Sadly, I didn't run into the same issue as you did.
From the stack trace you posted it seems the problem lies in launching the parse function, not in the requests module itself: it looks like the main process cannot send data to one of the launched worker processes.
Anyway, this operation is not CPU bound; the bottleneck is the network (most probably the remote server's connection limit), so you are much better off using multithreading. That will most probably also be faster, because multiprocessing's map needs to communicate between processes, meaning the return value of parse has to be pickled and sent back to the main process.
To try with threads instead of processes, simply do from multiprocessing.pool import ThreadPool and replace Pool with ThreadPool in your code.
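Applied to the script above, the swap would look roughly like this (a sketch based on the answer's own test script; only the import and the pool class change):
from multiprocessing.pool import ThreadPool  # instead of: from multiprocessing import Pool
import requests

def parse(url):
    a = requests.get(url)
    if a.status_code != 200:
        print(a)
    return a.text

POOL_COUNT = 120
links = ['http://example.org/' for i in range(1000)]

# Threads share memory, so return values are not pickled and no pipe
# between processes is involved; the BrokenPipeError path disappears.
with ThreadPool(POOL_COUNT) as p:
    result = p.map(parse, links)
print(len(result))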
Source: https://stackoverflow.com/questions/45230593/python-multiprocessing-broken-pipe-exception-after-increasing-pool-size