问题
I was using this answer in order to run parallel commands with multiprocessing in Python on a Linux box.
My code did something like:
import multiprocessing
import logging
def cycle(offset):
# Do stuff
def run():
for nprocess in process_per_cycle:
logger.info("Start cycle with %d processes", nprocess)
offsets = list(range(nprocess))
pool = multiprocessing.Pool(nprocess)
pool.map(cycle, offsets)
But I was getting this error: OSError: [Errno 24] Too many open files
So, the code was opening too many file descriptor, i.e.: it was starting too many processes and not terminating them.
I fixed it replacing the last two lines with these lines:
with multiprocessing.Pool(nprocess) as pool:
pool.map(cycle, offsets)
But I do not know exactly why those lines fixed it.
What is happening underneath of that with
?
回答1:
You're creating new processes inside a loop, and then forgetting to close them once you're done with them. As a result, there comes a point where you have too many open processes. This is a bad idea.
You could fix this by using a context manager which automatically calls pool.terminate
, or manually call pool.terminate
yourself. Alternatively, why don't you create a pool outside the loop just once, and then send tasks to the processes inside?
pool = multiprocessing.Pool(nprocess) # initialise your pool
for nprocess in process_per_cycle:
...
pool.map(cycle, offsets) # delegate work inside your loop
pool.close() # shut down the pool
For more information, you could peruse the multiprocessing.Pool documentation.
回答2:
It is context manger. Using with ensures that you are opening and closing files properly. To understand this in detail, I'd recommend this article https://jeffknupp.com/blog/2016/03/07/python-with-context-managers/
来源:https://stackoverflow.com/questions/45665991/multiprocessing-returns-too-many-open-files-but-using-with-as-fixes-it-wh