I'm facing the following issue. I'm trying to parallelize a function that updates a file, but I cannot start the Pool() because of an OSError: [Errno 12] Cannot allocate memory.
When using a multiprocessing.Pool, the default way to start the worker processes on Unix is fork. The issue with fork is that the entire parent process is duplicated (see details here). So if your main process is already using a lot of memory, that memory is duplicated for each worker, which triggers this OSError. For instance, if your main process uses 2GB of memory and you start 8 subprocesses, you will need 18GB of RAM.

You should try a different start method such as 'forkserver' or 'spawn':
from functools import partial
from multiprocessing import set_start_method, Pool

# Must be called once, before any pool or process is created
set_start_method('forkserver')

# You can then start your Pool without each worker
# cloning your entire memory
pool = Pool()
func = partial(parallelUpdateJSON, paramMatch, predictionmatrix)
pool.map(func, data)
These methods avoid duplicating the workspace of your process, but they can be a bit slower to start because the modules you use have to be reimported in each worker.
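If you'd rather not change the global default, a context object scopes the start method to a single pool. Here is a minimal, self-contained sketch using multiprocessing.get_context; the worker function and its arguments are placeholders standing in for the question's parallelUpdateJSON, paramMatch, and predictionmatrix:

```python
from functools import partial
from multiprocessing import get_context

def parallel_update_json(param_match, prediction_matrix, item):
    # Hypothetical stand-in for the real worker function.
    return item

# A context scopes the start method to pools created from it,
# instead of changing the process-wide default.
ctx = get_context('spawn')

if __name__ == '__main__':
    func = partial(parallel_update_json, 'paramMatch', None)
    # Workers start with a fresh interpreter, so the parent's
    # memory is not cloned into each of them.
    with ctx.Pool(processes=2) as pool:
        results = pool.map(func, [1, 2, 3])
    print(results)
```

Note the `if __name__ == '__main__':` guard: with 'spawn' (and 'forkserver'), workers re-import the main module, so unguarded pool creation would recurse.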
We had this a couple of times. According to my sysadmin, there is "a bug" in Unix which raises the same error if you are out of memory or if your process reaches the max file descriptor limit.

We had a file descriptor leak, and the error raised was [Errno 12] Cannot allocate memory#012OSError.

So you should look at your script and double-check whether the problem is the creation of too many file descriptors instead.