OSError: [Errno 12] Cannot allocate memory when using python multiprocessing Pool

吃可爱长大的小学妹 提交于 2020-05-13 18:51:27

问题


I am trying to apply a function to 5 cross validation sets in parallel using Python's multiprocessing and repeat that for different parameter values, like so:

import pandas as pd
import numpy as np
import multiprocessing as mp
from sklearn.model_selection import StratifiedKFold

#simulated datasets
X = pd.DataFrame(np.random.randint(2, size=(3348,868), dtype='int8'))
y = pd.Series(np.random.randint(2, size=3348, dtype='int64'))

#dummy function to apply
def _work(args):
    del(args)

for C in np.arange(0.0,2.0e-3,1.0e-6):
    splitter = StratifiedKFold(n_splits=5)
    with mp.Pool(processes=5) as pool:
        pool_results = \
            pool.map(
                func=_work,
                iterable=((C,X.iloc[train_index],X.iloc[test_index]) for train_index, test_index in splitter.split(X, y))
            )

However halfway through execution I get the following error:

Traceback (most recent call last):
  File "mre.py", line 19, in <module>
    with mp.Pool(processes=5) as pool:
  File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

I'm running this on Ubuntu 16.04 with 32Gb of memory, and checking htop during execution it never goes over 18.5Gb, so I don't think I'm running out of memory.
It is definitly due to the splitting of my dataframes with the indexes from splitter.split(X,y) since when I directly pass my dataframes to the Pool object no error is thrown.

I saw this answer that says it might be due to too many file dependencies being created, but I have no idea how I might go about fixing that, and isn't the context manager supposed to help avoid this sort of problem?


回答1:


os.fork() makes a copy of a process, so if you're sitting at about 18 GB of usage, and want to call fork, you need another 18 GB. Twice 18 is 36 GB, which is well over 32 GB. While this analysis is (intentionally) naive—some things don't get copied on fork—it's probably sufficient to explain the problem.

The solution is either to make the pools earlier, when less memory needs to be copied, or to work harder at sharing the largest objects. Or, of course, add more memory (perhaps just virtual memory, i.e., swap space) to the system.



来源:https://stackoverflow.com/questions/54364064/oserror-errno-12-cannot-allocate-memory-when-using-python-multiprocessing-poo

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!