mpi4py: close MPI Spawn?

社会主义新天地 submitted on 2019-12-10 10:15:47

Question


I have some Python code in which I very often Spawn multiple processes. I get the following error:

ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 809

My code roughly looks like this:

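import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
...
# Spawn no_fronts worker processes, each running front_process.py
icomm = MPI.COMM_SELF.Spawn(sys.executable, args=["front_process.py", str(rank)], maxprocs=no_fronts)
...
# Wait for a message from any of the spawned processes
message = icomm.recv(source=MPI.ANY_SOURCE, tag=21)
...
icomm.Free()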

Spawn is called very often, and I think the spawned processes remain "open" after I am finished with them, even though I call icomm.Free(). How do I properly "close" a spawned process?


Answer 1:


The MPI specification for MPI_COMM_FREE states that "... the object is actually deallocated only if there are no other active references to it." To actually disconnect processes, call MPI_COMM_DISCONNECT on both ends of every intercommunicator that links them. The equivalent mpi4py call is icomm.Disconnect().
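A minimal sketch of what disconnecting on both ends could look like in mpi4py. The function names spawn_and_collect and worker_main are hypothetical, and the sketch assumes that every spawned process sends exactly one message with tag 21, which the original code does not state. It uses only Spawn, recv, send, Comm.Get_parent and Disconnect from mpi4py.

```python
import sys


def spawn_and_collect(script, rank, nprocs, tag=21):
    """Parent side: spawn workers, read one message from each, disconnect."""
    # Imported here so this module can be loaded on a machine without MPI.
    from mpi4py import MPI

    icomm = MPI.COMM_SELF.Spawn(
        sys.executable, args=[script, str(rank)], maxprocs=nprocs
    )
    # Assumption: every worker sends exactly one message with this tag.
    messages = [icomm.recv(source=MPI.ANY_SOURCE, tag=tag) for _ in range(nprocs)]
    # Collective call: the workers must call Disconnect() too.
    icomm.Disconnect()
    return messages


def worker_main(result, tag=21):
    """Worker side, e.g. the body of front_process.py."""
    from mpi4py import MPI

    parent = MPI.Comm.Get_parent()  # intercommunicator back to the spawner
    parent.send(result, dest=0, tag=tag)
    parent.Disconnect()  # matches the parent's Disconnect()
```

The parent would call spawn_and_collect("front_process.py", rank, no_fronts) in place of the Spawn/recv/Free sequence, and front_process.py would end by calling worker_main with its result.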

Still, the error that you see probably comes from orterun (symlinked as mpirun and mpiexec) and not from the master rank. orterun is what launches all MPI processes (the initial ones and those spawned later) and then redirects their standard output to its own standard output, so that you can see the output from each rank. When processes are started on the local host, orterun uses a simple fork()/exec() mechanism, part of the odls framework, to spawn new ranks, and it uses pipes both to detect a successful launch and to forward IO. The launch detection pipes are open only for a very short time, but the IO forwarding pipes remain open for as long as the rank is running. If you have many ranks running at the same time, many pipes stay open, hence the error message.
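The descriptor cost of forwarding output can be illustrated with plain Python. This is only an analogy built on the standard subprocess module, not Open MPI's actual launch code, and the names launch_children and reap are hypothetical. Each running child keeps one pipe descriptor open in the launching process until its output has been read to the end and the child has been reaped.

```python
import subprocess
import sys


def launch_children(n):
    """Start n children whose stdout is forwarded to us through a pipe."""
    return [
        subprocess.Popen(
            [sys.executable, "-c", "print('hello')"],
            stdout=subprocess.PIPE,
        )
        for _ in range(n)
    ]


def reap(children):
    """Read each child's output, wait for it to exit, and close its pipe."""
    outputs = []
    for child in children:
        out, _ = child.communicate()  # reads to EOF, waits, closes the pipe
        outputs.append(out.decode().strip())
    return outputs
```

Between launch_children and reap, the launcher holds one distinct descriptor per child (child.stdout.fileno()), so the number of children that can be alive at once is bounded by the launcher's descriptor limit.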

The error message is a bit misleading, since there are two cases of "too many descriptors" and Open MPI does not distinguish between them. The first case is when the hard kernel limit is reached, but this is usually a huge value. The second case is when the per-process limit on the number of file descriptors is reached. The latter can be controlled with the ulimit command. Check the value in your case with ulimit -n and increase it if necessary. For example:

user@host$ ulimit -n 123456
user@host$ mpiexec -n 1 ... ./spawning_code.py arg1 arg2 ...

Here 123456 is the desired limit on the number of descriptors. It cannot exceed the hard limit, which can be obtained with ulimit -Hn. If you run your program from a script (either for convenience or because you submit jobs to a batch queueing system), put the ulimit -n line in the script before the call to mpirun/mpiexec.
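If the launcher script happens to be written in Python rather than shell, the same check can be sketched with the standard resource module (Unix only). The function name raise_nofile_limit is hypothetical. Note that this changes the limit only for the current process and for processes it starts afterwards, so it would have to run in the launcher before mpiexec is started, not inside the spawning MPI program.

```python
import resource


def raise_nofile_limit(desired):
    """Raise the soft descriptor limit towards `desired`; never lower it.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit cannot exceed the hard limit (the `ulimit -Hn` value).
    if hard != resource.RLIM_INFINITY:
        desired = min(desired, hard)
    # Nothing to do if the current soft limit is already high enough.
    if soft == resource.RLIM_INFINITY or desired <= soft:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (desired, hard))
    return desired
```

A launcher would call raise_nofile_limit(123456) and then start mpiexec, for example through subprocess, so that the new limit is inherited.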

Note that in the text above, the words rank and process refer to the same thing.



Source: https://stackoverflow.com/questions/20698712/mpi4py-close-mpi-spawn
