Question
I am using Open MPI (1.8.3) under Cygwin on a Windows 7 machine. I would like to run MPI codes on this machine exclusively, without communicating over any external network. I understand I should be able to restrict mpirun to self and shared-memory communication using MCA options like so:
mpirun -n 8 --mca btl sm,self ./hello.exe
However, when I try this, Windows asks whether I'd like to add a firewall exception, indicating that my job is trying to communicate externally over TCP. Additionally, if and only if I'm on a wireless network, mpirun hangs for roughly one minute before the hello world job completes. If I turn off my wireless card or switch to a wired Ethernet connection, it completes in around one second, as expected.
Why is mpirun not observing my choice of BTL?
Answer 1:
"Why is mpirun not observing my choice of BTL?"
It is definitely observing your choice of BTL. But there is another framework, namely OOB (out-of-band), which also uses TCP by default. You should disable the tcp component for both frameworks in order to prevent Open MPI from using TCP altogether:
mpirun -n 8 --mca btl ^tcp --mca oob ^tcp ...
Note that completely disabling TCP might have unexpected effects.
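The same two exclusions can also be supplied through the environment, since Open MPI picks up MCA parameters from variables named OMPI_MCA_<parameter>. The following is a minimal sketch of that form, not a tested recipe for the Cygwin setup in the question; the commented-out launch line simply reuses the hello.exe example from above.

```shell
# Sketch: set the MCA parameters via environment variables instead of
# passing --mca on the command line.
export OMPI_MCA_btl='^tcp'   # exclude the tcp component of the BTL framework
export OMPI_MCA_oob='^tcp'   # exclude the tcp component of the OOB framework

# Show the MCA-related variables that mpirun would inherit from this shell.
env | grep '^OMPI_MCA_' | sort

# Then launch without any --mca options, for example:
# mpirun -n 8 ./hello.exe
```

Values given on the mpirun command line take precedence over the environment, so this form is mainly convenient when the settings should apply to every run started from the same shell.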
Answer 2:
For completeness, I'd like to elaborate on Hristo's answer.
I was suffering from seemingly random crashes of my simulation software. After some detective work, I found out that dropped network connections can cause MPI to abort.
The cause of the random crashes was an unreliable wireless network to which my laptop was connected: the occasional drop of the wifi connection caused my purely local job to end.
Thus, on my system I excluded the wifi interface (named wlp3s0 on my Ubuntu machine) from both BTL and OOB. Now the parallel run survives disabling the wifi.
mpirun --mca oob_tcp_if_exclude wlp3s0 --mca btl_tcp_if_exclude wlp3s0 -np 2 someApplication
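To avoid repeating these options on every run, the exclusions can be stored in the per-user MCA parameter file, $HOME/.openmpi/mca-params.conf, which Open MPI reads at startup. This is a sketch under the assumption that the interface is called wlp3s0 as above; IFACE, CONF_DIR and CONF_FILE are illustrative variable names, and the interface name should be replaced with your own.

```shell
# Sketch: persist the interface exclusions in the per-user MCA parameter file.
# IFACE is a placeholder; substitute the name of your own wireless interface.
IFACE="wlp3s0"
CONF_DIR="${HOME}/.openmpi"
CONF_FILE="${CONF_DIR}/mca-params.conf"

mkdir -p "${CONF_DIR}"

# Append one "parameter = value" line per setting; '#' starts a comment.
cat >> "${CONF_FILE}" <<EOF
# Keep Open MPI off the wireless interface for both OOB and BTL traffic.
oob_tcp_if_exclude = ${IFACE}
btl_tcp_if_exclude = ${IFACE}
EOF

# With the file in place, the plain launch line should be enough:
# mpirun -np 2 someApplication
```

One thing to check: an explicit btl_tcp_if_exclude replaces the parameter's default value, which according to the Open MPI FAQ normally covers the loopback interface, so you may want to list loopback (lo on Linux) alongside the wifi interface.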
Source: https://stackoverflow.com/questions/26350173/why-does-mpirun-not-respect-my-choice-of-btl