Why does mpirun not respect my choice of BTL?

Submitted by 百般思念 on 2019-12-11 02:10:05

Question


I am using Open MPI (1.8.3) on Cygwin on a Windows 7 machine. I would like to run MPI codes on this machine exclusively, without talking on any external network. I understand I should be able to restrict mpirun to self and shared memory communication using MCA options like so:

mpirun -n 8 --mca btl sm,self ./hello.exe

However, when I try this, Windows asks me whether I'd like to make a firewall exception, which indicates that my job is trying to communicate externally over TCP. Additionally, if and only if I'm on a wireless network, mpirun hangs for roughly one minute before the hello world job completes. If I turn off my wireless card or switch to a wired Ethernet connection, it completes in around one second, as expected.

Why is mpirun not observing my choice of BTL?


Answer 1:


Why is mpirun not observing my choice of BTL?

It is definitely observing your choice of BTL. But there is another framework, namely OOB (out-of-band messaging, used by the Open MPI runtime itself rather than by MPI traffic), which also uses TCP by default. You should disable the tcp component in both frameworks in order to prevent Open MPI from using TCP altogether:

mpirun -n 8 --mca btl ^tcp --mca oob ^tcp ...

Note that completely disabling TCP might have unexpected effects.
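
As a minimal sketch, the same exclusions can also be set through environment variables instead of command-line flags, since Open MPI reads any MCA parameter from a variable named OMPI_MCA_<parameter>. The final mpirun line is left commented out because ./hello.exe is only the example program from the question:

```shell
# Same effect as: mpirun -n 8 --mca btl ^tcp --mca oob ^tcp ...
# Quote the values so the shell leaves the caret untouched.
export OMPI_MCA_btl='^tcp'
export OMPI_MCA_oob='^tcp'

# Show which MCA settings mpirun will pick up from the environment.
env | grep '^OMPI_MCA_' | sort

# Then launch as usual, without any --mca flags:
# mpirun -n 8 ./hello.exe
```

Parameters given on the mpirun command line take precedence over these environment variables, so a one-off run can still override them.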




Answer 2:


For completeness, I'd like to elaborate on Hristo's answer.

I was suffering from seemingly random crashes of my simulation software. After some detective work, I found out that dropped network connections can cause MPI to abort.

The cause of the random crashes was the unreliable wireless network my laptop was connected to. Whenever the wifi connection dropped, my purely local job was terminated.

Thus, on my system I excluded the wifi network interface (named wlp3s0 on my Ubuntu machine) from both BTL and OOB. Now the parallel run survives disabling the wifi.

mpirun --mca oob_tcp_if_exclude wlp3s0 --mca btl_tcp_if_exclude wlp3s0 -np 2 someApplication
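
To avoid retyping these flags on every run, one option is to put the exclusions in the per-user MCA parameter file, which Open MPI reads from $HOME/.openmpi/mca-params.conf. The sketch below assumes the interface name wlp3s0 from this answer; on another machine, substitute the name of your own wireless interface:

```shell
# Per-user MCA parameter file read by Open MPI at startup.
conf_dir="$HOME/.openmpi"
mkdir -p "$conf_dir"

# Append the two exclusions, one "name = value" pair per line.
# wlp3s0 is the interface name from this answer; adjust it for your system.
cat >> "$conf_dir/mca-params.conf" <<'EOF'
oob_tcp_if_exclude = wlp3s0
btl_tcp_if_exclude = wlp3s0
EOF

# Afterwards the plain command is enough:
# mpirun -np 2 someApplication
```

Note that the block appends to the file, so running it twice leaves duplicate entries. They are harmless, but worth cleaning up.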


Source: https://stackoverflow.com/questions/26350173/why-does-mpirun-not-respect-my-choice-of-btl
