Question
I have a set of computational operations that need to be performed on a cluster (say, 512 MPI processes). Right now, I have the root node on the cluster open a socket and transfer data to my local computer in between the compute operations, but I'm wondering if it's possible to just create two MPI groups, one being my local machine and the other the remote cluster, and to send data between them using MPI commands.
Is this possible?
Answer 1:
Yes, it is possible, as long as there is a network path between the cluster node and your machine. The MPI standard provides the abstract mechanisms to do it, and Open MPI provides a really simple way to make things work. You have to look into the Process Creation and Management chapter of the standard (Chapter 10 of MPI-2.2), and specifically into the Establishing Communication section (§10.4 of MPI-2.2). Basically the steps are:
- You start both MPI jobs separately. This is obviously what you do, so nothing new here.
- One of the jobs creates a network port using `MPI_Open_port()`. This MPI call returns a unique port name that then has to be published as a well-known service name using `MPI_Publish_name()`. Once the port is opened, it can be used to accept client connections by calling the blocking routine `MPI_Comm_accept()`. This job has now become the server job.
- The other MPI job, referred to as the client job, first resolves the port name from the service name using `MPI_Lookup_name()`. Once it has the port name, it can call `MPI_Comm_connect()` in order to connect to the remote server.
- Once `MPI_Comm_connect()` is paired with the respective `MPI_Comm_accept()`, both jobs establish an intercommunicator between them, and messages can then be sent back and forth.
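The steps above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not a hardened implementation: the service name `"compute-link"` and the role selection via a command-line argument are inventions of the example, and error checking is omitted.

```c
/* Sketch of the server/client handshake described above.
 * Build: mpicc handshake.c -o handshake
 * The service name "compute-link" is an arbitrary illustration. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;  /* intercommunicator linking the two jobs */

    if (argc > 1 && strcmp(argv[1], "server") == 0) {
        /* Server job: open a port, publish it, block until a client connects */
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("compute-link", MPI_INFO_NULL, port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);

        int data = 42;
        MPI_Send(&data, 1, MPI_INT, 0, 0, inter);

        MPI_Unpublish_name("compute-link", MPI_INFO_NULL, port);
        MPI_Close_port(port);
    } else {
        /* Client job: resolve the port name from the service name and connect */
        MPI_Lookup_name("compute-link", MPI_INFO_NULL, port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);

        int data;
        MPI_Recv(&data, 1, MPI_INT, 0, 0, inter, MPI_STATUS_IGNORE);
        printf("client received %d\n", data);
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}
```

Note that the ranks on either side of the resulting intercommunicator address the *remote* group, which is why both the send and the receive target rank 0. Running this requires two separately launched MPI jobs wired together through the name service, which is what the `--report-uri` / `--ompi-server` mechanism below provides.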
One intricate detail is how the client job looks up the port name given the service name. This is a less documented part of Open MPI, but it is quite easy: you have to provide the `mpiexec` command that starts the client job with the URI of the server job's `mpiexec`, which acts as a sort of directory service. To do that, launch the server job with the `--report-uri -` argument to make it print its URI to standard output:
$ mpiexec --report-uri - <other arguments like -np> ./server ...
It will give you a long URI of the form `1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351`. Now you have to supply this URI to the client `mpiexec` with the `--ompi-server uri` option (quote it, since the semicolons would otherwise be interpreted by the shell as command separators):
$ mpiexec --ompi-server "1221656576.0;tcp://10.1.13.164:36351..." ./client ...
Note that the URI contains the addresses of all configured and enabled network interfaces present on the node where the server's `mpiexec` is started. You should ensure that the client is able to reach at least one of them. Also ensure that the TCP BTL component is in the list of enabled BTL components, otherwise no messages will flow. The TCP BTL is usually enabled by default, but on some InfiniBand installations it is explicitly disabled, either by setting the `OMPI_MCA_btl` environment variable or in the default Open MPI MCA configuration file. MCA parameters can be overridden with the `--mca` option, for example:
$ mpiexec --mca btl self,sm,openib,tcp --report-uri - ...
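To check whether the TCP BTL was built into your Open MPI installation at all, you can query it with the `ompi_info` tool (assuming it is on your `PATH`); if the component is present, its MCA parameters are listed:
$ ompi_info --param btl tcp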
Also see the answer that I gave to a similar question.
Answer 2:
Yes, it should just work out of the box if a TCP/IP connection is available (MPI communicates on a random high TCP port when TCP is used as the transport layer). Try adding your machine to the hostfile that you supply to `mpirun`. If that doesn't work, you can connect to your machine directly using `MPI_Open_port()`, which doesn't require `mpirun`.
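For the hostfile approach, a minimal hostfile might look like the following. The host names and slot counts are purely illustrative, and your local machine must be reachable from where `mpirun` is invoked (typically via passwordless SSH):

```
# cluster nodes
cluster-node01 slots=64
cluster-node02 slots=64
# your local machine
my-laptop slots=1
```

You would then launch with something like `$ mpirun --hostfile myhosts -np 129 ./program`, keeping in mind that a single MPI job spanning a slow WAN link will serialize all communication with the remote rank through that link.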
Source: https://stackoverflow.com/questions/15939757/is-it-possible-to-run-openmpi-on-a-local-computer-and-a-remote-cluster