Question
I am starting to develop a parallel code for scientific applications. I have to exchange some buffers from p0 to p1 and from p1 to p0 (I am creating ghost points at the boundaries between processors).
The error can be summarized by this sample code:
program test
   use mpi
   implicit none
   integer id, ids, idr, ierr, tag, istat(MPI_STATUS_SIZE)
   real sbuf, rbuf
   call mpi_init(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD,id,ierr)
   if(id.eq.0) then
      ids=0
      idr=1
      sbuf=1.5
      tag=id
   else
      ids=1
      idr=0
      sbuf=3.5
      tag=id
   endif
   call mpi_send(sbuf,1,MPI_REAL,ids,tag,MPI_COMM_WORLD,ierr)
   call mpi_recv(rbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,istat,ierr)
   call mpi_finalize(ierr)
   return
end
What is wrong with this?
Answer 1:
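Coding with MPI can be difficult at first, and it's good that you're going through the steps of making a sample code. Your sample code as posted hangs due to deadlock. Both processes call MPI_SEND first, and a blocking send is not guaranteed to complete until a matching MPI_RECV has been posted. In addition, the send in the sample is addressed to ids, which is the sender's own rank, and the tag (tag=id) differs between the two ranks, so the receive from the other process can never be matched. So the code is stuck.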
There are two common ways around this problem.
Send and Receive in a Particular Order
This is the simple and easy-to-understand solution. Code your send and receive operations such that nobody ever gets stuck. For your 2-process test case, you could do:
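! In the sample, idr holds the rank of the other process, so it is both the
! destination of the send and the source of the receive. The tag must be the
! same on both ranks; the sample's tag=id never matches.
tag = 0
if (id==0) then
   call mpi_send(sbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,ierr)
   call mpi_recv(rbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,istat,ierr)
else
   call mpi_recv(rbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,istat,ierr)
   call mpi_send(sbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,ierr)
endif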
Now, process 1 receives first, so there is never a deadlock. This particular example is not extensible, but there are various looping structures that can help. You can imagine a routine to send data from every process to every other process as:
do sending_process = 0, nproc-1          ! MPI ranks start at 0
   if (id == sending_process) then
      ! -- I am sending
      do destination_process = 0, nproc-1
         if (sending_process == destination_process) cycle
         call MPI_SEND(...)   ! Send to destination_process
      enddo
   else
      ! -- I am receiving
      call MPI_RECV(...)      ! Receive from sending_process
   endif
enddo
This works reasonably well and is easy to follow. I recommend this structure for beginners.
However, it has several issues for truly large problems. You are sending a number of messages equal to the number of processes squared, which can overload a large network. Also, depending on your operation, you probably do not need to send data from every process to every other process. (I suspect this is true for you given you mentioned ghosts.) You can modify the above loop to only send if data are required, but for those cases there is a better option.
Use Non-Blocking MPI Operations
For many-core problems, this is often the best solution. I recommend sticking to the simple MPI_ISEND and MPI_IRECV. Here, you start all necessary sends and receives, and then wait.
Below, I assume a list structure that has already been set up and that defines the complete list of necessary destinations and senders for each process.
! -- Open sends
do d = 1, Number_Destinations
   idest = Destination_List(d)
   call MPI_ISEND(...)   ! To rank idest
enddo
! -- Open receives
do s = 1, Number_Senders
   isend = Senders_List(s)
   call MPI_IRECV(...)   ! From rank isend
enddo
! -- Wait until every send and receive opened above has completed
call MPI_WAITALL(...)
This option may look simpler, but it is not. You must set up all necessary lists beforehand, and there are a variety of potential problems with buffer size and data alignment. Even so, it is typically the best answer for big codes.
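As a minimal sketch of the non-blocking pattern, here is a complete program for a 1D chain of processes in which each rank exchanges one ghost value with its left and right neighbours. The array u, its size n, the names left and right, and the use of tag 0 are illustrative assumptions, not part of the original sample; the two neighbours play the role of the destination and sender lists.

```fortran
program ghost_exchange
   use mpi
   implicit none
   integer, parameter :: n = 4        ! interior points per rank (hypothetical size)
   real    :: u(0:n+1)                ! u(0) and u(n+1) are the ghost points
   integer :: id, nproc, left, right, ierr
   integer :: req(4)                  ! one request handle per non-blocking call

   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)

   ! Neighbours in a non-periodic chain. MPI_PROC_NULL turns the
   ! communication at either end of the chain into a no-op.
   left  = id - 1
   right = id + 1
   if (id == 0)         left  = MPI_PROC_NULL
   if (id == nproc - 1) right = MPI_PROC_NULL

   u      = real(id)                  ! interior values: this rank's number
   u(0)   = -1.0                      ! ghosts start at -1 so updates are visible
   u(n+1) = -1.0

   ! Open receives into the ghost points, then sends of the edge values.
   call MPI_IRECV(u(0),   1, MPI_REAL, left,  0, MPI_COMM_WORLD, req(1), ierr)
   call MPI_IRECV(u(n+1), 1, MPI_REAL, right, 0, MPI_COMM_WORLD, req(2), ierr)
   call MPI_ISEND(u(1),   1, MPI_REAL, left,  0, MPI_COMM_WORLD, req(3), ierr)
   call MPI_ISEND(u(n),   1, MPI_REAL, right, 0, MPI_COMM_WORLD, req(4), ierr)

   ! Do not touch the receive buffers or modify the send buffers
   ! until MPI_WAITALL has returned.
   call MPI_WAITALL(4, req, MPI_STATUSES_IGNORE, ierr)

   print *, 'rank', id, ' left ghost =', u(0), ' right ghost =', u(n+1)
   call MPI_FINALIZE(ierr)
end program ghost_exchange
```

This is typically built with the MPI compiler wrapper of your installation and launched on any number of processes. Because no call blocks before MPI_WAITALL, the order in which the four operations are opened does not matter for correctness.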
Answer 2:
As pointed out by Vladimir, your code is too incomplete to allow a definitive answer.
That being said, this could be a well-known error.
MPI_Send() might block. From a pragmatic point of view, MPI_Send() is likely to return immediately when sending a short message, but is likely to block when sending a large message. Note that what counts as small or large depends on your MPI library, the interconnect you are using, and other runtime parameters. MPI_Send() might block until an MPI_Recv() is posted on the other end.
It seems you call MPI_Send() and MPI_Recv() in the same block of code, so you can try using MPI_Sendrecv() to do both in one shot. MPI_Sendrecv() typically issues a non-blocking send under the hood, so that will help if your issue really is an MPI_Send() deadlock.
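A minimal sketch of that suggestion for the two-process exchange from the question follows. The variable other, the shared tag 0, and the check on the process count are assumptions added for illustration; the buffer values mirror the original sample.

```fortran
program sendrecv_exchange
   use mpi
   implicit none
   integer :: id, nproc, other, ierr, istat(MPI_STATUS_SIZE)
   integer, parameter :: tag = 0      ! same tag on both ranks (assumption)
   real :: sbuf, rbuf

   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
   if (nproc /= 2) then
      if (id == 0) print *, 'this sketch expects exactly 2 processes'
      call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
   end if

   other = 1 - id                     ! rank of the partner process
   if (id == 0) then
      sbuf = 1.5
   else
      sbuf = 3.5
   end if

   ! Send sbuf to the partner and receive its value into rbuf in one call.
   ! The library pairs the two operations, so no ordering is needed here.
   call MPI_SENDRECV(sbuf, 1, MPI_REAL, other, tag, &
                     rbuf, 1, MPI_REAL, other, tag, &
                     MPI_COMM_WORLD, istat, ierr)

   print *, 'rank', id, 'received', rbuf
   call MPI_FINALIZE(ierr)
end program sendrecv_exchange
```

The send and receive buffers passed to MPI_SENDRECV must be distinct, which is why the sketch keeps separate sbuf and rbuf variables as in the question.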
Source: https://stackoverflow.com/questions/47247997/mpi-send-receive-issue-in-fortran