Question
I am trying to create an inter-communicator with MPI_INTERCOMM_CREATE in Fortran between two communicators, one containing the first 2 processes and the other containing the rest. I need to perform send/recv operations between the newly created communicators.
The code:
program hello
  include 'mpif.h'
  integer tag,ierr,rank,numtasks,color,new_comm,inter1,inter2

  tag = 22
  call MPI_Init(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,numtasks,ierr)

  if (rank < 2) then
    color = 0
  else
    color = 1
  end if

  call MPI_COMM_SPLIT(MPI_COMM_WORLD,color,rank,new_comm,ierr)

  if (color .eq. 0) then
    call MPI_INTERCOMM_CREATE(new_comm,0,MPI_COMM_WORLD,1,tag,inter1,ierr)
    !local_comm,local leader,peer_comm,remote leader,tag,new,ierr
  else if (color .eq. 1) then
    call MPI_INTERCOMM_CREATE(new_comm,1,MPI_COMM_WORLD,0,tag,inter2,ierr)
  end if

  select case (color)
  case (0)
    call MPI_COMM_FREE(inter1)
  case (1)
    call MPI_COMM_FREE(inter2)
  end select

  call MPI_Finalize(ierr)
end
The code compiles without any issues, but it gets stuck while running and sometimes shows an error.
Answer 1:
Short answer: the problem comes from the specification of the remote_leader.
Long answer:
I am assuming that your splitting logic is what you want: processes 0 and 1 in color 0 and the rest of the world in color 1, and also that you will always have more than 3 processes.
You have to choose:
- the local_leader for each color. This is the rank, in the local communicator (new_comm in your case), of the leader of each group. The headache-free approach is to choose the process of rank 0: because this is a rank in the local communicator, all processes can pass the exact same value. So I am choosing rank 0.
- the remote_leader for each color. This must be the rank, in the peer_comm (MPI_COMM_WORLD in your case), of the leader of the other end of the inter-communicator. It means that processes in color 0 have to know which rank in MPI_COMM_WORLD corresponds to process 0 of color 1, and processes in color 1 have to know which rank in MPI_COMM_WORLD corresponds to process 0 of color 0. According to your splitting logic and my choice of local leaders, remote_leader must be process 2 for color 0, and process 0 for color 1.
And you should be good to go with these modified lines of code:
if (color .eq. 0) then
  if (rank == 0) print *, ' 0 here'
  call MPI_INTERCOMM_CREATE(new_comm,0,MPI_COMM_WORLD,2,tag,inter1,ierr)
else if (color .eq. 1) then
  if (rank == 2) print *, ' 2 here'
  call MPI_INTERCOMM_CREATE(new_comm,0,MPI_COMM_WORLD,0,tag,inter2,ierr)
end if
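For reference, here is a minimal, self-contained sketch (my own illustration, not part of the original answer) that combines the split above with the corrected leader ranks and also exchanges one message over the new inter-communicator, since the stated goal was to perform send/recv operations. It assumes rank-0 local leaders and at least 3 processes; the program name, the message value 42, and the choice of which ranks talk are arbitrary.

program intercomm_demo
  use mpi
  implicit none
  integer :: ierr, rank, numtasks, color, new_comm, inter, tag, msg
  integer :: status(MPI_STATUS_SIZE)

  tag = 22
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

  ! Same split as in the question: world ranks 0-1 -> color 0, the rest -> color 1
  if (rank < 2) then
     color = 0
  else
     color = 1
  end if
  call MPI_COMM_SPLIT(MPI_COMM_WORLD, color, rank, new_comm, ierr)

  ! Local leader is rank 0 in each new_comm; the remote leaders are world
  ! ranks 2 (seen from color 0) and 0 (seen from color 1), as explained above.
  if (color == 0) then
     call MPI_INTERCOMM_CREATE(new_comm, 0, MPI_COMM_WORLD, 2, tag, inter, ierr)
  else
     call MPI_INTERCOMM_CREATE(new_comm, 0, MPI_COMM_WORLD, 0, tag, inter, ierr)
  end if

  ! Simple exchange across the inter-communicator: world rank 0 (local rank 0
  ! of color 0) sends to remote rank 0, i.e. world rank 2, which receives it.
  if (color == 0 .and. rank == 0) then
     msg = 42
     call MPI_SEND(msg, 1, MPI_INTEGER, 0, tag, inter, ierr)
  else if (color == 1 .and. rank == 2) then
     call MPI_RECV(msg, 1, MPI_INTEGER, 0, tag, inter, status, ierr)
     print *, 'world rank', rank, 'received', msg, 'over the inter-communicator'
  end if

  call MPI_COMM_FREE(inter, ierr)
  call MPI_COMM_FREE(new_comm, ierr)
  call MPI_FINALIZE(ierr)
end program intercomm_demo

Run with, e.g., mpirun -np 4 ./a.out; world rank 2 should print the received value.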
The most important difference with your code is that remote_leader is 2 for color 0. That is the source of the problem.
The secondary difference is that local_leader is 0 for color 1. This corresponds to my logic of choosing the local_leader. It is not the source of the problem; however, it can become one if you have only 1 process in color 1.
Update
Thanks to Hristo Iliev, I am adding this update. If your goal was to use process 1 of color 1 as local_leader, then the remote_leader for color 0 should be 3, and the code will be:
if (color .eq. 0) then
  if (rank == 0) print *, ' 0 here'
  call MPI_INTERCOMM_CREATE(new_comm,0,MPI_COMM_WORLD,3,tag,inter1,ierr)
else if (color .eq. 1) then
  if (rank == 2) print *, ' 2 here'
  call MPI_INTERCOMM_CREATE(new_comm,1,MPI_COMM_WORLD,0,tag,inter2,ierr)
end if
Make sure you check everything for this option, as I did not pay special attention to verifying it. Also make sure that you always have more than 1 process in color 1.
Source: https://stackoverflow.com/questions/38356001/unable-to-implement-mpi-intercomm-create