Question
I have recently encountered some weird behavior. If I run the following code on my machine (using the most recent version of Cygwin, Open MPI version 1.8.6), I get linearly growing memory usage that quickly overwhelms my PC.
program memoryTest
  use mpi
  implicit none

  integer :: ierror, errorStatus  ! error codes
  integer :: my_rank              ! rank of process
  integer :: p                    ! number of processes
  integer :: i, a, b

  call MPI_Init(ierror)
  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
  call MPI_Comm_size(MPI_COMM_WORLD, p, ierror)

  b = 0
  do i = 1, 10000000
    a = 1*my_rank
    call MPI_REDUCE(a, b, 1, MPI_INTEGER, MPI_MAX, 0, MPI_COMM_WORLD, errorStatus)
  end do

  call MPI_Finalize(ierror)
  stop
end program memoryTest
Any idea what the problem might be? The code looks fine to my beginner's eyes. The compilation line is
mpif90 -O2 -o memoryTest.exe memoryTest.f90
Answer 1:
This has been discussed in a related thread here.
The problem is that the root process has to receive data from the other processes and perform the reduction, while the other processes only have to send their data to the root. The root therefore runs slower and can be overwhelmed by the number of incoming messages; the MPI library has to buffer those early-arriving messages internally until the matching reductions are posted, which is what makes the memory usage grow. If you insert an MPI_BARRIER call after the MPI_REDUCE call, the code should run without a problem.
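A minimal sketch of the loop with the suggested barrier added (same variable names as in the question) would look like this:

  do i = 1, 10000000
    a = 1*my_rank
    call MPI_REDUCE(a, b, 1, MPI_INTEGER, MPI_MAX, 0, MPI_COMM_WORLD, errorStatus)
    ! Synchronize every iteration so the non-root ranks cannot race ahead of the root
    call MPI_BARRIER(MPI_COMM_WORLD, errorStatus)
  end do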
The relevant part of the MPI specification says: "Collective operations can (but are not required to) complete as soon as the caller's participation in the collective communication is finished. A blocking operation is complete as soon as the call returns. A nonblocking (immediate) call requires a separate completion call (cf. Section 3.7 ). The completion of a collective operation indicates that the caller is free to modify locations in the communication buffer. It does not indicate that other processes in the group have completed or even started the operation (unless otherwise implied by the description of the operation). Thus, a collective communication operation may, or may not, have the effect of synchronizing all calling processes. This statement excludes, of course, the barrier operation."
Answer 2:
To add a bit more support for macelee's answer: if you run this program to completion under MPICH with MPICH's internal memory leak tracing/reporting turned on, you see no leaks. Furthermore, valgrind's leak-check reports
==12866== HEAP SUMMARY:
==12866== in use at exit: 0 bytes in 0 blocks
==12866== total heap usage: 20,001,601 allocs, 20,000,496 frees, 3,369,410,210 bytes allocated
==12866==
==12866== All heap blocks were freed -- no leaks are possible
==12866==
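For reference, a report like the one above can be obtained by launching the program under valgrind through the MPI launcher; the rank count and executable name here are only illustrative:

  mpiexec -n 2 valgrind --leak-check=full ./memoryTest.exe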
Source: https://stackoverflow.com/questions/33754220/mpi-reduce-causing-memory-leak