Question
I have recently encountered some weird behavior. If I run the following code on my machine (using the most recent version of Cygwin, Open MPI version 1.8.6), I get linearly growing memory usage that quickly overwhelms my PC.
program memoryTest
  use mpi
  implicit none

  integer :: ierror, errorStatus  ! error codes
  integer :: my_rank              ! rank of process
  integer :: p                    ! number of processes
  integer :: i, a, b

  call MPI_Init(ierror)
  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
  call MPI_Comm_size(MPI_COMM_WORLD, p, ierror)

  b = 0
  do i = 1, 10000000
    a = 1*my_rank
    call MPI_REDUCE(a, b, 1, MPI_INTEGER, MPI_MAX, 0, MPI_COMM_WORLD, errorStatus)
  end do

  call MPI_Finalize(ierror)
  stop
end program memoryTest
Any idea what the problem might be? The code looks fine to my beginner's eyes. The compilation line is
mpif90 -O2 -o memoryTest.exe memoryTest.f90
Answer 1:
This has been discussed in a related thread here.
The problem is that the root process has to receive data from the other processes and perform the reduction, while the other processes only have to send their data to the root. The root therefore runs slower and can be overwhelmed by the number of incoming messages; the MPI library has to buffer those early-arriving messages internally until the matching reductions are posted, which is what makes the memory usage grow. If you insert an MPI_BARRIER call after the MPI_REDUCE call, the code should run without a problem.
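A minimal sketch of the loop with the suggested barrier added (same variable names as in the question) would look like this:

  do i = 1, 10000000
    a = 1*my_rank
    call MPI_REDUCE(a, b, 1, MPI_INTEGER, MPI_MAX, 0, MPI_COMM_WORLD, errorStatus)
    ! Synchronize every iteration so the non-root ranks cannot race ahead of the root
    call MPI_BARRIER(MPI_COMM_WORLD, errorStatus)
  end do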
The relevant part of the MPI specification says: "Collective operations can (but are not required to) complete as soon as the caller's participation in the collective communication is finished. A blocking operation is complete as soon as the call returns. A nonblocking (immediate) call requires a separate completion call (cf. Section 3.7 ). The completion of a collective operation indicates that the caller is free to modify locations in the communication buffer. It does not indicate that other processes in the group have completed or even started the operation (unless otherwise implied by the description of the operation). Thus, a collective communication operation may, or may not, have the effect of synchronizing all calling processes. This statement excludes, of course, the barrier operation."
Answer 2:
To add a bit more support for macelee's answer: if you run this program to completion under MPICH with MPICH's internal memory leak tracing/reporting turned on, you see no leaks. Furthermore, valgrind's leak-check reports
==12866== HEAP SUMMARY:
==12866== in use at exit: 0 bytes in 0 blocks
==12866== total heap usage: 20,001,601 allocs, 20,000,496 frees, 3,369,410,210 bytes allocated
==12866==
==12866== All heap blocks were freed -- no leaks are possible
==12866==
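For reference, a report like the one above can be obtained by launching the program under valgrind through the MPI launcher; the rank count and executable name here are only illustrative:

  mpiexec -n 2 valgrind --leak-check=full ./memoryTest.exe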
Source: https://stackoverflow.com/questions/33754220/mpi-reduce-causing-memory-leak