Sending large std::vector using MPI_Send and MPI_Recv doesn't complete

Submitted on 2020-01-05 04:17:07

Question


I'm trying to send a std::vector using MPI. This works fine when the vector is small, but just doesn't work when the vector is large (more than ~15k doubles). When trying to send a vector with 20k doubles, the program just sits there with the CPU at 100%.

Here is a minimal example:

#include <vector>
#include <mpi.h>

using namespace std;

vector<double> send_and_receive(vector<double> &local_data, int n, int numprocs, int my_rank) {
    MPI_Send(&local_data[0], n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        vector<double> global_data(numprocs*n);
        vector<double> temp(n);
        for (int rank = 0; rank < numprocs; rank++) {
            MPI_Recv(&temp[0], n, MPI_DOUBLE, rank, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int i = 0; i < n; i++) {
                global_data[rank*n + i] = temp[i];
            }
        }
        return global_data;
    }
    return vector<double>();
}

int main(int args, char *argv[]) {
    int my_rank, numprocs;
    // MPI initialization
    MPI_Init (&args, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size (MPI_COMM_WORLD, &numprocs);

    int n = 15000;
    vector<double> local_data(n);

    for (int i = 0; i < n; i++) {
        local_data[i] = n*my_rank + i;
    }

    vector<double> global_data = send_and_receive(local_data, n, numprocs, my_rank);

    MPI_Finalize();

    return 0;
}

I compile using

mpic++ main.cpp

and run using

mpirun -n 2 a.out

When I run with n = 15000 the program completes successfully, but with n = 17000 or n = 20000 it never finishes, and the two CPUs sit at 100% until I force-close the program.

Does anyone know what the problem could be?


Answer 1:


MPI_Send is a funny call. If there is enough internal buffer space to store the message, it may return immediately - the only guarantee it makes is that the input buffer will no longer be needed by MPI. However, if there isn't enough internal buffer space, the call blocks until the matching MPI_Recv starts receiving the data. See where this is going? Every process, including rank 0, posts MPI_Send before any MPI_Recv, so once the messages are too large to buffer internally, all the sends block and rank 0 never reaches its receive loop. When debugging issues like this, it helps to replace MPI_Send with MPI_Ssend, which always blocks until the matching receive has started.
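As a sketch of that debugging substitution, the only change to the code above is the send line in send_and_receive (MPI_Ssend takes the same arguments as MPI_Send):

    // Synchronous send: completes only once the matching MPI_Recv has started.
    MPI_Ssend(&local_data[0], n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);

With this change the program hangs for every value of n, not just for large messages, which confirms the diagnosis.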

Your possible solutions are:

  • Use a buffered send, MPI_Bsend.
  • Use MPI_Sendrecv.
  • Alternate the send/recv pairing so that each send has a matching receive already posted (e.g. odd ranks send while even ranks receive, then vice versa).
  • Use a non-blocking send, MPI_Isend (see the sketch below).
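A minimal sketch of the non-blocking option, written as a drop-in replacement for send_and_receive from the question (same includes; error checking omitted):

vector<double> send_and_receive(vector<double> &local_data, int n, int numprocs, int my_rank) {
    // Start the send without blocking; local_data must stay untouched until MPI_Wait.
    MPI_Request request;
    MPI_Isend(&local_data[0], n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &request);

    vector<double> global_data;
    if (my_rank == 0) {
        global_data.resize(numprocs * n);
        // Rank 0 can now post its receives, including the one matching its own send,
        // and receive directly into the right slice of global_data.
        for (int rank = 0; rank < numprocs; rank++) {
            MPI_Recv(&global_data[rank * n], n, MPI_DOUBLE, rank, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

    // Complete the send before local_data goes out of scope or is reused.
    MPI_Wait(&request, MPI_STATUS_IGNORE);
    return global_data;
}

This keeps the original call pattern but lets rank 0 reach its receive loop while its own send is still pending, so nothing deadlocks regardless of message size.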

See http://www.netlib.org/utk/papers/mpi-book/node39.html



Source: https://stackoverflow.com/questions/18746553/sending-large-stdvector-using-mpi-send-and-mpi-recv-doesnt-complete
