MPI hangs on MPI_Send for large messages

时光取名叫无心 2020-12-03 18:46

There is a simple program in C++ / MPI (MPICH2) which sends an array of type double. If the size of the array is more than 9000, my program hangs during the call to MPI_Send.

1 Answer
  • 2020-12-03 18:59

    The details of the Cube class aren't relevant here: consider a simpler version

    #include <mpi.h>
    #include <cstdlib>
    #include <iostream>
    
    using namespace std;
    
    int main(int argc, char *argv[]) {
        int size, rank;
        const int root = 0;
    
        int datasize = atoi(argv[1]);
    
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
        if (rank != root) {
            int nodeDest = rank + 1;
            if (nodeDest > size - 1) {
                nodeDest = 1;
            }
            int nodeFrom = rank - 1;
            if (nodeFrom < 1) {
                nodeFrom = size - 1;
            }
    
            MPI_Status status;
            int *data = new int[datasize];
            for (int i=0; i<datasize; i++)
                data[i] = rank;
    
            cout << "Before send" << endl;
            MPI_Send(data, datasize, MPI_INT, nodeDest, 0, MPI_COMM_WORLD);
            cout << "After send" << endl;
            MPI_Recv(data, datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &status);
    
            delete [] data;
    
        }
    
        MPI_Finalize();
        return 0;
    }
    

    where running gives

    $ mpirun -np 4 ./send 1
    Before send
    After send
    Before send
    After send
    Before send
    After send
    $ mpirun -np 4 ./send 65000
    Before send
    Before send
    Before send
    

    If in DDT you looked at the message queue window, you'd see everyone is sending, and no one is receiving, and you have a classic deadlock.

    MPI_Send's semantics, weirdly, aren't well defined, but it is allowed to block until "the receive has been posted". MPI_Ssend is clearer in this regard; it will always block until the receive has been posted. Details about the different send modes can be seen here.
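
    As a quick illustration (a one-line sketch, not part of the program above): swapping the MPI_Send call for MPI_Ssend makes the hang reproducible at any datasize, since a synchronous send always waits for the matching receive to be posted.

    // Illustrative drop-in for the MPI_Send line above; MPI_Ssend blocks until
    // the matching receive is posted, so this deadlocks regardless of message size.
    MPI_Ssend(data, datasize, MPI_INT, nodeDest, 0, MPI_COMM_WORLD);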

    The reason it worked for smaller messages is an accident of the implementation; for "small enough" messages (for your case, it looks to be <64kB), your MPI_Send implementation uses an "eager send" protocol and doesn't block on the receive; for larger messages, where it isn't necessarily safe just to keep buffered copies of the message kicking around in memory, the Send waits for the matching receive (which it is always allowed to do anyway).

    There are a few things you could do to avoid this; all you have to do is make sure not everyone is calling a blocking MPI_Send at the same time. You could (say) have even-ranked processes send first and then receive, while odd-ranked processes receive first and then send. You could use nonblocking communications (Isend/Irecv/Waitall); a sketch of that approach follows the Sendrecv example below. But the simplest solution in this case is to use MPI_Sendrecv, a combined blocking send + receive, rather than a blocking send followed by a blocking receive. The send and receive execute concurrently, and the function blocks until both are complete. So this works

    #include <mpi.h>
    #include <cstdlib>
    #include <iostream>
    
    using namespace std;
    
    int main(int argc, char *argv[]) {
        int size, rank;
        const int root = 0;
    
        int datasize = atoi(argv[1]);
    
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
        if (rank != root) {
            int nodeDest = rank + 1;
            if (nodeDest > size - 1) {
                nodeDest = 1;
            }
            int nodeFrom = rank - 1;
            if (nodeFrom < 1) {
                nodeFrom = size - 1;
            }
    
            MPI_Status status;
            int *outdata = new int[datasize];
            int *indata  = new int[datasize];
            for (int i=0; i<datasize; i++)
                outdata[i] = rank;
    
            cout << "Before sendrecv" << endl;
            MPI_Sendrecv(outdata, datasize, MPI_INT, nodeDest, 0,
                         indata, datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &status);
            cout << "After sendrecv" << endl;
    
            delete [] outdata;
            delete [] indata;
        }
    
        MPI_Finalize();
        return 0;
    }
    

    Running gives

    $ mpirun -np 4 ./send 65000
    Before sendrecv
    Before sendrecv
    Before sendrecv
    After sendrecv
    After sendrecv
    After sendrecv
    
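
    For completeness, here is a sketch of the nonblocking Isend/Irecv/Waitall approach mentioned above (an illustrative variant of the same example, not a second program from the original answer): each rank posts its receive and its send without blocking, then waits on both, so nobody blocks before the receives are up.

    #include <mpi.h>
    #include <cstdlib>
    #include <iostream>
    
    using namespace std;
    
    int main(int argc, char *argv[]) {
        int size, rank;
        const int root = 0;
    
        int datasize = atoi(argv[1]);
    
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
        if (rank != root) {
            // Same ring over the non-root ranks as in the examples above.
            int nodeDest = rank + 1;
            if (nodeDest > size - 1) nodeDest = 1;
            int nodeFrom = rank - 1;
            if (nodeFrom < 1) nodeFrom = size - 1;
    
            int *outdata = new int[datasize];
            int *indata  = new int[datasize];
            for (int i=0; i<datasize; i++)
                outdata[i] = rank;
    
            // Post both operations; neither call blocks waiting for the other side.
            MPI_Request reqs[2];
            MPI_Status  stats[2];
            MPI_Irecv(indata,  datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(outdata, datasize, MPI_INT, nodeDest, 0, MPI_COMM_WORLD, &reqs[1]);
    
            // The buffers must not be reused or read until both requests complete.
            MPI_Waitall(2, reqs, stats);
    
            delete [] outdata;
            delete [] indata;
        }
    
        MPI_Finalize();
        return 0;
    }

    Like the Sendrecv version, this should complete for large message sizes as well, since every receive is posted before any rank starts waiting.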