Why does this MPI code execute out of order? [duplicate]

Posted by 依然范特西╮ on 2019-12-25 16:49:24

Question


I'm trying to create a "Hello, world!" application in (Open)MPI such that each process will print out in order.

My idea was to have the first process send a message to the second when it's finished, then the second to the third, etc.:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {

    int rank, size;

    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // See: http://mpitutorial.com/mpi-send-and-receive/
    if (rank == 0) {
        // This is the first process.
        // Print out immediately.
        printf("Hello, World! I am rank %d of %d.\n", rank, size);
        if (size > 1) {
            // Only send if there is a second process to receive it.
            MPI_Send(&rank, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        }
    } else {
        // Wait until the previous one finishes.
        int receivedData;
        MPI_Recv(&receivedData, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Hello, World! I am rank %d of %d (message: %d).\n", rank, size, receivedData);
        if (rank + 1 < size) {
            // We're not the last one. Send out a message.
            MPI_Send(&rank, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        } else {
            printf("Hello world completed!\n");
        }
    }

    MPI_Finalize();
    return 0;
}

When I run this on an eight-core cluster, it runs perfectly every time. However, when I run it on a sixteen-core cluster, sometimes it works, and sometimes it outputs something like this:

Hello, world, I am rank 0 of 16.
Hello, world, I am rank 1 of 16 (message: 0).
Hello, world, I am rank 2 of 16 (message: 1).
Hello, world, I am rank 3 of 16 (message: 2).
Hello, world, I am rank 4 of 16 (message: 3).
Hello, world, I am rank 5 of 16 (message: 4).
Hello, world, I am rank 6 of 16 (message: 5).
Hello, world, I am rank 7 of 16 (message: 6).
Hello, world, I am rank 10 of 16 (message: 9).
Hello, world, I am rank 11 of 16 (message: 10).
Hello, world, I am rank 8 of 16 (message: 7).
Hello, world, I am rank 9 of 16 (message: 8).
Hello, world, I am rank 12 of 16 (message: 11).
Hello, world, I am rank 13 of 16 (message: 12).
Hello, world, I am rank 14 of 16 (message: 13).
Hello, world, I am rank 15 of 16 (message: 14).
Hello world completed!

That is, most of the output is in order, but some is out of place.

Why is this happening? How is it even possible? How can I fix it?


Answer 1:


MPI programs are not guaranteed to complete, or to have their output displayed, in any specific order. This is especially true when running on multiple nodes, but it is still true even on one node.

The sequential sends and receives do enforce the order in which the printf calls execute, but not the order in which their output reaches the screen. Each process's output is still forwarded from the application process to the MPI layer and back up to the mpiexec/mpirun process, which prints it. This forwarding can happen in any order and is interleaved with other communication, since it uses a completely different communication topology. If you really must ensure that the messages are printed in order, you have to make sure that the same MPI rank prints all of them.



Source: https://stackoverflow.com/questions/24534645/why-does-this-mpi-code-execute-out-of-order
