Block execution until children called via MPI_Comm_spawn have finished

Question

I'm in the process of modifying an existing application, where I would like to spawn a dynamically created bash script. I created a simple wrapper routine which takes the name of the bash script as an argument. In the wrapper, the script is spawned by MPI_Comm_spawn. Directly after, the wrapper calls MPI_Finalize, which is executed before the scripts have finished:

#include "mpi.h"
#include <stdlib.h>
#include <iostream>

using namespace std;

int main(int argc, char *argv[])
{
    char *script = argv[1];
    int maxProcs = 2, myRank;
    MPI_Comm childComm;
    int spawnError[maxProcs];

    // Initialize
    argv[1] = NULL;
    MPI_Init(&argc, &argv);

    // Rank of parent process
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);    

    // Spawn application    
    MPI_Comm_spawn(script, MPI_ARGV_NULL, maxProcs, MPI_INFO_NULL, myRank, MPI_COMM_SELF, &childComm, spawnError);

    // Finalize
    MPI_Finalize();

    return EXIT_SUCCESS;
}

If I insert

    sleep(10);

right before

    MPI_Finalize ();

everything works fine. Is it possible to block execution in the wrapper until the bash script has finished? It would also be nice to obtain the script's return value. Unfortunately, creating another wrapper for the script, one that communicates with the parent wrapper and runs the bash script via a system call, is not an option, because I need to access MPI environment variables from within the script. I hope I have made things clear enough. Any help would be greatly appreciated!

Answer 1:

If you have control over the content of the bash script, i.e. if you can put something into it before the spawn, then a very crude option would be to write a special MPI program that contains a single MPI_Barrier line:

#include <mpi.h>

int main (int argc, char **argv)
{
   MPI_Comm parent;

   MPI_Init(&argc, &argv);

   // Obtain an intercommunicator to the parent MPI job
   MPI_Comm_get_parent(&parent);

   // Check if this process is a spawned one and if so enter the barrier
   if (parent != MPI_COMM_NULL)
      MPI_Barrier(parent);

   MPI_Finalize();

   return 0;
}

Compile the program as any other MPI program with the same MPI distribution as the one used by the main MPI program and call it something like waiter. Then set an EXIT trap at the very beginning of your bash script:

#!/bin/bash
trap "/path/to/waiter $*" EXIT
...
# End of the script file

Also modify the main program to read:

// Spawn application    
MPI_Comm_spawn(script, MPI_ARGV_NULL, maxProcs, MPI_INFO_NULL, myRank, MPI_COMM_SELF, &childComm, spawnError);

// Wait for the waiters to enter the barrier
MPI_Barrier(childComm);

// Finalize
MPI_Finalize();

It is important to invoke waiter as waiter $* inside the trap so that it receives every command-line argument passed to the bash script: some old MPI implementations append extra arguments to the spawned executable to give it the parent's connection information. MPI-2 compliant implementations usually pass this information through the environment instead, in order to support MPI_Init(NULL, NULL).

The way this works is pretty simple: the trap command instructs the shell to execute waiter whenever the script exits. waiter itself simply establishes an intercommunicator with the parent MPI job and waits on the barrier. Once all spawned scripts have completed, all of them start the waiter process as part of the exit trap and the barrier will be lifted.

If you cannot modify the script, then just create a wrapper script that calls the actual script and put the waiter in the wrapper.

Tested and works with Open MPI and Intel MPI.

Answer 2:

As far as I know, there isn't a way to make MPI_COMM_SPAWN block; the usual solution would be to have an MPI_BARRIER between the spawner and the spawnees. Unfortunately, you're not following the usual model here, where an MPI application spawns another MPI application. Instead, you're just running a bunch of scripts. To get the results you want, you may have to use something other than MPI or figure out a way to write an MPI wrapper for your remote bash scripts.

Answer 3:

Why not spawn a child MPI application that executes the script itself with a fork plus exec? The script name can be passed as a parameter to the children created with MPI_Comm_spawn or MPI_Comm_spawn_multiple. Each child then waits for its script to complete by calling wait, or, if there was an error, by handling SIGCHLD. After the script completes, the parent and child MPI processes enter a barrier and then terminate by calling MPI_Finalize.

The child program will be similar to the one presented by Hristo Iliev:

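#include <mpi.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main (int argc, char **argv){
   MPI_Comm parent;
   MPI_Init(&argc, &argv);
   MPI_Comm_get_parent(&parent);

   // Assumption: the script path arrives as argv[1], followed by its arguments
   if (argc > 1) {
       pid_t pid = fork();
       if (pid < 0) { // error while forking
           exit(EXIT_FAILURE);
       } else if (pid == 0) { // child: replace this process with the script
           execvp(argv[1], &argv[1]);
           _exit(127); // reached only if execvp failed
       } else { // parent: block until the script has exited
           int status;
           waitpid(pid, &status, 0); // there are non-blocking alternatives if needed
       }
   }

   // Tell the spawning job that this script has finished
   if (parent != MPI_COMM_NULL){
      MPI_Barrier(parent);
   }

   MPI_Finalize();

   return 0;
}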
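
The fork, exec and wait steps can also be factored into a small helper that returns the script's exit status, which the question asks about. The following is a minimal sketch using only POSIX calls; the name runAndWait and the convention of returning -1 on failure are assumptions made for illustration, not part of the answer.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs argv[0] with the null-terminated argument
// vector argv, blocks until it exits, and returns its exit status.
// Returns -1 if forking or waiting fails, or if the child was killed by a signal.
int runAndWait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                 // fork failed
    }
    if (pid == 0) {
        execvp(argv[0], argv);     // replaces the child image on success
        _exit(127);                // reached only if execvp failed
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;             // waiting failed for a reason other than a signal
        }
    }

    // Normal termination yields the script's own exit code
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the child program, the value returned by such a helper could then be sent to the parent over the inter-communicator (for example with MPI_Send) before entering the barrier; that part is not shown here.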

The parent program simply issues the spawn (if there is a single script name) or spawn_multiple (if the script name differs per spawned MPI process) and then calls the barrier on the children's inter-communicator, which is an output parameter of the MPI spawn operations.
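
A parent along these lines might look like the sketch below. It assumes the child program has been compiled to a hypothetical binary named ./script_runner and that the child takes the script path as its first argument; neither detail comes from the answer. It must be built and launched with the same MPI distribution as the child.

```cpp
#include <mpi.h>
#include <cstdlib>
#include <vector>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    if (argc < 2) {
        // No script name given: nothing to spawn
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

    // Hypothetical path to the compiled child program (an assumption)
    char childProgram[] = "./script_runner";
    // Arguments handed to every spawned child: the script name, null-terminated
    char *childArgs[] = { argv[1], nullptr };

    const int maxProcs = 2;
    std::vector<int> spawnErrors(maxProcs);
    MPI_Comm childComm;

    // The root rank is relative to MPI_COMM_SELF, which only contains rank 0
    MPI_Comm_spawn(childProgram, childArgs, maxProcs, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &childComm, spawnErrors.data());

    // Completes only after every child has entered the barrier,
    // that is, after every script has exited
    MPI_Barrier(childComm);

    MPI_Finalize();
    return EXIT_SUCCESS;
}
```

MPI_Comm_spawn_multiple follows the same pattern, with arrays of commands, argument vectors and process counts in place of the single values.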

Source: https://stackoverflow.com/questions/17950762/block-execution-until-children-called-via-mpi-comm-spawn-have-finished

Tags

c++

mpi