Question
My Intel MPI version is impi/5.0.2.044/intel64, installed on a RHEL machine.
I am using Java to invoke an MPI program with the following code:
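import java.io.File;
import java.io.IOException;
import java.lang.ProcessBuilder.Redirect;

ProcessBuilder builder = new ProcessBuilder();
// Each argument is its own string; a single "mpirun ./myProgram" would be taken as the executable name.
builder.command("mpirun", "./myProgram");
builder.redirectError(Redirect.to(new File("stderr")));
builder.redirectOutput(Redirect.to(new File("stdout")));
Process p = null;
try {
    p = builder.start();
} catch (IOException e) {
    e.printStackTrace();
}
// Process has started here
p.destroy();
try {
    // i = 143
    int i = p.exitValue();
} catch (IllegalThreadStateException e) {
    // thrown if the process has not exited yet
}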
But even after exitValue() returns without throwing an exception, ps aux still shows a bunch of ./myProgram processes, and the program keeps writing result files as if it had not been killed, terminating only after it finishes all its calculations.
Currently, the only way I have found to terminate ./myProgram is to stop the Java program with Ctrl+C in its console.
My intention is to stop the calculation immediately and let the Java program schedule some other calculation. Is there any workaround to force all MPI instances to terminate, or at least to guarantee termination within a small, definite amount of time (e.g. 30 s or 1 min of polling)?
Answer 1:
The problem is that the JDK implementation of destroy sends SIGTERM, which shuts down mpirun hard.
See the relevant JDK source for details.
You need to send SIGINT to give MPI a chance to shut down gracefully, e.g. Runtime.getRuntime().exec("kill -INT <pid>");
You can get the PID by invoking mpirun with --report-pid (read the man page).
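The SIGINT-then-wait idea can be sketched as follows. This is only an illustration under some assumptions: the class name GracefulStop and its methods are hypothetical, the external kill utility is assumed to be on the PATH, and the PID is assumed to have been obtained separately. Process.waitFor(long, TimeUnit) and destroyForcibly() require Java 8 or later.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of any library.
final class GracefulStop {

    /** Sends SIGINT to the given PID via the external kill utility; returns kill's exit code. */
    static int sendSigint(long pid) throws IOException, InterruptedException {
        Process kill = new ProcessBuilder("kill", "-INT", Long.toString(pid)).start();
        return kill.waitFor();
    }

    /**
     * Sends SIGINT, then waits up to timeoutSeconds for the process to exit.
     * Returns true if it exited within the timeout. Otherwise falls back to
     * destroyForcibly(), which only targets the launched process itself
     * (children may survive), and returns false.
     */
    static boolean stop(Process p, long pid, long timeoutSeconds)
            throws IOException, InterruptedException {
        sendSigint(pid);
        if (p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        p.destroyForcibly().waitFor();
        return false;
    }
}
```

With a 30 s budget as in the question, the call would look like GracefulStop.stop(p, pid, 30). The forced fallback is a last resort, since it skips the graceful shutdown that SIGINT is meant to allow.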
Edit:
Alternatively, you can use reflection to obtain the PID of a process you started under a UNIX-like OS (borrowed from elsewhere). Since we are talking about kill and signals anyway, being limited to UNIX should not be a restriction.
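int pid = -1;  // stays -1 if the PID cannot be determined
if (p.getClass().getName().equals("java.lang.UNIXProcess")) {
    /* get the PID on unix/linux systems */
    try {
        // requires: import java.lang.reflect.Field;
        Field f = p.getClass().getDeclaredField("pid");
        f.setAccessible(true);
        pid = f.getInt(p);
    } catch (Throwable e) {
        // reflection failed; leave pid at -1
    }
}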
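On Java 9 and later the reflection trick should not be needed, because Process exposes pid() and a ProcessHandle directly. The sketch below assumes such a JDK; the class name ProcessTree and its methods are hypothetical. Listing descendants may help to check which locally started children are still running, but ranks launched on other nodes would not show up there.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, assuming Java 9+.
final class ProcessTree {

    /** Returns the native PID of a process started from this JVM. */
    static long pidOf(Process p) {
        return p.pid();
    }

    /** Returns the PIDs of all current local descendants (children, grandchildren, ...). */
    static List<Long> descendantPids(Process p) {
        return p.toHandle()
                .descendants()
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }
}
```

The PID returned by pidOf(p) can then be passed to kill -INT as described above.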
Source: https://stackoverflow.com/questions/32222878/intel-mpi-mpirun-does-not-terminate-using-java-process-destroy