Intel MPI mpirun does not terminate using java Process.destroy()

Submitted by 别说谁变了你拦得住时间么 on 2019-12-11 05:29:10

Question


My Intel MPI version is impi/5.0.2.044/intel64 installed on a RHEL machine.

I am using Java to invoke an MPI program with the following code:

import java.io.File;
import java.io.IOException;
import java.lang.ProcessBuilder.Redirect;

ProcessBuilder builder = new ProcessBuilder();
// Pass the executable and each argument as separate strings
builder.command("mpirun", "./myProgram");
builder.redirectError(Redirect.to(new File("stderr")));
builder.redirectOutput(Redirect.to(new File("stdout")));
Process p = null;
try {
    p = builder.start();
} catch (IOException e) {
    e.printStackTrace();
}
// Process has started here
p.destroy();
try {
    // i = 143
    int i = p.exitValue();
} catch (IllegalThreadStateException e) {
    // Thrown if the process has not exited yet
}

Even after exitValue() returns without throwing an exception, ps aux still shows a bunch of ./myProgram processes. The program keeps writing result files as if it had not been killed, and terminates only after it finishes all of its calculations.

Currently, the only way I have found to terminate ./myProgram is to kill the Java program itself by pressing Ctrl+C in its console.

My intention is to stop the calculation immediately and let the Java program schedule some other calculation. Is there any workaround to force all MPI instances to terminate, or at least to guarantee termination within a small, definite amount of time (e.g. 30 s or 1 min of polling)?


Answer 1:


The problem is that the JDK implementation of destroy sends SIGTERM, which shuts down mpirun hard. See here for the relevant JDK source.
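The exit value 143 noted in the question's code is consistent with this: by the usual UNIX convention, an exit value above 128 means 128 plus the number of the terminating signal, and 15 is SIGTERM. A minimal sketch to observe this, assuming a UNIX-like system where a sleep command is available as a stand-in for mpirun (the class and method names are made up for illustration):

```java
import java.io.IOException;

public class DestroySignalDemo {
    /** Starts a long-running child, calls destroy(), and returns its exit value. */
    public static int exitValueAfterDestroy() throws IOException, InterruptedException {
        // "sleep 60" is only a stand-in for mpirun in this sketch
        Process p = new ProcessBuilder("sleep", "60").start();
        p.destroy();        // on UNIX-like JDKs this delivers SIGTERM
        return p.waitFor(); // blocks until the child has actually exited
    }

    public static void main(String[] args) throws Exception {
        int exit = exitValueAfterDestroy();
        // Exit values above 128 conventionally encode 128 + signal number
        System.out.println("exit=" + exit + " signal=" + (exit - 128));
    }
}
```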

You need to send SIGINT to give MPI a chance to shut down gracefully.

E.g. Runtime.getRuntime().exec("kill -INT <pid>");

You can get the PID by invoking mpirun with --report-pid. (Read the man page; the option is documented for Open MPI's mpirun, so check that your mpirun supports it.)

Edit:

Alternatively, you can use reflection to figure out the PID of a process you started under a UNIX-like OS (taken from here). Since we are talking about kill and signals, being limited to UNIX-like systems should not be a restriction.

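// Requires: import java.lang.reflect.Field;
// java.lang.UNIXProcess is the class name on Java 8 and earlier;
// Java 9+ offers Process.pid() instead.
int pid = -1;
if (p.getClass().getName().equals("java.lang.UNIXProcess")) {
  /* get the PID on unix/linux systems */
  try {
    Field f = p.getClass().getDeclaredField("pid");
    f.setAccessible(true);
    pid = f.getInt(p);
  } catch (Throwable e) {
    // Reflection failed; pid stays -1
  }
}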
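Putting the pieces together, the following is a minimal sketch of "send SIGINT, poll for a bounded time, then escalate". It is not taken from the original answer. It assumes Java 9 or later (for Process.pid()), a UNIX-like OS with a kill command on the PATH, and a hypothetical helper class named GracefulStop. Whether the MPI ranks actually exit after SIGINT depends on mpirun forwarding the signal, and the forced kill at the end targets only the launcher process, so it is a last resort that may leave ranks behind.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class GracefulStop {
    /**
     * Sends SIGINT to the process, waits up to timeoutSeconds for it to exit,
     * and falls back to a forced kill if it is still alive.
     * Returns true if the process exited within the grace period.
     */
    public static boolean interruptThenKill(Process p, long timeoutSeconds)
            throws IOException, InterruptedException {
        // Process.pid() needs Java 9+; on older JDKs obtain the PID another way
        long pid = p.pid();
        // Deliver SIGINT through the external kill command (UNIX-like OS assumed)
        new ProcessBuilder("kill", "-INT", Long.toString(pid)).start().waitFor();
        if (p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        // Grace period expired: force termination of the launcher process
        p.destroyForcibly();
        p.waitFor();
        return false;
    }
}
```

With the question's code, this would be called as GracefulStop.interruptThenKill(p, 30) in place of p.destroy(), which bounds the wait at roughly 30 seconds before escalating.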


Source: https://stackoverflow.com/questions/32222878/intel-mpi-mpirun-does-not-terminate-using-java-process-destroy
