Question
Which MPI implementations currently have support for fault tolerance, and what is the state of their development?
Answer 1:
This question is probably too broad to give you a good answer here, especially since the answer will change as time progresses.
In general, there is a lot of fault tolerance work going on across MPI implementations, at varying levels of maturity and support.
- FT-MPI is an old project that is no longer really in development, but it more or less started it all in terms of fault tolerance integrated into the MPI library itself.
- ULFM (User-Level Failure Mitigation) is a spiritual successor to FT-MPI that is currently being proposed for inclusion in a future version of the MPI Standard, which means that, if it is accepted, every MPI implementation will eventually provide it. There is currently an implementation in an old branch of Open MPI, and an implementation in MPICH is in progress for a future release.
There are many other MPI libraries that implement some form of fault tolerance on top of MPI or make tweaks to the implementation itself. These are just a couple of the options.
Source: https://stackoverflow.com/questions/23812513/fault-tolerant-mpi-implementations-status