Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes.
What are the technical reasons for this?
The truth is that Hadoop could be implemented on top of MPI. The map/reduce pattern has been used with MPI for as long as MPI has been around. MPI provides collective operations such as 'bcast' (broadcast data from one process to all the others), 'alltoall' (every process sends data to every other process), 'reduce' and 'allreduce'. Hadoop removes the need to write your own data-distribution and result-gathering code by packaging an outgoing communication step together with a reduce step.

The downside is that you need to make sure your problem fits the 'reduce' model before you commit to Hadoop. Your problem may be a better fit for 'scatter'/'gather', in which case you should use MPI under a batch scheduler such as Torque/Maui/SGE instead of Hadoop.

Finally, MPI does not write your data to disk as described in another post, unless you follow your receive call with an explicit write to disk. It works just as Hadoop does, by sending your process/data somewhere else to do the work. The important part is to understand your problem in enough detail to be sure MapReduce is the most efficient parallelization strategy, and to be aware that many other strategies exist.
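
To make the 'reduce' versus 'scatter'/'gather' distinction concrete, here is a rough sketch in plain Python. It only simulates the communication patterns inside one process; it is not MPI or Hadoop code, and the helper names (`scatter`, `map_reduce_word_count`, `scatter_gather_lengths`) are made up for illustration. The comments note which MPI collective each step loosely corresponds to.

```python
from collections import Counter
from functools import reduce


def scatter(items, n_workers):
    """Split items into n_workers contiguous chunks.

    Similar in spirit to MPI_Scatterv: a root hands each rank its own slice.
    """
    size, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for rank in range(n_workers):
        end = start + size + (1 if rank < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def map_reduce_word_count(lines, n_workers=3):
    """Reduce-shaped problem: partial results combine into one small answer."""
    chunks = scatter(lines, n_workers)
    # "Map" step: each simulated worker counts words in its own chunk.
    partials = [Counter(w for line in chunk for w in line.split())
                for chunk in chunks]
    # "Reduce" step: merge the partial counts (loosely, MPI_Reduce with a
    # user-defined combine operation).
    return reduce(lambda a, b: a + b, partials, Counter())


def scatter_gather_lengths(lines, n_workers=3):
    """Scatter/gather-shaped problem: one output per input, nothing to combine."""
    chunks = scatter(lines, n_workers)
    # Each simulated worker transforms its own chunk independently.
    partials = [[len(line) for line in chunk] for chunk in chunks]
    # "Gather" step: concatenate results in rank order (loosely, MPI_Gatherv).
    return [x for part in partials for x in part]
```

The word count collapses many partial results into one combined answer, which is the shape Hadoop is built around. The line-length job just needs its pieces handed out and collected again in order, so forcing it through a reduce step adds nothing.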