Why isn't Hadoop implemented using MPI?

一个人的身影 2021-01-30 01:41

Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes.

What are the technical reasons for this?

6 Answers
  •  面向向阳花
    2021-01-30 02:30

    If we look only at the Map/Reduce steps and the scheduling part of Hadoop, I would argue that MPI is a much better methodology and technology. MPI supports many different communication patterns, such as broadcast, barrier, allgather, scatter/gather, and reduce (a scatter followed by a reduce is essentially map-reduce).

    But Hadoop also has HDFS. With it, the data can sit on, or very close to, the nodes that process it, so the computation runs where the data already is. And if you look at the problem space Hadoop-like technologies were used for, the outputs of the reduction steps were actually fairly large, and you wouldn't want all that data swamping your network. That's why Hadoop saves everything to disk.

    The control messages, however, could have used MPI, and those MPI messages could simply carry pointers (URLs or file handles) to the actual data on disk.
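A minimal sketch of the hybrid suggested above: bulk data goes to disk, and only small control messages (file paths) travel over the "network". This is an illustration under stated assumptions, not how Hadoop or any MPI library actually works: threads stand in for cluster nodes, a `queue.Queue` stands in for MPI point-to-point messages, a temporary directory stands in for HDFS/local disk, and the names `map_worker` and `word_count` are hypothetical. A real version would use an MPI binding (for example mpi4py) across separate processes.

```python
import json
import os
import queue
import tempfile
import threading
from collections import Counter


def map_worker(rank: int, lines: list[str], workdir: str, channel: queue.Queue) -> None:
    """Map step: count words in this worker's chunk, write the (possibly large)
    result to disk, and send only a pointer (the file path) to the coordinator."""
    counts = Counter(word for line in lines for word in line.split())
    path = os.path.join(workdir, f"map-{rank}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(counts, f)
    # Assumption: this queue plays the role of an MPI control message.
    channel.put((rank, path))


def word_count(lines: list[str], n_workers: int = 3) -> Counter:
    """Scatter chunks to workers, gather small pointer messages, reduce from disk."""
    channel: queue.Queue = queue.Queue()
    with tempfile.TemporaryDirectory() as workdir:
        # "Scatter": hand each worker a strided slice of the input.
        chunks = [lines[i::n_workers] for i in range(n_workers)]
        workers = [
            threading.Thread(target=map_worker, args=(rank, chunk, workdir, channel))
            for rank, chunk in enumerate(chunks)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

        # "Gather": one small control message (rank, path) per worker.
        pointers = [channel.get() for _ in range(n_workers)]

        # "Reduce": follow each pointer and merge the partial counts read from disk.
        total: Counter = Counter()
        for _rank, path in sorted(pointers):
            with open(path, encoding="utf-8") as f:
                total.update(json.load(f))
        return total
```

For example, `word_count(["a b a", "b c", "a"])` yields counts of 3 for `a`, 2 for `b` and 1 for `c`. The point of the design is that the channel only ever carries a tuple of a rank and a path, so its traffic stays small no matter how large the intermediate results get.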
