Why isn't Hadoop implemented using MPI?

一个人的身影 2021-01-30 01:41

Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes.

What are the technical reasons for this?

6 answers

面向向阳花 (original poster)

2021-01-30 02:30

If we look only at the map/reduce steps and the scheduling part of Hadoop, then I would argue that MPI is a much better methodology and technology. MPI supports many different exchange patterns, such as broadcast, barrier, all-gather, and scatter/gather (or call it map-reduce). But Hadoop also has HDFS, which lets the data sit much closer to the processing nodes. And if you look at the problem space that Hadoop-like technologies were used for, the outputs of the reduction steps were actually fairly large, and you wouldn't want all that information swamping your network. That's why Hadoop saves everything to disk. The control messages could still have used MPI, though, and the MPI messages could simply have carried pointers (URLs or file handles) to the actual data on disk ...
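To make the last point concrete, here is a rough sketch of the "control messages carry pointers, bulk data stays on disk" idea. It is not MPI and not how Hadoop actually does it: an in-process `Queue` stands in for the messaging layer, a temporary directory stands in for HDFS, and the names `map_task`, `reduce_task`, and `run` are made up for illustration.

```python
import json
import os
import tempfile
from collections import Counter
from queue import Queue


def map_task(task_id, lines, out_dir):
    """Count words in `lines`, spill the counts to disk, return a small control message."""
    counts = Counter(word for line in lines for word in line.split())
    path = os.path.join(out_dir, f"map-{task_id}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(counts, f)
    # The control message holds only a pointer to the data, never the data itself.
    return {"task_id": task_id, "path": path}


def reduce_task(control_queue):
    """Drain the control messages and merge the on-disk partial counts they point to."""
    total = Counter()
    while not control_queue.empty():
        msg = control_queue.get()
        with open(msg["path"], encoding="utf-8") as f:
            total.update(json.load(f))  # adds the partial counts into the total
    return total


def run(chunks):
    """Run one map task per chunk, then a single reduce over the spilled files."""
    with tempfile.TemporaryDirectory() as out_dir:  # stand-in for shared storage
        control_queue = Queue()  # stand-in for the messaging layer (e.g. MPI)
        for task_id, lines in enumerate(chunks):
            control_queue.put(map_task(task_id, lines, out_dir))
        return reduce_task(control_queue)
```

Calling `run([["a b a"], ["b c"]])` merges the two partial counts into one `Counter`. Only the small dictionaries with a `path` field ever go through the queue, so the message size stays constant no matter how large the map outputs get.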
