Hadoop Datanode, namenode, secondary-namenode, job-tracker and task-tracker

青春惊慌失措 2021-01-31 06:31

I am new to Hadoop, so I have some doubts. If the master node fails, what happens to the Hadoop cluster? Can we recover that node without any loss? Is it possible to keep a secondar

3 Answers
  •  梦谈多话
    2021-01-31 06:57

    Currently, a Hadoop cluster has a single point of failure: the namenode.

    As for the secondary node issue (from the Apache wiki):

    The term "secondary name-node" is somewhat misleading. It is not a name-node in the sense that data-nodes cannot connect to the secondary name-node, and in no event it can replace the primary name-node in case of its failure.

    The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary name-node periodically downloads current name-node image and edits log files, joins them into new image and uploads the new image back to the (primary and the only) name-node. See User Guide.

    So if the name-node fails and you can restart it on the same physical node then there is no need to shutdown data-nodes, just the name-node need to be restarted. If you cannot use the old node anymore you will need to copy the latest image somewhere else. The latest image can be found either on the node that used to be the primary before failure if available; or on the secondary name-node. The latter will be the latest checkpoint without subsequent edits logs, that is the most recent name space modifications may be missing there. You will also need to restart the whole cluster in this case.
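
    To make the difference between the two recovery sources concrete, here is a toy sketch of the checkpoint process described above. It is not Hadoop's real on-disk format or API: the dictionary "image", the tuple-based edits log, and the helper apply_edits are all hypothetical names made up for illustration.

```python
# Toy model of checkpointing; NOT Hadoop's real file format or API.
# All names here (apply_edits, image, edits) are hypothetical.

def apply_edits(image, edits):
    """Return a new namespace image with the edits log replayed on top."""
    merged = dict(image)
    for op, path in edits:
        if op == "create":
            merged[path] = True
        elif op == "delete":
            merged.pop(path, None)
    return merged


# State on the primary name-node: last image plus edits logged since then.
image = {"/data/a.txt": True}
edits = [("create", "/data/b.txt"), ("delete", "/data/a.txt")]

# Checkpoint: the secondary joins image and edits into a new image.
checkpoint = apply_edits(image, edits)

# Edits logged after the checkpoint exist only on the primary.
later_edits = [("create", "/data/c.txt")]

# Recovery from the old primary's disk: image plus later edits, nothing lost.
recovered_from_primary = apply_edits(checkpoint, later_edits)

# Recovery from the secondary: latest checkpoint only, later edits missing.
recovered_from_secondary = dict(checkpoint)

print(sorted(recovered_from_primary))    # ['/data/b.txt', '/data/c.txt']
print(sorted(recovered_from_secondary))  # ['/data/b.txt']
```

    In this sketch /data/c.txt is lost when recovering from the secondary, which is exactly the "most recent name space modifications may be missing" case from the quote.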

    There are tricky ways to overcome this single point of failure. If you are using the Cloudera distribution, one of the ways is explained here. The MapR distribution handles this SPOF in a different way.

    Finally, you can write MapReduce jobs in practically any programming language by using Hadoop Streaming, as long as the program can read from standard input and write to standard output.
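
    As a minimal sketch of what a streaming job can look like, here is a word count in Python. Hadoop Streaming passes records to the mapper and reducer as lines on standard input and reads tab-separated key/value lines from standard output. The single-file layout, the "map"/"reduce" command-line argument, and the function names are assumptions made for this example, not something Hadoop requires.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention: pass "map" or "reduce" as the only argument.
STEPS = {"map": map_lines, "reduce": reduce_lines}

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in STEPS:
    for record in STEPS[sys.argv[1]](sys.stdin):
        print(record)
```

    Saved under a hypothetical name such as wc.py, this can be tried without a cluster by piping a text file through "python wc.py map", then "sort", then "python wc.py reduce". On a cluster, the two commands would be given to the streaming jar through its -mapper and -reducer options; the location of that jar depends on the installation.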
