NameNode HA failover time

NameNode HA (NFS, QJM) is available in Hadoop 2.x (HDFS-1623). It provides fast failover for the NameNode, but I can't find any description of how long it takes to recover from a failure. Can anyone tell me?

Thanks for your answer. As a matter of fact, I want to know how long the transition between the two nodes (active NameNode and standby NameNode) takes. Can you tell me how long?

Here are some qualified examples of times for failover with a standby NameNode:

A 60 node cluster with 6 million blocks using 300TB raw storage, and 100K files: 30 seconds. Hence total failover time ranges from 1-3 minutes.

A 200 node cluster with 20 million blocks occupying 1PB raw storage and 1 million files: 110 seconds. Hence total failover time ranges from 2.5 to 4.5 minutes.

For small to medium clusters cold failover is only 30 to 120 seconds slower.

From: http://hortonworks.com/blog/ha-namenode-for-hdfs-with-hadoop-1-0-part-1/
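To make the arithmetic behind those figures explicit, here is a small sketch. It assumes (the excerpt does not say this directly) that the 30 s and 110 s figures are one component of the failover and that the rest of each quoted total is other overhead, such as failure detection. The names are illustrative only. Also note that the linked post's URL refers to Hadoop 1.0, so the numbers may not carry over to the standby NameNode in Hadoop 2.x.

```python
# Figures copied from the quoted excerpt; everything is in seconds.
# Assumption: "component" is one part of the failover (the 30 s / 110 s
# figure) and the quoted total range is that component plus other overhead.
examples = {
    "60 nodes, 6M blocks": {"component": 30, "total": (1 * 60, 3 * 60)},
    "200 nodes, 20M blocks": {"component": 110, "total": (2.5 * 60, 4.5 * 60)},
}


def implied_overhead(component, total):
    """Return the (low, high) overhead left after subtracting the component."""
    low, high = total
    return (low - component, high - component)


for name, figures in examples.items():
    low, high = implied_overhead(figures["component"], figures["total"])
    print(f"{name}: implied overhead {low:.0f}-{high:.0f} s")
```

Under that assumption, the remaining overhead comes out at roughly 30 to 160 seconds in both examples.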

From Hadoop: The Definitive Guide. I believe this is easy to understand and pretty straightforward.
Failover and fencing

The transition from the active namenode to the standby is managed by a new entity in the system called the failover controller. Failover controllers are pluggable, but the first implementation uses ZooKeeper to ensure that only one namenode is active. Each namenode runs a lightweight failover controller process whose job it is to monitor its namenode for failures (using a simple heartbeating mechanism) and trigger a failover should a namenode fail.
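The heartbeat-and-trigger idea in that paragraph can be sketched as below. This is a simplified simulation, not the actual failover controller code: `monitor`, `check_health`, `on_failure`, `interval_s` and `max_missed` are hypothetical names, and the real controller coordinates through ZooKeeper rather than a local loop.

```python
import time


def monitor(check_health, on_failure, interval_s=1.0, max_missed=3,
            sleep=time.sleep):
    """Poll check_health(); call on_failure() after max_missed misses in a row.

    Returns the number of health checks performed before failover triggered.
    """
    missed = 0
    checks = 0
    while True:
        checks += 1
        try:
            healthy = check_health()
        except Exception:
            healthy = False  # an unreachable namenode counts as a missed beat
        missed = 0 if healthy else missed + 1
        if missed >= max_missed:
            on_failure()
            return checks
        sleep(interval_s)


# Simulated namenode: healthy twice, then it stops answering.
answers = iter([True, True, False, False, False])
events = []
checks = monitor(lambda: next(answers), lambda: events.append("failover"),
                 sleep=lambda seconds: None)
print(checks, events)
```

In a scheme like this, detection alone takes roughly `max_missed * interval_s`, which is one reason the total failover time is longer than the time the standby needs to take over.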

Failover may also be initiated manually by an administrator, in the case of routine maintenance, for example. This is known as a graceful failover, since the failover controller arranges an orderly transition for both namenodes to switch roles.
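Since the original question is how long the switch takes, one way to find out on your own cluster is to time it. The sketch below polls `hdfs haadmin -getServiceState` until the target namenode reports `active`. The helper names, the polling interval and the timeout are assumptions for illustration, and `nn2` in the usage note is a placeholder for whatever namenode ID your configuration uses.

```python
import subprocess
import time


def get_service_state(service_id):
    """Ask the HA admin tool for a namenode's state ("active" or "standby")."""
    result = subprocess.run(
        ["hdfs", "haadmin", "-getServiceState", service_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def time_until_active(service_id, get_state=get_service_state,
                      poll_s=0.5, timeout_s=300.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until service_id reports "active"; return elapsed seconds."""
    start = clock()
    while True:
        try:
            if get_state(service_id) == "active":
                return clock() - start
        except subprocess.CalledProcessError:
            pass  # the admin call can fail while the transition is in progress
        if clock() - start > timeout_s:
            raise TimeoutError(f"{service_id} did not become active")
        sleep(poll_s)
```

Call `time_until_active("nn2")` right after starting a manual failover (for example with `hdfs haadmin -failover nn1 nn2`) or after stopping the active namenode. Each poll launches the `hdfs` command, so the resolution is coarse; treat the result as an upper bound rather than a precise measurement.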

In the case of an ungraceful failover, however, it is impossible to be sure that the failed namenode has stopped running. For example, a slow network or a network partition can trigger a failover transition, even though the previously active namenode is still running, and thinks it is still the active namenode. The HA implementation goes to great lengths to ensure that the previously active namenode is prevented from doing any damage and causing corruption, a method known as fencing. The system employs a range of fencing mechanisms, including killing the namenode’s process, revoking its access to the shared storage directory (typically by using a vendor-specific NFS command), and disabling its network port via a remote management command. As a last resort, the previously active namenode can be fenced with a technique rather graphically known as STONITH, or “shoot the other node in the head”, which uses a specialized power distribution unit to forcibly power down the host machine.
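The "range of fencing mechanisms" amounts to trying methods in order until one is known to have worked. Here is a minimal sketch of that pattern; the function and the three example methods are hypothetical stand-ins, not Hadoop's implementation. In Hadoop itself the ordered list is configured through `dfs.ha.fencing.methods`.

```python
def fence(target, methods):
    """Try each fencing method in order; return the name of the first success.

    Each method is a (name, callable) pair; the callable returns True when
    the target is known to be fenced. Raises if every method fails, because
    promoting the standby without fencing risks two active namenodes.
    """
    for name, method in methods:
        try:
            if method(target):
                return name
        except Exception:
            continue  # treat an error as "not fenced" and try the next one
    raise RuntimeError(f"could not fence {target}; refusing to fail over")


# Hypothetical methods, ordered from least to most drastic.
methods = [
    ("kill-process", lambda host: False),    # pretend the host is unreachable
    ("revoke-storage", lambda host: True),   # pretend this one succeeds
    ("stonith", lambda host: True),          # last resort, not reached here
]
print(fence("old-active-namenode", methods))
```

Every method that has to time out before the next one is tried adds to the total failover time, so an ungraceful failover can take noticeably longer than a graceful one.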

Client failover is handled transparently by the client library. The simplest implementation uses client-side configuration to control failover. The HDFS URI uses a logical hostname which is mapped to a pair of namenode addresses (in the configuration file), and the client library tries each namenode address until the operation succeeds.
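The client-side behavior described above (a logical name mapped to two addresses, each tried until one succeeds) can be sketched like this. The dictionary, host names and `call_with_failover` are made up for illustration; in a real deployment the mapping lives in the client's `hdfs-site.xml` (properties such as `dfs.nameservices` and `dfs.namenode.rpc-address.*`) and the HDFS client library does the retrying for you.

```python
# Hypothetical mapping from a logical name to the two namenode addresses.
NAMESERVICES = {
    "mycluster": ["nn1.example.com:8020", "nn2.example.com:8020"],
}


def call_with_failover(addresses, operation):
    """Run operation(address) against each address until one succeeds."""
    last_error = None
    for address in addresses:
        try:
            return operation(address)
        except ConnectionError as exc:
            last_error = exc  # standby or unreachable: try the next address
    raise ConnectionError(f"no namenode reachable: {last_error}")


def list_root(address):
    """Stand-in for an HDFS call; only nn2 is active in this simulation."""
    if address.startswith("nn1"):
        raise ConnectionError(f"{address} is not active")
    return f"listing served by {address}"


print(call_with_failover(NAMESERVICES["mycluster"], list_root))
```

From the client's point of view, the failover time is therefore the server-side transition plus however long the failed attempts take before the client moves on to the other address.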

Hope it helps!

Fast failover does not mean recovering the failed NameNode; it means failing over to the other NameNode.
An HA setup is configured with multiple NameNodes.
If one NameNode fails, the other NameNode becomes active.
If the failed NameNode is started again, it comes up in the standby state.

When you use HA, multiple NameNodes run in the cluster, but the JournalNodes accept edit-log writes from only one NameNode at a time. So one NameNode is in the active state and the other is in standby.
If the active NameNode fails, the standby NameNode transitions to the active state. That transition is what "recovering from a failure" means here.

Source: https://stackoverflow.com/questions/27266267/namenode-ha-failover-time

Tags

Hadoop

HDFS

high-availability

failover