Question
I am setting up a Hadoop 2.7.3 cluster on EC2 servers: 1 NameNode, 1 Secondary NameNode and 2 DataNodes.
Hadoop core uses SSH to communicate with the slaves and launch processes on the slave nodes.
- Do we need to have the same SSH keys on all the nodes for the hadoop user?
- What is the best practice/ideal way to copy or add the NameNode's SSH credentials to the slave nodes?
Answer 1:
Do we need to have the same SSH keys on all the nodes for the hadoop user?
- The same public key (the NameNode's) needs to be in authorized_keys on all of the nodes. The matching private key stays on the NameNode.
What is the best practice/ideal way to copy or add the NameNode's SSH credentials to the slave nodes?
Per documentation:
Namenode: Password-less SSH
Hadoop needs password-less SSH between the namenode and the data nodes. Let us create a public-private key pair for this purpose on the namenode.
namenode> ssh-keygen
Use the default (/home/ubuntu/.ssh/id_rsa) for the key location and hit enter for an empty passphrase.
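For scripted setups, the same key pair can be generated without prompts. The following is a minimal sketch and not part of the quoted documentation: the function name generate_key_if_missing is made up for illustration, and it assumes OpenSSH's ssh-keygen is installed. An existing key is left untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the quoted documentation): create an RSA key
# pair with an empty passphrase unless a key already exists at the given path.
generate_key_if_missing() {
    local key_path="${1:-$HOME/.ssh/id_rsa}"
    if [ -f "$key_path" ]; then
        echo "key already exists: $key_path"
        return 0
    fi
    # Create the directory with owner-only permissions if it is missing
    mkdir -p -m 700 "$(dirname "$key_path")"
    # -q: quiet, -N "": empty passphrase, -f: output file for the private key
    ssh-keygen -q -t rsa -b 4096 -N "" -f "$key_path"
    echo "created: $key_path and $key_path.pub"
}

# Example: generate_key_if_missing            # uses ~/.ssh/id_rsa
```

An empty passphrase means anyone who can read the private key file can log in to the data nodes, so the file should stay readable by its owner only.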
Datanodes: Setup Public Key
The public key is saved in /home/ubuntu/.ssh/id_rsa.pub. We need to copy this file from the namenode to each data node and append the contents to /home/ubuntu/.ssh/authorized_keys on each data node.
datanode1> cat id_rsa.pub >> ~/.ssh/authorized_keys
datanode2> cat id_rsa.pub >> ~/.ssh/authorized_keys
datanode3> cat id_rsa.pub >> ~/.ssh/authorized_keys
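Appending with >> adds a duplicate line each time the command is re-run, and sshd with its default StrictModes setting rejects authorized_keys if the file or ~/.ssh is writable by other users. A minimal sketch of a safer append to run on each data node follows. The function name append_key is hypothetical, and it assumes id_rsa.pub has already been copied to the data node (for example with scp, using the EC2 key pair).

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to an authorized_keys file only if
# that exact line is not already present, and tighten permissions.
append_key() {
    local pub_key_file="$1"
    local auth_file="${2:-$HOME/.ssh/authorized_keys}"
    local auth_dir key
    auth_dir="$(dirname "$auth_file")"
    key="$(cat "$pub_key_file")"

    mkdir -p "$auth_dir"
    chmod 700 "$auth_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
}

# Example, on a data node: append_key id_rsa.pub
```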
Namenode: Setup SSH Config
SSH reads per-user settings from the configuration file ~/.ssh/config. Set it up as shown below, substituting each node’s Public DNS for the HostName parameter (for example, replace <nnode> with the EC2 Public DNS of the NameNode).
Host nnode
  HostName <nnode>
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host dnode1
  HostName <dnode1>
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host dnode2
  HostName <dnode2>
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host dnode3
  HostName <dnode3>
  User ubuntu
  IdentityFile ~/.ssh/id_rsa
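With more than a few nodes, the repeated Host blocks can be generated rather than typed. This is a sketch under assumptions: the function name print_ssh_config and its alias=hostname argument format are invented here, and the hostnames in the usage comment are placeholders, not real addresses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Host block per "alias=hostname" argument,
# using the same user and identity file for every node.
print_ssh_config() {
    local user="$1" identity="$2"
    shift 2
    local pair
    for pair in "$@"; do
        # Split "alias=hostname" at the first "="
        printf 'Host %s\n' "${pair%%=*}"
        printf '  HostName %s\n' "${pair#*=}"
        printf '  User %s\n' "$user"
        printf '  IdentityFile %s\n\n' "$identity"
    done
}

# Example (placeholder hostnames); review the output before appending it:
# print_ssh_config ubuntu '~/.ssh/id_rsa' \
#     nnode=namenode.example.com dnode1=datanode1.example.com >> ~/.ssh/config
```

The leading spaces inside each block are only for readability; ssh_config does not require indentation.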
At this point, verify that password-less login works for each node as follows. The first time you connect to a host, SSH warns that the host is unknown and asks whether you want to continue; type yes and hit enter. This confirmation is needed only once per host.
namenode> ssh nnode
namenode> ssh dnode1
namenode> ssh dnode2
namenode> ssh dnode3
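Once the host keys have been accepted, the same check can be repeated non-interactively, for example after adding a node. The sketch below is an illustration only: check_nodes is a hypothetical name, and the host aliases are assumed to match the entries in ~/.ssh/config. It should be run after the first manual connection, because in batch mode ssh cannot ask for confirmation of an unknown host key and simply fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try a non-interactive login to each host alias and
# report which ones fail. BatchMode=yes makes ssh fail instead of prompting
# for a password; ConnectTimeout bounds the wait for unreachable hosts.
check_nodes() {
    local host failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    # Non-zero exit status if any host could not be reached without a password
    return "$failed"
}

# Example: check_nodes nnode dnode1 dnode2 dnode3
```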
Source: https://stackoverflow.com/questions/51822209/hadoop-cluster-hadoop-user-ssh-communication