Question
I'm installing Hadoop on a single-node cluster. Any idea why we need to do the following?
Why do we need SSH access for a new user?
Why should it be able to connect to its own user account?
Why should I set up passwordless SSH for the new user?
When all the nodes are on the same machine, why do they need to communicate explicitly?
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Answer 1:
Why do we need SSH access for a new user?
Because you want to communicate with the user who is running the Hadoop daemons. Notice that ssh actually goes from one user (on one machine) to another user (on another machine), not just from machine to machine.
Why should it be able to connect to its own user account?
Because you want to start all the daemons with just one command. Otherwise you have to start each daemon individually by issuing a separate command for it. ssh is required for this, even if you are on a single machine.
Why should I set up passwordless SSH for the new user?
Because you don't want to enter the password every time you start your Hadoop daemons. That would be irritating, right?
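For reference, a minimal sketch of what that passwordless-SSH setup looks like for a dedicated hduser account (the RSA key type and file paths are the usual OpenSSH defaults used in the tutorial, not something Hadoop itself mandates):
# as hduser: generate a key pair with an empty passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# authorize that key for the same account, so "ssh localhost" works
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# verify: this should log in without prompting for a password
ssh localhost exit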
When all the nodes are on the same machine, why do they need to communicate explicitly?
What do you mean by explicitly? Remember, ssh is not used for communication between the processes. All of that communication happens over TCP/IP. ssh is only required by the Hadoop scripts so that you can start all the daemons from one machine without having to go to each machine and start each process separately there.
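As an illustration (assuming a running single-node setup with the JDK's jps tool and net-tools installed), you can see that the daemons are separate JVM processes talking to each other over local TCP ports, independently of ssh:
# list the running Hadoop daemon JVMs (NameNode, DataNode, JobTracker, ...)
jps
# show which TCP ports those Java processes are listening on
sudo netstat -tlnp | grep java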
HTH
Answer 2:
It's not mandatory that you set up password-less ssh among nodes or on the local machine. Hadoop mainly uses HTTP for data transfers across nodes when required.
Password-less ssh access is required (among nodes) so that the start-all.sh, start-dfs.sh and start-mapred.sh scripts (as far as I can remember) can be used to start/stop the Hadoop daemons in a distributed cluster environment. Otherwise it becomes cumbersome to log into every machine and start/stop the Hadoop daemons by hand.
You can also use hadoop-daemons.sh or hadoop-daemon.sh to accomplish the same thing after logging in as your hadoop user.
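A minimal sketch of the two styles, assuming a classic Hadoop 1.x layout with the scripts under bin/ (paths and daemon names may differ in your version):
# cluster-wide: relies on passwordless ssh to every node listed in conf/slaves
bin/start-dfs.sh
bin/start-mapred.sh

# per-machine: no ssh needed, run on each node for each daemon
bin/hadoop-daemon.sh start namenode
bin/hadoop-daemon.sh start datanode

# check which daemons are up
jps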
The Cloudera Hadoop distribution doesn't even use those scripts; it provides init.d scripts to start/stop the Hadoop daemons.
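For example, on a CDH-style packaged install the daemons are managed as ordinary system services; the exact service names depend on the CDH version (hadoop-hdfs-namenode is the CDH4-era name), so treat these as an illustration rather than exact commands:
# start/inspect a daemon on the local machine via its init script
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-datanode status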
Answer 3:
slaves.sh is used to start the remote nodes:
# iterate over every host listed in the slaves file, skipping comments and blank lines
for slave in `cat "$HOSTLIST" | sed "s/#.*$//;/^$/d"`; do
  # run the requested command on the slave over ssh, prefixing its output with the hostname
  ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
    2>&1 | sed "s/^/$slave: /" &
  # optionally pause between hosts
  if [ "$HADOOP_SLAVE_SLEEP" != "" ]; then
    sleep $HADOOP_SLAVE_SLEEP
  fi
done
As you can see, it has a dependency on ssh. While you can do the entire tutorial without requiring a new user and ssh config, I would guess that, as a tutorial, it would not give you a good start for when you have to deploy/configure/start/stop a real cluster (i.e. remote nodes). As @JteRocker points out, distributions like Cloudera use other scripts to start/stop daemons (but I would guess they still depend on ssh), and a distribution like Hortonworks' Hadoop on Windows would use yet another mechanism (i.e. PowerShell and WinRM instead of ssh).
Answer 4:
Use this command:
$ sudo addgroup hadoop
If that alone doesn't work, then also run:
$ sudo adduser --ingroup hadoop hduser
Source: https://stackoverflow.com/questions/17805431/new-user-ssh-hadoop