Steps required on all nodes (NameNode and DataNodes):
1 Hadoop depends on Java and SSH
- Java 1.5.x or later must be installed; in this guide it is installed under /usr/java/jdk1.7.0
1 Download a suitable JDK
//this is the RPM package for 64-bit Linux systems
http://download.oracle.com/otn-pub/java/jdk/7/jdk-7-linux-x64.rpm
2 Install the JDK
rpm -ivh jdk-7-linux-x64.rpm
3 Verify the Java installation
[root@hadoop1 ~]# java -version
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
[root@hadoop1 ~]# ls /usr/java/
default jdk1.7.0 latest
4 Configure the Java environment variables
#vim /etc/profile //add the following lines to /etc/profile:
#add for hadoop
export JAVA_HOME=/usr/java/jdk1.7.0
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
//reload the profile so the variables take effect
source /etc/profile
5 Copy /etc/profile to every DataNode (a sketch follows below)
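A minimal sketch of that copy, assuming the datanode1/datanode2/datanode3 hostnames from the /etc/hosts step below and that root can log in to them over SSH:
for node in datanode1 datanode2 datanode3; do
  scp /etc/profile root@$node:/etc/profile   # push the JAVA_HOME/CLASSPATH/PATH settings to each DataNode
done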
- ssh must be installed and sshd must be kept running, so that the Hadoop scripts can manage the remote Hadoop daemons.
- To check whether SSH is installed, run:
which ssh
which sshd
which ssh-keygen
If none of the three commands returns an empty result, SSH is already installed.
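If any of them is missing, SSH can be installed from the distribution's packages. A sketch for an RPM-based system such as the one used above (package and service names are assumptions and may differ on other distributions):
yum install -y openssh-clients openssh-server   # client tools plus the sshd daemon
service sshd start                              # start sshd now
chkconfig sshd on                               # keep sshd enabled across reboots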
2 Create a common Hadoop account
- All nodes should use the same user name, which can be added with:
- useradd hadoop
- passwd hadoop
3 Configure hostname mappings in /etc/hosts
tail -n 4 /etc/hosts
192.168.57.75 namenode
192.168.57.76 datanode1
192.168.57.78 datanode2
192.168.57.79 datanode3
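The same mappings must exist on every node. They can be pushed out the same way as /etc/profile above, for example (assuming root SSH access to the DataNodes):
scp /etc/hosts root@datanode1:/etc/hosts   # repeat for datanode2 and datanode3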
Steps required only on the NameNode:
1. Generate an SSH key pair.
Run the following command on the NameNode to generate an RSA key pair:
ssh-keygen -t rsa
The following is excerpted from another source:
You can also set up key-based (passwordless) authentication in this way.
First generate the key pair with the command ssh-keygen -t rsa
Press Enter through all the prompts; this produces the key files id_rsa and id_rsa.pub, which are placed under /root/.ssh/ by default. The .ssh directory is hidden, so enable showing hidden files to see it.
Create a .ssh directory under /home/admin, copy id_rsa.pub into /home/admin/.ssh, and rename it to authorized_keys.
Copy the id_rsa file to some location such as /home/id_rsa.
Test whether it is set up correctly with:
ssh -i /home/id_rsa admin@localhost
You should be logged in directly, without being asked for a password.
2. View the generated public key:
more ~/.ssh/id_rsa.pub
3. Copy the public key to each slave node.
1. On the master node, run scp ~/.ssh/id_rsa.pub hadoop@<DataNode IP>:~/master_key to copy the generated public key from the NameNode into the file ~/master_key on the DataNode.
2. On each slave node (DataNode), install that file as an authorized key:
mkdir ~/.ssh
chmod 700 ~/.ssh
mv ~/master_key ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
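As a side note, if the ssh-copy-id helper from OpenSSH is available on the NameNode, sub-steps 1 and 2 above can be collapsed into a single command per DataNode, for example:
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode1   # appends the public key to hadoop@datanode1's ~/.ssh/authorized_keys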
4. Log in to each slave node from the master node: ssh <DataNode IP> (a quick check over all DataNodes is sketched below).
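A quick loop to confirm passwordless login to every DataNode (hostnames as defined in /etc/hosts above):
for node in datanode1 datanode2 datanode3; do
  ssh hadoop@$node hostname   # should print the remote hostname without asking for a password
done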
Hadoop configuration (this must be done on all nodes, apart from a few commands that are explicitly marked otherwise)
//Note: perform these steps as the hadoop user
1 Directory layout
[hadoop@hadoop1 ~]$ pwd
/home/hadoop
[hadoop@hadoop1 ~]$ ll
total 59220
lrwxrwxrwx 1 hadoop hadoop 17 Feb 1 16:59 hadoop -> hadoop-0.20.203.0
drwxr-xr-x 12 hadoop hadoop 4096 Feb 1 17:31 hadoop-0.20.203.0
-rw-r--r-- 1 hadoop hadoop 60569605 Feb 1 14:24 hadoop-0.20.203.0rc1.tar.gz
2 Configure hadoop-env.sh to point at the Java installation
vim hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0
3 Configure core-site.xml //identifies the filesystem's NameNode
[hadoop@hadoop1 ~]$ cat hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://namenode:9000</value>
</property>
</configuration>
hadoop.tmp.dir is the base directory that the Hadoop filesystem relies on; many other paths are derived from it. By default it lives under /tmp/{$user}, but storing data under /tmp is unsafe, because files there may be deleted whenever Linux reboots.
Edit conf/core-site.xml and add the following property:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/had/hadoop/data</value>
<description>A base for other temporary directories.</description>
</property>
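The directory named in hadoop.tmp.dir must be writable by the hadoop user on every node (Hadoop creates its working subdirectories underneath it). A sketch using the example path from the property above, run as root:
mkdir -p /home/had/hadoop/data            # create the base directory on each node
chown -R hadoop:hadoop /home/had/hadoop   # hand ownership to the hadoop user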
4 Configure mapred-site.xml //identifies the master node that runs the JobTracker ("mapred" is short for MapReduce)
[hadoop@hadoop1 ~]$ cat hadoop/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>namenode:9001</value>
</property>
</configuration>
5 Configure hdfs-site.xml //sets the HDFS replication factor
[hadoop@hadoop1 ~]$ cat hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
6 Configure the masters and slaves files
[hadoop@hadoop1 ~]$ cat hadoop/conf/masters
namenode
[hadoop@hadoop1 ~]$ cat hadoop/conf/slaves
datanode1
datanode2
datanode3
7 Copy the hadoop directory to all DataNodes
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode1:/home/hadoop/
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode2:/home/hadoop/
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode3:/home/hadoop
8 Format HDFS (run this on the NameNode only; it should be done before starting the cluster for the first time)
[hadoop@hadoop1 hadoop]$ bin/hadoop namenode -format
12/02/02 11:31:15 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop1.test.com/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.203.0
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y //type Y here (it must be an uppercase Y)
12/02/02 11:31:17 INFO util.GSet: VM type = 64-bit
12/02/02 11:31:17 INFO util.GSet: 2% max memory = 19.33375 MB
12/02/02 11:31:17 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/02/02 11:31:17 INFO util.GSet: recommended=2097152, actual=2097152
12/02/02 11:31:17 INFO namenode.FSNamesystem: fsOwner=hadoop
12/02/02 11:31:18 INFO namenode.FSNamesystem: supergroup=supergroup
12/02/02 11:31:18 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/02/02 11:31:18 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/02/02 11:31:18 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/02/02 11:31:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/02/02 11:31:18 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/02/02 11:31:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/02/02 11:31:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1.test.com/127.0.0.1
************************************************************/
[hadoop@hadoop1 hadoop]$
9 Start the Hadoop daemons
[hadoop@hadoop1 hadoop]$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.test.com.out
datanode1: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop2.test.com.out
datanode2: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop3.test.com.out
datanode3: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop4.test.com.out
starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.test.com.out
datanode1: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop2.test.com.out
datanode2: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop3.test.com.out
datanode3: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop4.test.com.out
10 Verify
//namenode
[hadoop@hadoop1 logs]$ jps
2883 JobTracker
3002 Jps
2769 NameNode
//datanode
[hadoop@hadoop2 ~]$ jps
2743 TaskTracker
2670 DataNode
2857 Jps
[hadoop@hadoop3 ~]$ jps
2742 TaskTracker
2856 Jps
2669 DataNode
[hadoop@hadoop4 ~]$ jps
2742 TaskTracker
2852 Jps
2659 DataNode
Hadoop monitoring web page
http://<NameNode IP address>:50070/dfshealth.jsp
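The cluster state can also be checked from the command line on the NameNode; both commands are part of the stock Hadoop 0.20.x distribution:
[hadoop@hadoop1 hadoop]$ bin/hadoop dfsadmin -report   # capacity figures and the number of live DataNodes
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -ls /           # lists the HDFS root to confirm the filesystem is reachable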
Source: https://www.cnblogs.com/lxzh/archive/2013/04/08/3008319.html