【Hadoop】- Setting Up a Fully Distributed Hadoop 1.x Environment


Environment: one NameNode server and two DataNode servers

Installation Steps

①: Configure the /etc/hosts file. This provides name resolution inside the cluster without querying a DNS server: when a remote host is accessed, the hosts file is consulted first, and if the hostname is configured there, the remote host is reached directly at the specified IP. (In practice, larger Hadoop clusters usually run a dedicated DNS server for unified management.)

To change a Linux host's hostname permanently, set the HOSTNAME field in /etc/sysconfig/network (note that it only takes effect after a reboot). Running hostname newName changes the name immediately, but the change is lost after a reboot.
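A minimal sketch combining both approaches (run as root; NewName is a placeholder hostname for illustration):

# Permanent: edit /etc/sysconfig/network (effective after the next reboot)
sed -i 's/^HOSTNAME=.*/HOSTNAME=NewName/' /etc/sysconfig/network
# Immediate: effective for the current session only, lost on reboot
hostname NewName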

hosts file (note: ideally every node shares the same copy of this file):

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.174.142   NameNode
192.168.174.143   DataNode_01
192.168.174.145   DataNode_02
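Since every node should share the same hosts file, one way to push the master's copy out to the other nodes (a sketch, assuming root SSH access to each node):

# overwrite each node's /etc/hosts with the master's copy
for node in DataNode_01 DataNode_02; do
    scp /etc/hosts root@$node:/etc/hosts
done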

Test the hosts file:

[squirrel@DataNode_02 ~]$ ping DataNode_01
PING DataNode_01 (192.168.174.143) 56(84) bytes of data.
64 bytes from DataNode_01 (192.168.174.143): icmp_seq=1 ttl=64 time=2.24 ms
--- DataNode_01 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6589ms
rtt min/avg/max/mdev = 0.275/0.733/2.241/0.624 ms

[squirrel@DataNode_02 ~]$ ping DataNode_02
PING DataNode_02 (192.168.174.145) 56(84) bytes of data.
64 bytes from DataNode_02 (192.168.174.145): icmp_seq=1 ttl=64 time=0.029 ms
--- DataNode_02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2381ms
rtt min/avg/max/mdev = 0.029/0.050/0.062/0.016 ms

Conclusion: the output shows that hosts can be pinged by hostname, which confirms the hosts file is configured correctly.

②: Configure the core Hadoop configuration files: hadoop-env.sh, core-site.xml, hdfs-site.xml, and mapred-site.xml

hadoop-env.sh (set the JDK installation directory):

export JAVA_HOME=/usr/local/java/jdk1.8.0_112
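To sanity-check that this path actually points at a JDK before going further (using the same location configured above):

/usr/local/java/jdk1.8.0_112/bin/java -version   # should print the JDK version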

core-site.xml: note that the NameNode location must be the actual hostname or IP address of the NameNode host:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://NameNode:9000</value>
  </property>
</configuration>
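Optionally, hadoop.tmp.dir can be set in the same file; otherwise HDFS metadata (including the NameNode image, as seen in step ⑤) defaults to a path under /tmp, which many systems clear on reboot. A sketch, where the directory is an illustrative choice rather than part of the original setup:

  <!-- hypothetical path; any directory the hadoop user owns will do -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/squirrel/Programme/hadoop-0.20.2/tmp</value>
  </property>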

hdfs-site.xml:

Note: if the data block storage directory does not exist on a DataNode, the DataNode daemon will not start on that node (see the pre-creation sketch after the XML below).

Explanation: since two DataNode nodes are configured, each data block is replicated twice.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/squirrel/Programme/hadoop-0.20.2/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
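Given the note above, it is worth pre-creating dfs.data.dir on each DataNode before starting the cluster (a sketch, assuming the squirrel user can SSH to each node):

# create the data block directory on every DataNode
for node in DataNode_01 DataNode_02; do
    ssh $node 'mkdir -p /home/squirrel/Programme/hadoop-0.20.2/data'
done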

mapred-site.xml:

Note: the JobTracker location must be changed to the actual hostname or IP address:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>NameNode:9001</value>
  </property>
</configuration>

③: Configure the masters and slaves files

Note: each line of these files simply names one server by hostname or IP address.

masters file: the master node (NameNode/SecondaryNameNode/JobTracker). Strictly speaking, in Hadoop 1.x this file only determines where the SecondaryNameNode runs; the NameNode and JobTracker run on whichever host the start scripts are launched from.

NameNode

slaves file: the slave nodes (DataNode/TaskTracker):

DataNode_01
DataNode_02
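Steps ④ and ⑥ below both rely on SSH from the master to every node (scp for copying files, and start-all.sh for launching remote daemons), so passwordless SSH should be in place first. A minimal sketch, assuming the squirrel account exists on all nodes:

ssh-keygen -t rsa                  # accept the defaults; creates ~/.ssh/id_rsa and id_rsa.pub
ssh-copy-id squirrel@DataNode_01   # append the public key to the node's authorized_keys
ssh-copy-id squirrel@DataNode_02
ssh DataNode_01 hostname           # should run without prompting for a password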

④: Copy the fully configured Hadoop directory to every node server in the cluster:

scp -r /home/squirrel/Programme/hadoop-0.20.2 DataNode_01:/home/squirrel/Programme/hadoop-0.20.2
scp -r /home/squirrel/Programme/hadoop-0.20.2 DataNode_02:/home/squirrel/Programme/hadoop-0.20.2

⑤: Format the HDFS filesystem: hadoop namenode -format

16/12/28 23:23:13 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = NameNode/192.168.174.142
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/

16/12/28 23:23:15 INFO namenode.FSNamesystem: fsOwner=squirrel,squirrel
16/12/28 23:23:15 INFO namenode.FSNamesystem: supergroup=supergroup
16/12/28 23:23:15 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/12/28 23:23:15 INFO common.Storage: Image file of size 98 saved in 0 seconds.
16/12/28 23:23:15 INFO common.Storage: Storage directory /tmp/hadoop-squirrel/dfs/name has been successfully formatted.
16/12/28 23:23:15 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at NameNode/192.168.174.142
************************************************************/

Analysis:

"/tmp/hadoop-squirrel/dfs/name has been successfully formatted."打印日志表明HDFS文件系统格式化成功。

⑥: Start Hadoop: run ./start-all.sh from the bin directory of the unpacked Hadoop distribution

starting namenode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-namenode-NameNode.out

DataNode_01: starting datanode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-datanode-DataNode_01.out

DataNode_02: starting datanode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-datanode-DataNode_02.out

NameNode: starting secondarynamenode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-secondarynamenode-NameNode.out

starting jobtracker, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-jobtracker-NameNode.out

DataNode_02: starting tasktracker, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-tasktracker-DataNode_02.out

DataNode_01: starting tasktracker, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-tasktracker-DataNode_01.out

Analysis:

  • The log shows all five Hadoop daemons starting: namenode, datanode, secondarynamenode, jobtracker, and tasktracker.
  • The slave nodes configured in the slaves file were all started successfully.

⑦: Check the running Hadoop daemons. Run jps on the NameNode node:

15825 JobTracker
15622 NameNode
15752 SecondaryNameNode
15935 Jps

Run jps on a DataNode node:

15237 DataNode
15350 Jps
15310 TaskTracker

Conclusion: the Hadoop cluster started up completely and successfully.
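Beyond jps, a quick way to confirm that both DataNodes actually registered with the NameNode (run on the NameNode; output omitted here):

hadoop dfsadmin -report   # should report 2 live datanodes
hadoop fs -ls /           # basic smoke test against HDFS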

Note: Hadoop uses SSH for file transfer, so file access permissions between nodes inside the cluster inevitably come into play. It is recommended to place the Hadoop directory somewhere the login user of each node server has full permissions; otherwise problems such as logs failing to be written to the log files can occur.
