JDK Installation
Set the hostname
[root@bigdata111 ~]# vi /etc/hostname
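On CentOS 7 (which the ifcfg-eno16777736 file later in this guide suggests), you can alternatively set the hostname in a single step with hostnamectl; this is an optional shortcut, not part of the original steps:
[root@bigdata111 ~]# hostnamectl set-hostname bigdata111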
Configure the machine's hosts file
[root@bigdata111 ~]# vi /etc/hosts
192.168.1.111 bigdata111
192.168.1.112 bigdata112
192.168.1.113 bigdata113
Create directories for the JDK
[root@bigdata111 /]# cd /opt
[root@bigdata111 opt]# ll
total 0
drwxr-xr-x. 2 root root 6 Mar 26 2015 rh
[root@bigdata111 opt]# mkdir module
[root@bigdata111 opt]# mkdir soft
[root@bigdata111 opt]# ls
module  rh  soft
Upload the JDK package
Open WinSCP and use it to upload the Java JDK to the /opt/soft directory on the Linux machine.
[root@bigdata111 opt]# cd soft
[root@bigdata111 soft]# ls
jdk-8u144-linux-x64.tar.gz
Extract the JDK
Extract the JDK archive into the module directory with the following commands:
[root@bigdata111 opt]# cd soft
[root@bigdata111 soft]# tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/
[root@bigdata111 soft]# cd /opt/module
[root@bigdata111 module]# ls
jdk1.8.0_144
Set the JDK environment variables
[root@bigdata111 module]# vi /etc/profile
Append the JDK environment variables at the end of the file, then save and exit:
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
Reload the environment variables
[root@bigdata111 module]# source /etc/profile
Verify that the JDK installed successfully
[root@bigdata111 module]# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
Setting Up Hadoop in Local Mode
Local mode simply means installing Hadoop on a single machine; jobs run in a single JVM against the local filesystem, without any Hadoop daemons.
Install Hadoop
Upload the Hadoop package
Upload the Hadoop package to the /opt/soft/ directory with WinSCP.
[root@bigdata111 soft]# ls
hadoop-2.8.4.tar.gz  jdk-8u144-linux-x64.tar.gz
Extract Hadoop
Extract Hadoop into /opt/module/:
[root@bigdata111 soft]# tar -zvxf hadoop-2.8.4.tar.gz -C /opt/module/
[root@bigdata111 soft]# cd /opt/module/
[root@bigdata111 module]# ls
hadoop-2.8.4  jdk1.8.0_144
Set the Hadoop environment variables
[root@bigdata111 module]# vi /etc/profile
Append the following at the end of the file, then save and exit:
export HADOOP_HOME=/opt/module/hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Reload the profile
[root@bigdata111 module]# source /etc/profile
Verify that Hadoop installed successfully
[root@bigdata111 module]# hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
Test Hadoop with an Example
Create a test file
Create a file named testdoc in the module directory and enter some text:
[root@bigdata111 module]# cd /opt/module
[root@bigdata111 module]# touch testdoc
[root@bigdata111 module]# vi testdoc
[root@bigdata111 module]# cat testdoc
this is a test page!
chinese is the best country
this is a ceshi page!
i love china
listen to the music and son on
Change to the jar directory
Change to the directory containing Hadoop's runnable example jars:
[root@bigdata111 module]# cd /opt/module/hadoop-2.8.4/share/hadoop/mapreduce/
[root@bigdata111 mapreduce]# ls
hadoop-mapreduce-client-app-2.8.4.jar     hadoop-mapreduce-client-hs-2.8.4.jar          hadoop-mapreduce-client-jobclient-2.8.4-tests.jar  hadoop-mapreduce-examples-2.8.4.jar  lib-examples
hadoop-mapreduce-client-common-2.8.4.jar  hadoop-mapreduce-client-hs-plugins-2.8.4.jar  hadoop-mapreduce-client-shuffle-2.8.4.jar          jdiff                                sources
hadoop-mapreduce-client-core-2.8.4.jar    hadoop-mapreduce-client-jobclient-2.8.4.jar   lib
Run the wordcount program
[root@bigdata111 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.8.4.jar wordcount /opt/module/testdoc /opt/module/out
[root@bigdata111 mapreduce]# ls /opt/module/out
part-r-00000  _SUCCESS
[root@bigdata111 mapreduce]# cat /opt/module/out/part-r-00000
a	2
and	1
best	1
ceshi	1
china	1
chinese	1
country	1
i	1
is	3
listen	1
love	1
music	1
on	1
page!	2
son	1
test	1
the	2
this	2
to	1
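Note that MapReduce refuses to overwrite an existing output directory; rerunning the job with the same output path fails with a FileAlreadyExistsException. To run it again, remove the output directory first:
[root@bigdata111 mapreduce]# rm -rf /opt/module/out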
Setting Up Hadoop in Pseudo-Distributed Mode
Pseudo-distributed mode runs all of the distributed daemons on a single machine.
Inspect the Hadoop executables
[root@bigdata111 mapreduce]# cd /opt/module/hadoop-2.8.4/
[root@bigdata111 hadoop-2.8.4]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[root@bigdata111 hadoop-2.8.4]# cd bin
[root@bigdata111 bin]# ls
container-executor  hadoop  hadoop.cmd  hdfs  hdfs.cmd  mapred  mapred.cmd  rcc  test-container-executor  yarn  yarn.cmd
[root@bigdata111 bin]# cd ..
[root@bigdata111 hadoop-2.8.4]# cd sbin
[root@bigdata111 sbin]# ls
distribute-exclude.sh  hadoop-daemons.sh  hdfs-config.sh  kms.sh                   refresh-namenodes.sh  start-all.cmd  start-balancer.sh  start-dfs.sh         start-yarn.cmd  stop-all.cmd  stop-balancer.sh  stop-dfs.sh         stop-yarn.cmd  yarn-daemon.sh
hadoop-daemon.sh       hdfs-config.cmd    httpfs.sh       mr-jobhistory-daemon.sh  slaves.sh             start-all.sh   start-dfs.cmd      start-secure-dns.sh  start-yarn.sh   stop-all.sh   stop-dfs.cmd      stop-secure-dns.sh  stop-yarn.sh   yarn-daemons.sh
Change to the configuration directory
Go to the Hadoop configuration directory /opt/module/hadoop-2.8.4/etc/hadoop/:
[root@bigdata111 hadoop]# cd /opt/module/hadoop-2.8.4/etc/hadoop/
[root@bigdata111 hadoop]# ls
capacity-scheduler.xml  core-site.xml   hadoop-metrics2.properties  hdfs-site.xml            httpfs-signature.secret  kms-env.sh            log4j.properties  mapred-queues.xml.template  ssl-client.xml.example  yarn-env.sh
configuration.xsl       hadoop-env.cmd  hadoop-metrics.properties   httpfs-env.sh            httpfs-site.xml          kms-log4j.properties  mapred-env.cmd    mapred-site.xml.template    ssl-server.xml.example  yarn-site.xml
container-executor.cfg  hadoop-env.sh   hadoop-policy.xml           httpfs-log4j.properties  kms-acls.xml             kms-site.xml          mapred-env.sh     slaves                      yarn-env.cmd
Configure core-site.xml
[root@bigdata111 hadoop]# vi core-site.xml
<configuration>
    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata111:9000</value>
    </property>
    <!-- Directory for the files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.8.4/data/tmp</value>
    </property>
</configuration>
Configure hdfs-site.xml
[root@bigdata111 hadoop]# vi hdfs-site.xml
<configuration>
    <!-- Replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
Configure yarn-site.xml
[root@bigdata111 hadoop]# vi yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- How reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>bigdata111</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Retain logs for 7 days (7 * 24 * 60 * 60 = 604800 seconds) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
Configure mapred-site.xml
Rename mapred-site.xml.template to mapred-site.xml, then edit it:
[root@bigdata111 hadoop]# mv mapred-site.xml.template mapred-site.xml
[root@bigdata111 hadoop]# ls
capacity-scheduler.xml  core-site.xml   hadoop-metrics2.properties  hdfs-site.xml            httpfs-signature.secret  kms-env.sh            log4j.properties  mapred-queues.xml.template  ssl-client.xml.example  yarn-env.sh
configuration.xsl       hadoop-env.cmd  hadoop-metrics.properties   httpfs-env.sh            httpfs-site.xml          kms-log4j.properties  mapred-env.cmd    mapred-site.xml             ssl-server.xml.example  yarn-site.xml
container-executor.cfg  hadoop-env.sh   hadoop-policy.xml           httpfs-log4j.properties  kms-acls.xml             kms-site.xml          mapred-env.sh     slaves                      yarn-env.cmd
[root@bigdata111 hadoop]# vi mapred-site.xml
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Address of the job history server -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>bigdata111:10020</value>
    </property>
    <!-- Address of the job history server's web UI -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>bigdata111:19888</value>
    </property>
</configuration>
Configure hadoop-env.sh
Change JAVA_HOME to the absolute path of the JDK, then save and exit:
[root@bigdata111 hadoop]# vi hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
Format the NameNode
With the configuration complete, format the NameNode (only do this the first time):
[root@bigdata111 hadoop]# hadoop namenode -format
Why format?
The NameNode manages the metadata of the distributed filesystem's namespace (essentially its directories and files). To guarantee data reliability it also keeps an operation log (the edit log), and it persists all of this data to the local filesystem. When using HDFS for the first time, you must run the -format command before the NameNode service can start normally.
What does formatting do?
The NameNode has two critical paths, one for metadata and one for the operation log. They come from the configuration properties dfs.name.dir and dfs.name.edits.dir (in Hadoop 2.x, dfs.namenode.name.dir and dfs.namenode.edits.dir), both of which default to /tmp/hadoop/dfs/name. During formatting, the NameNode clears both directories and then creates fresh files under dfs.name.dir.
Because of the hadoop.tmp.dir setting above, both of these paths end up under the same directory, /opt/module/hadoop-2.8.4/data/tmp.
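As a quick sanity check after formatting, you can look inside the directory configured via hadoop.tmp.dir. The listing below is a sketch for Hadoop 2.8; exact file names may vary by version:
[root@bigdata111 hadoop]# ls /opt/module/hadoop-2.8.4/data/tmp/dfs/name/current
fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION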
Start the HDFS and YARN services
When the NameNode and the ResourceManager are on the same machine, use:
[root@bigdata111 hadoop]# start-all.sh
When they are on different machines, use:
[root@bigdata111 hadoop]# start-dfs.sh
[root@bigdata111 hadoop]# start-yarn.sh
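Once the services are up, jps should show the five daemons below (a sketch; your PIDs will differ):
[root@bigdata111 hadoop]# jps
2212 NameNode
2339 DataNode
2516 SecondaryNameNode
2698 ResourceManager
2801 NodeManager
3120 Jps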
Access the HDFS web UI
Default port: 50070
http://192.168.1.111:50070
Access the YARN web UI
Default port: 8088
http://192.168.1.111:8088
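You can also confirm from the shell that both UIs respond before opening a browser (assuming curl is installed; the YARN UI redirects its root path to /cluster, so a 302 there is normal):
[root@bigdata111 hadoop]# curl -s -o /dev/null -w "%{http_code}\n" http://bigdata111:50070
200
[root@bigdata111 hadoop]# curl -s -o /dev/null -w "%{http_code}\n" http://bigdata111:8088
302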
Setting Up a Hadoop Cluster
Use VMware's clone feature to clone two more machines from the 111 machine as a template.
Change the hostname and IP
Change the hostname and IP address of the two cloned machines so that Xshell can connect to them:
[root@bigdata112 ~]# vi /etc/hostname
[root@bigdata112 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eno16777736
[root@bigdata112 ~]# service network restart
[root@bigdata112 ~]# ip addr
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eno16777736
UUID=24bbe130-f59a-4b25-9df6-cf5857c89699
DEVICE=eno16777736
ONBOOT=yes
IPADDR=192.168.1.112
GATEWAY=192.168.1.2
DNS1=8.8.8.8
Delete the data directory
Delete the data directory under /opt/module/hadoop-2.8.4 so that the NameNode/DataNode metadata left over from the pseudo-distributed setup does not carry over into the new cluster.
[root@bigdata111 hadoop-2.8.4]# cd /opt/module/hadoop-2.8.4/
[root@bigdata111 hadoop-2.8.4]# rm -rf data/
Configure hosts
Map the IP addresses to the hostnames in /etc/hosts:
[root@bigdata111 hadoop-2.8.4]# vi /etc/hosts
192.168.1.111 bigdata111
192.168.1.112 bigdata112
192.168.1.113 bigdata113
Distribute files to the other machines with scp
Send the hosts file configured on the first machine to the other two:
[root@bigdata111 hadoop-2.8.4]# scp /etc/hosts root@bigdata112:/etc/
[root@bigdata111 hadoop-2.8.4]# scp /etc/hosts root@bigdata113:/etc/
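A quick way to confirm that name resolution now works (run the same check on all three machines):
[root@bigdata111 hadoop-2.8.4]# ping -c 1 bigdata112
[root@bigdata111 hadoop-2.8.4]# ping -c 1 bigdata113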
Configure passwordless SSH login
- Using Xshell's "Send input to all sessions" feature, generate a key pair on each of the three machines
[root@bigdata111 hadoop-2.8.4]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
cc:47:37:5a:93:0f:77:38:53:af:a3:57:47:55:27:59 root@bigdata111
The key's randomart image is:
+--[ RSA 2048]----+
|             .oE|
|            ..++|
|         . B = +|
|        o . + * *|
|        S o   + o|
|         . .   o.|
|          . .    |
|           .     |
|                 |
+-----------------+
- Using the same "Send input to all sessions" feature, add the public keys to the authorized key list of every machine in the cluster
[root@bigdata111 hadoop-2.8.4]# ssh-copy-id bigdata111
[root@bigdata111 hadoop-2.8.4]# ssh-copy-id bigdata112
[root@bigdata111 hadoop-2.8.4]# ssh-copy-id bigdata113
- Check that the key store is in place
[root@bigdata111 .ssh]# cd /root/.ssh
[root@bigdata111 .ssh]# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
[root@bigdata111 .ssh]# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7cSXZDdNJ0Cg+1wyVoCn4pWEAxy/13/ekg//YVkGwEsR6HO4XaYxxstVBij5JoTEEjSDNmz2HifTZDB098py3x882ZLVHJllJWzXYX4gVof/tmdmk5AJbhIlX3SoauTrrrzFiMtuXKdu6slvzhs9IbDp68xCUNiVI06OnWFSuhQc8Td+tekwlFPfm+v3W/PqUUgQAd+OAqOUC2vEjjnACQNw/wgGvF/lqrXDv5ZIFmYCBlB7YxwP9RykOvAzEe7w2W7TOt0K8V8oKKTui4aZuahWDbsGwlD7TAQRkilXkG59XG48AWOQoU/XFxph+XECqJzjmdxYedzY8inYW/Lfx root@bigdata111
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYyMVfLaL9w9sGz5hQG96ksUN5ih2RHdwsiXBpL/ZRG7LasKS+OQcszmc61TJfV0Vjad7kuL9wlg2YqlVIQJvaIUQCw4+5BrO0vCy4JBrz/FiDjzxKx0Ba+ILziuMxl35RxDCVGph17i2jpYfy6jGLejYK9kpJH4ueIj8mm+4LTKabRZTcjdNNI0kYM+Tr08wEIuQ45adqVU9MpZc/j6i1FIr4R/RabyuO1FhEh0+Oc5Xbm3jSAYH0MgEvK1cuG9wmX7SaB/opO00Ts+nW/P4umeZQUy51IQSRdUF6BlMrshnCSlKHnuLv2eSCx9yv3QuQMWHnL/SOXUgTnIuzbrv9 root@bigdata112
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDoBOAT/n1QCnaVJtRS1Q9GeoP665gIayWxpSWbjEFus4DL4as5S9jAIhBQWrTnvZzm+Skb4dxGPgdPYLaMFX9tdDYPPsnnRR92sLpRw9gwvG5ROL5XPpV2X+Yxl6yACmlMT0JP1uk+Ekm623n6wtBSBP1BDtJ/fhXkRX6bo2kuXs4BvmP76cikdGBDygKNIEMPTcs6p2lfOnuVdQLSCGm+Q9NswKSBVElNyywNl5J9L/5kIzGXnoGtwhQtdrOjZ+c1tyiwhCz42I3c4z0Sb/zH3OFtHCvRG7cF72uDFxe1QwVJ4h1hJ1dmtwVCckNMbmmgK72PsN8Zg4Y8XtBXgX8n root@bigdata113
- Verify that passwordless SSH login works
[root@bigdata111 .ssh]# ssh bigdata112
Last login: Mon Aug 5 09:23:11 2019 from bigdata112
[root@bigdata112 ~]# ssh bigdata111
Last login: Mon Aug 5 09:09:23 2019 from 192.168.1.1
Deploy the JDK and Hadoop
- Uncheck "Send input to all sessions" and send the module directory from bigdata111 to the /opt/ directory of the other two machines:
[root@bigdata111 module]# scp -r /opt/module/ root@bigdata112:/opt/
[root@bigdata111 module]# scp -r /opt/module/ root@bigdata113:/opt/
- Send the environment variable file /etc/profile to the other two machines:
[root@bigdata111 module]# scp -r /etc/profile root@bigdata112:/etc/
[root@bigdata111 module]# scp -r /etc/profile root@bigdata113:/etc/
- Switch to the other two machines and reload the environment variables:
[root@bigdata112 module]# source /etc/profile
[root@bigdata112 module]# jps
2775 Jps
[root@bigdata113 module]# source /etc/profile
[root@bigdata113 module]# jps
2820 Jps
Configure the cluster XML files
Check "Send input to all sessions" and edit hdfs-site.xml, yarn-site.xml, and mapred-site.xml:
- hdfs-site.xml (the SecondaryNameNode is configured on 113):
<configuration>
    <!-- Replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Address of the SecondaryNameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>bigdata113:50090</value>
    </property>
    <!-- Disable permission checking -->
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
- yarn-site.xml (the YARN ResourceManager is configured on 112):
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- How reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>bigdata112</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Retain logs for 7 days (7 * 24 * 60 * 60 = 604800 seconds) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
- mapred-site.xml:
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Address of the job history server -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>bigdata112:10020</value>
    </property>
    <!-- Address of the job history server's web UI -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>bigdata112:19888</value>
    </property>
</configuration>
Configure the DataNodes in the slaves file
[root@bigdata111 ~]# cd /opt/module/hadoop-2.8.4/etc/hadoop/
[root@bigdata111 hadoop]# ls
capacity-scheduler.xml  core-site.xml   hadoop-metrics2.properties  hdfs-site.xml            httpfs-signature.secret  kms-env.sh            log4j.properties  mapred-queues.xml.template  ssl-client.xml.example  yarn-env.sh
configuration.xsl       hadoop-env.cmd  hadoop-metrics.properties   httpfs-env.sh            httpfs-site.xml          kms-log4j.properties  mapred-env.cmd    mapred-site.xml             ssl-server.xml.example  yarn-site.xml
container-executor.cfg  hadoop-env.sh   hadoop-policy.xml           httpfs-log4j.properties  kms-acls.xml             kms-site.xml          mapred-env.sh     slaves                      yarn-env.cmd
[root@bigdata111 hadoop]# vi slaves
bigdata111
bigdata112
bigdata113
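If the slaves file was edited only on bigdata111, copy it to the other two nodes as well, since start-yarn.sh, which runs on bigdata112, also reads this file (a precaution based on the steps above, skip it if you edited all three via "Send input to all sessions"):
[root@bigdata111 hadoop]# scp slaves root@bigdata112:/opt/module/hadoop-2.8.4/etc/hadoop/
[root@bigdata111 hadoop]# scp slaves root@bigdata113:/opt/module/hadoop-2.8.4/etc/hadoop/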
Format the NameNode
Use Xshell's "Send input to all sessions" feature to run the format on all three machines:
[root@bigdata111 hadoop]# hadoop namenode -format
[root@bigdata112 hadoop]# hadoop namenode -format
[root@bigdata113 hadoop]# hadoop namenode -format
Start HDFS on 111
[root@bigdata111 hadoop]# start-dfs.sh
Start YARN on 112
[root@bigdata112 hadoop]# start-yarn.sh
Show the jps processes on the three machines
[root@bigdata111 hadoop]# jps
2512 DataNode
2758 NodeManager
2377 NameNode
2894 Jps

[root@bigdata112 ~]# jps
2528 NodeManager
2850 Jps
2294 DataNode
2413 ResourceManager

[root@bigdata113 ~]# jps
2465 NodeManager
2598 Jps
2296 DataNode
2398 SecondaryNameNode
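With passwordless SSH already configured, you can also gather all three listings from a single terminal; this loop is a convenience sketch, not part of the original steps:
[root@bigdata111 hadoop]# for h in bigdata111 bigdata112 bigdata113; do echo "== $h =="; ssh $h jps; done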