Environment
As required, deploy a basic hadoop-3.0.0 stack on three nodes running CentOS 7 x64;
create three virtual machines on OpenStack and begin the deployment;
IP address      Hostname
10.10.204.31 master
10.10.204.32 node1
10.10.204.33 node2
Role planning per node
master: NameNode, DataNode, ResourceManager, HQuorumPeer, HMaster
node1:  DataNode, NodeManager
node2:  DataNode, NodeManager, SecondaryNameNode
Run the following initialization on all three nodes;
1.Update the system and install basic tools;
yum clean all && yum makecache fast && yum update -y && yum install -y wget vim net-tools git ftp zip unzip
2.Set each node's hostname according to the plan;
hostnamectl set-hostname master
hostnamectl set-hostname node1
hostnamectl set-hostname node2
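Since the same init script is often run on all three nodes, the hostname can be derived from the node's IP address; a minimal sketch (the `name_for_ip` helper is illustrative, not part of any tool):

```shell
# Map a node IP to its planned hostname (helper name is ours).
name_for_ip() {
  case "$1" in
    10.10.204.31) echo master ;;
    10.10.204.32) echo node1 ;;
    10.10.204.33) echo node2 ;;
    *) echo "unknown ip: $1" >&2; return 1 ;;
  esac
}

# On a real node you would then run, e.g.:
#   hostnamectl set-hostname "$(name_for_ip "$(hostname -I | awk '{print $1}')")"
name_for_ip 10.10.204.32
```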
3.Add hosts entries for name resolution;
vim /etc/hosts
10.10.204.31 master
10.10.204.32 node1
10.10.204.33 node2
4.Verify with ping that all three hosts resolve each other's hostnames;
ping master
ping node1
ping node2
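Steps 3-4 can also be scripted idempotently, so re-running the init script does not duplicate hosts entries; a sketch that writes to a scratch file (point HOSTS_FILE at /etc/hosts on the real nodes):

```shell
# Append cluster entries to a hosts file only when missing (idempotent).
# A scratch file is used here; use HOSTS_FILE=/etc/hosts on real nodes.
HOSTS_FILE=/tmp/hosts.demo
: > "$HOSTS_FILE"
add_host() {
  grep -q "[[:space:]]$2\$" "$HOSTS_FILE" || echo "$1 $2" >> "$HOSTS_FILE"
}
add_host 10.10.204.31 master
add_host 10.10.204.32 node1
add_host 10.10.204.33 node2
# Running the same call again adds nothing:
add_host 10.10.204.32 node1
cat "$HOSTS_FILE"
```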
5.Download and install the JDK;
#hadoop 3.0 requires JDK 8;
cd /opt/
#normally you must log in to the Oracle site, register an account, and accept the license before downloading; here we pass the license cookie and fetch directly with wget;
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "https://download.oracle.com/otn-pub/java/jdk/8u202-b08/1961070e4c9b4e26a04e7f5a083f551e/jdk-8u202-linux-x64.tar.gz"
#create the JDK and hadoop installation directory
mkdir /opt/modules
cp /opt/jdk-8u202-linux-x64.tar.gz /opt/modules
cd /opt/modules
tar zxvf jdk-8u202-linux-x64.tar.gz
#configure environment variables for the current shell
export JAVA_HOME="/opt/modules/jdk1.8.0_202"
export PATH=$JAVA_HOME/bin:$PATH
#to make the configuration permanent
vim /etc/bashrc
#add lines
export JAVA_HOME="/opt/modules/jdk1.8.0_202"
export PATH=$JAVA_HOME/bin:$PATH
#then reload the file
source /etc/bashrc
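The permanent step can likewise be made idempotent so repeated runs do not append duplicate export lines; a sketch against a scratch profile file (on the real nodes the target would be /etc/bashrc or a file under /etc/profile.d/):

```shell
# Append the JDK exports only once; a scratch file stands in for /etc/bashrc.
PROFILE=/tmp/java_env.demo
: > "$PROFILE"
ensure_line() {
  # -x: match the whole line, -F: fixed string (no regex surprises)
  grep -qxF "$1" "$PROFILE" || echo "$1" >> "$PROFILE"
}
ensure_line 'export JAVA_HOME="/opt/modules/jdk1.8.0_202"'
ensure_line 'export PATH=$JAVA_HOME/bin:$PATH'
# A second run is a no-op:
ensure_line 'export JAVA_HOME="/opt/modules/jdk1.8.0_202"'
cat "$PROFILE"
```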
6.Download and unpack the hadoop-3.0.0 tarball;
cd /opt/
wget http://archive.apache.org/dist/hadoop/core/hadoop-3.0.0/hadoop-3.0.0.tar.gz
cp /opt/hadoop-3.0.0.tar.gz /opt/modules/
cd /opt/modules
tar zxvf hadoop-3.0.0.tar.gz
7.Disable SELinux and the firewalld firewall;
systemctl disable firewalld
vim /etc/sysconfig/selinux
SELINUX=disabled
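The manual edit can be replaced with a sed one-liner; a sketch run against a scratch copy of the config (on the real nodes the target is /etc/selinux/config, and `setenforce 0` disables enforcement immediately without a reboot):

```shell
# Demo on a scratch copy; the real target is /etc/selinux/config.
CFG=/tmp/selinux.demo
printf 'SELINUXTYPE=targeted\nSELINUX=enforcing\n' > "$CFG"
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' "$CFG"
grep '^SELINUX=' "$CFG"
```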
8.Reboot the servers;
reboot
Operations on the master node;
Note:
this is a test environment, so hadoop is installed and run entirely as root;
1.Set up passwordless SSH login;
cd
ssh-keygen
##press Enter three times to accept the defaults
#copy the public key to all three nodes (master included)
ssh-copy-id master
ssh-copy-id node1
ssh-copy-id node2
2.Verify that passwordless login works (exit after each login);
ssh master
ssh node1
ssh node2
3.Edit the hadoop configuration files;
The following files under etc/hadoop need to be modified:
hadoop-env.sh
yarn-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
workers
cd /opt/modules/hadoop-3.0.0/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/opt/modules/jdk1.8.0_202
vim yarn-env.sh
export JAVA_HOME=/opt/modules/jdk1.8.0_202
Reference for the configuration options:
https://blog.csdn.net/m290345792/article/details/79141336
vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value></value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value></value>
</property>
</configuration>
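A missing closing tag in any of these XML files prevents the daemons from starting; a rough pre-flight check is to compare open/close tag counts (one tag per line, matching how these files are laid out). A sketch against a sample file:

```shell
# Verify that <property> tags are balanced in a hadoop config file.
check_props() {
  o=$(grep -c '<property>' "$1")
  c=$(grep -c '</property>' "$1")
  [ "$o" -eq "$c" ] && echo "balanced ($o)" || echo "MISMATCH: $o open, $c close"
}
# Sample file for the demo; on a real node pass core-site.xml etc.
cat > /tmp/core-site.demo <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
EOF
check_props /tmp/core-site.demo
```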
#io.file.buffer.size: read/write buffer size used for SequenceFiles
vim hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node2:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Replication factor; defaults to 3 and should not exceed the number of DataNodes</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
</configuration>
###NameNode settings
#dfs.namenode.name.dir path on the local filesystem where the NameNode persistently stores the namespace and transaction logs. If this is a comma-separated list of directories, the name table is replicated in all of them, for redundancy.
#dfs.hosts / dfs.hosts.exclude lists of permitted/excluded DataNodes. If necessary, use these files to control the list of allowed DataNodes.
#dfs.blocksize HDFS block size, 128MB by default, for large filesystems.
#dfs.namenode.handler.count number of NameNode server threads handling RPCs from a large number of DataNodes.
###DataNode settings
#dfs.datanode.data.dir comma-separated list of paths on the DataNode's local filesystem where blocks are stored. If this is a comma-separated list of directories, data is stored in all named directories, typically on different devices.
vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/opt/modules/hadoop-3.0.0/etc/hadoop,
/opt/modules/hadoop-3.0.0/share/hadoop/common/*,
/opt/modules/hadoop-3.0.0/share/hadoop/common/lib/*,
/opt/modules/hadoop-3.0.0/share/hadoop/hdfs/*,
/opt/modules/hadoop-3.0.0/share/hadoop/hdfs/lib/*,
/opt/modules/hadoop-3.0.0/share/hadoop/mapreduce/*,
/opt/modules/hadoop-3.0.0/share/hadoop/mapreduce/lib/*,
/opt/modules/hadoop-3.0.0/share/hadoop/yarn/*,
/opt/modules/hadoop-3.0.0/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
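The long classpath value can be generated rather than typed by hand; a sketch that prints the comma-separated list (assuming the jar directories take a /* glob while the config directory is listed as-is):

```shell
# Build the mapreduce.application.classpath value programmatically.
H=/opt/modules/hadoop-3.0.0
classpath="$H/etc/hadoop"
for d in common common/lib hdfs hdfs/lib mapreduce mapreduce/lib yarn yarn/lib; do
  classpath="$classpath,$H/share/hadoop/$d/*"
done
echo "$classpath"
```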
vim yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>
</configuration>
###ResourceManager and NodeManager settings
#yarn.acl.enable enable ACLs; defaults to false.
#yarn.admin.acl ACLs for administrators on the cluster. ACLs are of the form comma-separated-users space comma-separated-groups. Defaults to *, which means anyone; the special value of just a space means no one has access.
#yarn.log-aggregation-enable configuration to enable or disable log aggregation.
###ResourceManager settings
#yarn.resourcemanager.address value: ResourceManager host:port for clients to submit jobs. Note: if host:port is set, it overrides the hostname set in yarn.resourcemanager.hostname.
#yarn.resourcemanager.scheduler.address value: ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. Note: if host:port is set, it overrides the hostname set in yarn.resourcemanager.hostname.
#yarn.resourcemanager.resource-tracker.address value: ResourceManager host:port for NodeManagers. Note: if host:port is set, it overrides the hostname set in yarn.resourcemanager.hostname.
#yarn.resourcemanager.admin.address value: ResourceManager host:port for administrative commands. Note: if host:port is set, it overrides the hostname set in yarn.resourcemanager.hostname.
#yarn.resourcemanager.webapp.address value: ResourceManager web-ui host:port. Note: if host:port is set, it overrides the hostname set in yarn.resourcemanager.hostname.
#yarn.resourcemanager.hostname value: ResourceManager host. Note: a single hostname that can be set in place of all the yarn.resourcemanager*address settings; results in the default ports for the ResourceManager components.
#yarn.resourcemanager.scheduler.class value: ResourceManager scheduler class. Note: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler. Use a fully qualified class name, e.g. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.
#yarn.scheduler.minimum-allocation-mb value: minimum memory allocated to each container request at the ResourceManager.
#yarn.scheduler.maximum-allocation-mb value: maximum memory allocated to each container request at the ResourceManager.
#yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path value: lists of permitted/excluded NodeManagers. Note: if necessary, use these files to control the list of allowed NodeManagers.
vim workers
master
node1
node2
4.Modify the start/stop scripts
#because this test environment starts the hadoop services as root, the user variables below must be added to the scripts;
cd /opt/modules/hadoop-3.0.0/sbin
vim start-dfs.sh
#add lines
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_ZKFC_USER=root
HDFS_JOURNALNODE_USER=root
vim stop-dfs.sh
#add lines
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_ZKFC_USER=root
HDFS_JOURNALNODE_USER=root
vim start-yarn.sh
#add lines
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
vim stop-yarn.sh
#add lines
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
5.Push the hadoop configuration to the other nodes;
cd /opt/modules/hadoop-3.0.0/etc/hadoop
scp -r ./* root@node1:/opt/modules/hadoop-3.0.0/etc/hadoop/
scp -r ./* root@node2:/opt/modules/hadoop-3.0.0/etc/hadoop/
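Pushing the config generalizes to a loop over the worker nodes; the sketch below only prints the commands as a dry run (drop the echo to execute them for real):

```shell
# Dry-run: print the scp command for each worker node.
CONF=/opt/modules/hadoop-3.0.0/etc/hadoop
for h in node1 node2; do
  echo scp -r "$CONF"/ "root@$h:/opt/modules/hadoop-3.0.0/etc/"
done
```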
6.Format HDFS;
#the configuration sets the hdfs storage path to /data/tmp/
/opt/modules/hadoop-3.0.0/bin/hdfs namenode -format
7.Start the hadoop services;
#note: the sequence below also starts ZooKeeper, Kafka, JournalNodes and HBase, assumed to be already installed under /opt/modules (their installation is not covered above);
#on all three nodes
cd /opt/modules/zookeeper-3.4.13
./bin/zkServer.sh start
cd /opt/modules/kafka_2.12-2.1.1
./bin/kafka-server-start.sh ./config/server.properties &
/opt/modules/hadoop-3.0.0/bin/hdfs journalnode &
#on the master node
/opt/modules/hadoop-3.0.0/bin/hdfs namenode -format
/opt/modules/hadoop-3.0.0/bin/hdfs zkfc -formatZK
/opt/modules/hadoop-3.0.0/bin/hdfs namenode &
#on node1
/opt/modules/hadoop-3.0.0/bin/hdfs namenode -bootstrapStandby
/opt/modules/hadoop-3.0.0/bin/hdfs namenode &
/opt/modules/hadoop-3.0.0/bin/yarn resourcemanager &
/opt/modules/hadoop-3.0.0/bin/yarn nodemanager &
#on node2
/opt/modules/hadoop-3.0.0/bin/hdfs namenode -bootstrapStandby
/opt/modules/hadoop-3.0.0/bin/hdfs namenode &
/opt/modules/hadoop-3.0.0/bin/yarn resourcemanager &
/opt/modules/hadoop-3.0.0/bin/yarn nodemanager &
#on all three nodes
/opt/modules/hadoop-3.0.0/bin/hdfs zkfc &
#on the master node
cd /opt/modules/hadoop-3.0.0/
./sbin/start-all.sh
cd /opt/modules/hbase-2.0.4
./bin/start-hbase.sh
8.Check with jps on each node that the hadoop services started normally;
jps
9.Run a test job;
cd /opt/modules/hadoop-3.0.0
#create a test directory on hdfs
./bin/hdfs dfs -mkdir /testdir1
#create a test input file
cd /opt
touch wc.input
vim wc.input
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
#upload wc.input to HDFS
cd /opt/modules/hadoop-3.0.0
./bin/hdfs dfs -put /opt/wc.input /testdir1/wc.input
#run the mapreduce wordcount example that ships with hadoop
./bin/yarn jar /opt/modules/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar wordcount /testdir1/wc.input /output
#inspect the output
./bin/hdfs dfs -ls /output
./bin/hdfs dfs -cat /output/part-r-00000
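The result wordcount should produce for wc.input can be sanity-checked locally with sort and uniq before comparing against what hdfs dfs -cat shows for the output part file:

```shell
# Compute the word counts wordcount should produce for wc.input,
# in the same word<TAB>count format, sorted by word.
printf 'hadoop mapreduce hive\nhbase spark storm\nsqoop hadoop hive\nspark hadoop\n' \
  | tr -s ' ' '\n' | sort | uniq -c | awk '{printf "%s\t%s\n", $2, $1}'
```

For this input, hadoop should appear 3 times, hive and spark twice each, and the remaining words once.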
10.Status screenshots
Screenshots taken after all services started normally:
zookeeper+kafka+namenode+journalnode+hbase
Source: 51CTO
Author: 冰河cloud
Link: https://blog.51cto.com/driver2ice/2432011