Hadoop-related environment setup


Content source: Lagou Education Java High-Salary Training Camp

Cluster planning

Hadoop, HBase, and Hive will be installed on three virtual machines: 192.168.3.24, 192.168.3.7, and 192.168.3.8.

hosts configuration

/etc/hosts

192.168.3.24 centos1
192.168.3.7 centos2
192.168.3.8 centos3
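
Distribute the same hosts entries to the other two nodes and confirm the names resolve (a quick sanity check; the scp targets assume root SSH access):

scp /etc/hosts root@centos2:/etc/hosts
scp /etc/hosts root@centos3:/etc/hosts
ping -c 1 centos1 && ping -c 1 centos2 && ping -c 1 centos3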

Environment variables

First install the JDK, then configure the environment variables.

export JAVA_HOME=/opt/soft/jdk1.8.0_45
export PATH=$PATH:$JAVA_HOME/bin:/opt/soft/apache-hive-2.3.6-bin/bin:/opt/soft/hadoop-2.9.2/bin
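
Assuming the two export lines were appended to /etc/profile, reload the environment and confirm both tools are on the PATH:

source /etc/profile
java -version     # should report 1.8.0_45
hadoop version    # should report 2.9.2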

Hadoop environment

Install Hadoop

The configuration files are under /opt/soft/hadoop-2.9.2/etc/hadoop and must be modified identically on all three machines.

core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://centos1:9000</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>

hadoop-env.sh

Append the following line at the end of the file:

export JAVA_HOME=/opt/soft/jdk1.8.0_45

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- mapred.job.tracker is the MRv1 JobTracker address; it is ignored when
         mapreduce.framework.name is yarn -->
    <property>
        <name>mapred.job.tracker</name>
        <value>centos1:9001</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>centos1</value>
    </property>
</configuration>

slaves

centos1
centos2
centos3
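
start-all.sh starts the daemons on every host listed in slaves over SSH, so the master needs passwordless SSH to all three machines (including itself). A minimal sketch, assuming the same user runs Hadoop on every node:

ssh-keygen -t rsa      # on centos1, accept the defaults
ssh-copy-id centos1    # centos1 is in slaves, so include the master itself
ssh-copy-id centos2
ssh-copy-id centos3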

Starting Hadoop

Initialization

Run the following on the master, centos1. Format the NameNode only once; re-running the format generates a new clusterID, after which the existing DataNodes will fail to register:

/opt/soft/hadoop-2.9.2/bin/hdfs namenode -format

Start

/opt/soft/hadoop-2.9.2/sbin/start-all.sh

jps

The processes expected on each machine (since centos1 also appears in slaves, it runs a DataNode and NodeManager in addition to the master daemons):

centos1: NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager
centos2: DataNode, NodeManager
centos3: DataNode, NodeManager

Verification
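
A minimal smoke test, assuming the default Hadoop 2.x web UI ports:

/opt/soft/hadoop-2.9.2/bin/hdfs dfsadmin -report          # all three DataNodes should be listed
/opt/soft/hadoop-2.9.2/bin/hdfs dfs -mkdir -p /tmp/smoke  # write to HDFS
/opt/soft/hadoop-2.9.2/bin/hdfs dfs -ls /

The NameNode web UI should respond at http://centos1:50070 and the ResourceManager UI at http://centos1:8088.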

Hive environment

Hive depends on MySQL; make sure MySQL is installed first.
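
Two preparation details are easy to miss. The MySQL JDBC driver must be on Hive's classpath (the connector version below is an assumption; any mysql-connector-java 5.x jar provides the com.mysql.jdbc.Driver class configured later):

cp mysql-connector-java-5.1.48.jar /opt/soft/apache-hive-2.3.6-bin/lib/

Also note that the JDBC URL below uses createDatabaseIfNotExist=true, so the hive database does not have to be created by hand, but the configured user must be able to reach MySQL on centos1:3306.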

Install Hive

Configure hive-env.sh

JAVA_HOME=/opt/soft/jdk1.8.0_45
HADOOP_HOME=/opt/soft/hadoop-2.9.2
HIVE_HOME=/opt/soft/apache-hive-2.3.6-bin
export HIVE_CONF_DIR=${HIVE_HOME}/conf

Configure hive-site.xml

Note: if you want to enable the visual Hive web environment, do not modify this configuration file carelessly!

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://centos1:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>6342180</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>datanucleus.autoCreateSchema</name>
        <value>true</value>
    </property>
    <property>
        <name>datanucleus.autoCreateTables</name>
        <value>true</value>
    </property>
    <property>
        <name>datanucleus.autoCreateColumns</name>
        <value>true</value>
    </property>
    <!-- Location of the Hive warehouse on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/usr/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    <property>
        <name>hive.hwi.listen.host</name>
        <value>centos1</value>
        <description>This is the host address the Hive Web Interface will listen on</description>
    </property>
    <property>
        <name>hive.hwi.listen.port</name>
        <value>9999</value>
        <description>This is the port the Hive Web Interface will listen on</description>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>centos1</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.http.port</name>
        <value>10001</value>
    </property>
    <property>
        <name>hive.server2.thrift.http.path</name>
        <value>cliservice</value>
    </property>
    <!-- HiveServer2 web UI -->
    <property>
        <name>hive.server2.webui.host</name>
        <value>centos1</value>
    </property>
    <property>
        <name>hive.server2.webui.port</name>
        <value>10002</value>
    </property>
    <property>
        <name>hive.scratch.dir.permission</name>
        <value>755</value>
   </property>
</configuration>

Initialization

/opt/soft/apache-hive-2.3.6-bin/bin/schematool -dbType mysql -initSchema

Start

/opt/soft/apache-hive-2.3.6-bin/bin/hive --service metastore
/opt/soft/apache-hive-2.3.6-bin/bin/hiveserver2
The first command may not be necessary; this still needs to be verified.
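
Both commands block the terminal they run in; a common way to keep the services running in the background (the nohup wrapper and log paths are illustrative, not from the original setup):

nohup /opt/soft/apache-hive-2.3.6-bin/bin/hive --service metastore > /tmp/metastore.log 2>&1 &
nohup /opt/soft/apache-hive-2.3.6-bin/bin/hiveserver2 > /tmp/hiveserver2.log 2>&1 &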

Verification
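
A minimal check with Beeline against the Thrift port configured above (-n supplies the user queries run as; root here is an assumption):

/opt/soft/apache-hive-2.3.6-bin/bin/beeline -u jdbc:hive2://centos1:10000 -n root
# inside beeline:
#   show databases;

The HiveServer2 web UI configured above should also respond at http://centos1:10002.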

Installing a ZooKeeper cluster with Docker

docker-compose.yaml

Note that ZOO_MY_ID differs on each of the three machines (see the sketch after the compose file).

version: '2'
services:
  zookeeper1:
    image: zookeeper:3.4
    container_name: zk1.cloud
    # Host networking: the container uses the host's ports (2181, 2888, 3888)
    # directly; explicit port mappings cannot be combined with network_mode: host.
    network_mode: host
    volumes:
        - /opt/soft/zookeeper/conf:/conf
        - /opt/soft/zookeeper/datalog:/datalog
    environment:
      ZOO_MY_ID: 1
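
On the other two machines the same file is used with only ZOO_MY_ID changed to 2 and 3, matching the server.N lines in zoo.cfg below. For example, on centos2 (a sketch, assuming the file sits in /opt/soft/zookeeper as in the commands further down):

sed -i 's/ZOO_MY_ID: 1/ZOO_MY_ID: 2/' /opt/soft/zookeeper/docker-compose.yaml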

zoo.cfg

This file lives in the /opt/soft/zookeeper/conf directory and is identical on all three machines. (The official zookeeper image generates /data/myid from ZOO_MY_ID at startup, so there is no need to create a myid file by hand.)

With Docker the usual approach is just to map a few files in, but mapping the ZooKeeper cluster ports through Docker's own network is too much hassle, so the docker-compose.yaml above uses the host's network directly, which makes running a cluster straightforward.

clientPort=2181
dataDir=/data
dataLogDir=/datalog
tickTime=2000
initLimit=5
syncLimit=2
autopurge.snapRetainCount=3
autopurge.purgeInterval=0
maxClientCnxns=60
server.1=centos1:2888:3888
server.2=centos2:2888:3888
server.3=centos3:2888:3888

Common operations

cd /opt/soft/zookeeper

Start

docker-compose up -d

Stop

docker-compose down

Verification

docker exec -it zk1.cloud zkServer.sh status
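
One node should report Mode: leader and the other two Mode: follower. To query all three in one go (assuming SSH access between the machines and the same container name on every host):

for h in centos1 centos2 centos3; do
  echo "== $h =="
  ssh $h "docker exec zk1.cloud zkServer.sh status"
done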

Installing HBase

First, copy the Hadoop configuration files into HBase's conf directory:

cp /opt/soft/hadoop-2.9.2/etc/hadoop/hdfs-site.xml /opt/soft/hbase/hbase-1.3.1/conf

cp /opt/soft/hadoop-2.9.2/etc/hadoop/core-site.xml /opt/soft/hbase/hbase-1.3.1/conf

Edit hbase-env.sh

export JAVA_HOME=/opt/soft/jdk1.8.0_45

# use the external ZooKeeper cluster set up above rather than one managed by HBase itself
export HBASE_MANAGES_ZK=false

Edit the regionservers file

centos1

centos2

centos3

Edit hbase-site.xml

<configuration>
    <!-- Where HBase stores its data on HDFS -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://centos1:9000/hbase</value>
    </property>
    <!-- Run HBase in distributed mode -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- ZooKeeper quorum; separate multiple addresses with "," -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>centos1:2181,centos2:2181,centos3:2181</value>
    </property>
</configuration>

Distribute HBase

scp -r /opt/soft/hbase centos2:/opt/soft

scp -r /opt/soft/hbase centos3:/opt/soft

Start HBase

sh /opt/soft/hbase/hbase-1.3.1/bin/start-hbase.sh

Verification
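
A minimal smoke test from the HBase shell (the table and column-family names are arbitrary examples); for HBase 1.3 the Master web UI normally listens at http://centos1:16010:

/opt/soft/hbase/hbase-1.3.1/bin/hbase shell
# inside the shell:
#   status
#   create 'smoke_test', 'cf'
#   put 'smoke_test', 'r1', 'cf:c1', 'v1'
#   scan 'smoke_test'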

Closing remarks: This article is a partial summary and review of the Hadoop module from my studies at the Lagou high-salary training camp. (PS: At this camp I not only learned a lot but also met some wonderful people: teacher 木槿, who combines beauty and wisdom, and our class advisor, gentle yet appropriately strict and responsible.)

And a personal matchmaking ad while I'm at it: 卢学霸, male, interested in women. (The remaining 10,000 words are omitted.)
