CentOS 7 Hadoop + Hive installation

Posted by 旧时模样 on 2020-02-01 10:55:15

Prepare four virtual machines

Virtual machine installation

1. Create a new virtual machine
2. Choose the typical installation (recommended)
3. Select Chinese and partition the disk manually
# Partition layout (JD)
/boot 200M
swap 512M  # physical memory is limited, so add swap
/  # root partition
4. Configure the remaining options (the screenshot from the original post is omitted here)

Update the system with yum

yum update -y

IP addresses of the four hosts

One master, three workers
172.20.10.9  password: hadoop01  VM: hadoop01
172.20.10.10 password: hadoop02  VM: hadoop02
172.20.10.11 password: hadoop03  VM: hadoop03
172.20.10.12 password: hadoop04  VM: hadoop04

# reset the root password
passwd root

Hadoop installation

https://www.cnblogs.com/shireenlee4testing/p/10472018.html

Configure hostname resolution

Do this on every node

vim /etc/hosts

172.20.10.9   hadoop01
172.20.10.10  hadoop02
172.20.10.11  hadoop03
172.20.10.12  hadoop04
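
To confirm the mapping works on each node, resolve and ping one of the hostnames (a quick sanity check, not part of the original steps):

getent hosts hadoop02   # should print 172.20.10.10
ping -c 1 hadoop02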

Disable the firewall

# stop the firewall
systemctl stop firewalld

# prevent it from starting at boot
systemctl disable firewalld
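
A quick way to confirm the firewall is stopped and stays off across reboots (not part of the original steps):

systemctl status firewalld      # should show inactive (dead)
systemctl is-enabled firewalld  # should print disabled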

Configure passwordless SSH login

https://www.cnblogs.com/shireenlee4testing/p/10366061.html

# generate an SSH key pair on each node
ssh-keygen -t rsa

cd /root/.ssh
ls 

# on the master node (hadoop01), copy the public key into authorized_keys
cp id_rsa.pub authorized_keys

# copy authorized_keys to hadoop02
scp authorized_keys root@hadoop02:/root/.ssh/

# log in to hadoop02
cd .ssh/
cat id_rsa.pub >> authorized_keys
# then copy authorized_keys on to hadoop03
scp authorized_keys root@hadoop03:/root/.ssh/

# log in to hadoop03
cd .ssh/
cat id_rsa.pub >> authorized_keys
# then copy authorized_keys on to hadoop04
scp authorized_keys root@hadoop04:/root/.ssh/

# log in to hadoop04
cd .ssh/
cat id_rsa.pub >> authorized_keys
# copy the completed authorized_keys back to hadoop01, hadoop02 and hadoop03
scp authorized_keys root@hadoop01:/root/.ssh/
scp authorized_keys root@hadoop02:/root/.ssh/
scp authorized_keys root@hadoop03:/root/.ssh/

# verify passwordless login
Use ssh user@hostname or ssh ip-address to check that login no longer prompts for a password
ssh root@hadoop02
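
As an alternative to the manual copy chain above (a minimal sketch, assuming every node has already run ssh-keygen and password login is still allowed), ssh-copy-id appends the local public key to the remote authorized_keys in one step; run it on each node for every other node:

ssh-copy-id root@hadoop01
ssh-copy-id root@hadoop02
ssh-copy-id root@hadoop03
ssh-copy-id root@hadoop04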

Download JDK 8 with wget

https://blog.csdn.net/u014700139/article/details/89960494
# copy the downloaded JDK archive to hadoop02, hadoop03 and hadoop04
scp -r -P 22 jdk.tar.gz root@hadoop02:~/
scp -r -P 22 jdk.tar.gz root@hadoop03:~/
scp -r -P 22 jdk.tar.gz root@hadoop04:~/

Configure the JDK

tar -zxvf jdk.tar.gz
mv jdk1.8.0_241 /opt/
# create a symlink
ln -s /opt/jdk1.8.0_241 /opt/jdk

# configure the Java environment
vim /etc/profile
# Java
export JAVA_HOME=/opt/jdk
export CLASSPATH=$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin

# apply the environment variables
source /etc/profile

# verify the Java installation
java -version

Set up a fully distributed Hadoop cluster

Hadoop downloads

http://mirror.bit.edu.cn/apache/hadoop/common/
Download Hadoop (this guide uses 3.2.0, the second link below):
wget http://us.mirrors.quenda.co/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
1. Configure the Hadoop environment variables (on every node)
# extract into /opt
tar -zxvf hadoop-3.2.0.tar.gz -C /opt/

vim /etc/profile
# hadoop
export HADOOP_HOME=/opt/hadoop-3.2.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

# after saving, reload the profile
source /etc/profile
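
After sourcing the profile, the hadoop command should already be on the PATH on that node; a quick check (the cluster does not need to be running for this):

hadoop version   # should report Hadoop 3.2.0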
2. Set JAVA_HOME in the Hadoop environment scripts
cd /opt/hadoop-3.2.0/etc/hadoop
# add or update the following line in hadoop-env.sh, mapred-env.sh and yarn-env.sh
vim hadoop-env.sh
vim mapred-env.sh
vim yarn-env.sh
export JAVA_HOME="/opt/jdk"
3. Edit the Hadoop configuration files

cd /opt/hadoop-3.2.0/etc/hadoop

In the etc/hadoop directory of the Hadoop installation, edit core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and the workers file, adjusting the values to your environment.

Create the directory for temporary files

mkdir -p /opt/hadoop/tmp

core-site.xml (common properties)
<configuration>
  <property>
      <!-- HDFS address -->
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop01:9000</value>
  </property>
  <property>
      <!-- directory for temporary files; create /opt/hadoop/tmp first -->
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml (HDFS properties)
<configuration>
    <property>
        <!-- NameNode web UI address -->
        <name>dfs.namenode.http-address</name>
        <value>hadoop01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/dfs/data</value>
    </property>
    <property>
        <!-- replication factor, default is 3 -->
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
        <description>With false, files can be created on HDFS without permission checks; convenient, but guard against accidental deletion.</description>
    </property>
</configuration>
mapred-site.xml (MapReduce properties)
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <!-- run MapReduce on YARN -->
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
yarn-site.xml (resource scheduling properties)
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <!-- host of the YARN ResourceManager; without it, Active Nodes stays at 0 -->
        <value>hadoop01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <!-- how reducers fetch data -->
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop01:8088</value>
        <description>For external access, replace the hostname with the real IP; otherwise it defaults to localhost:8088.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Maximum memory allocation per container, in MB; the default is 8192 MB.</description>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
        <description>Skip the virtual-memory check. Useful when running inside virtual machines; it avoids containers being killed later on.</description>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
workers
vim workers
# add the following hosts
hadoop02
hadoop03
hadoop04
4. Copy the configured directories to the worker nodes
scp -r /opt/hadoop-3.2.0 root@hadoop02:/opt/
scp -r /opt/hadoop-3.2.0 root@hadoop03:/opt/
scp -r /opt/hadoop-3.2.0 root@hadoop04:/opt/

scp -r /opt/hadoop root@hadoop02:/opt/
scp -r /opt/hadoop root@hadoop03:/opt/
scp -r /opt/hadoop root@hadoop04:/opt/
5. Edit the start/stop scripts to declare the HDFS and YARN users
# add the HDFS user variables: edit the following scripts and insert these lines near the top (on the blank second line)
cd /opt/hadoop-3.2.0/sbin
vim start-dfs.sh 
vim stop-dfs.sh

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
# add the YARN user variables: edit the following scripts and insert these lines near the top (on the blank second line)
cd /opt/hadoop-3.2.0/sbin
vim start-yarn.sh 
vim stop-yarn.sh 

YARN_RESOURCEMANAGER_USER=root
HDFS_DATANODE_SECURE_USER=yarn
YARN_NODEMANAGER_USER=root
6. Initialize & start
cd /opt/hadoop-3.2.0
# format the NameNode (run once)
bin/hdfs namenode -format wmqhadoop

# start HDFS and YARN
sbin/start-dfs.sh
sbin/start-yarn.sh

# on later startups both can be brought up with
sbin/start-all.sh

# stop everything
sbin/stop-all.sh
7. Verify that Hadoop started successfully
jps
Open http://hadoop01:8088 in a browser to reach the ResourceManager page

Open http://hadoop01:50070 in a browser to reach the Hadoop NameNode page
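
As a quick sanity check, jps on each node should show roughly the following processes (a sketch based on the configuration above; PIDs omitted):

# on hadoop01 (master): NameNode, SecondaryNameNode, ResourceManager
# on hadoop02/03/04 (workers): DataNode, NodeManager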

MySQL 5.7 installation

Download

wget http://repo.mysql.com/yum/mysql-5.7-community/el/7/x86_64/mysql57-community-release-el7-10.noarch.rpm

rpm -ivh mysql57-community-release-el7-10.noarch.rpm

Install with yum

1. Install:
yum -y install mysql-community-server

2. Start MySQL:
systemctl start mysqld   # start MySQL

3. Get the temporary password generated during installation (needed for the first login):
grep 'temporary password' /var/log/mysqld.log
# example output (your temporary password will differ):
# sGpt=V+8f,qv
# Auftbt8Mht,x

4. Enable start on boot
systemctl enable mysqld

Log in

mysql -uroot -p
# enter the temporary password from above

Change the password

ALTER USER 'root'@'localhost' IDENTIFIED BY 'Mysql123!';
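
MySQL 5.7 enables the validate_password plugin by default, so the new password has to mix upper/lower case, digits and a special character (which is why Mysql123! is accepted). If your own password is rejected, inspect the current policy first (a quick check, not part of the original steps):

mysql -uroot -p -e "SHOW VARIABLES LIKE 'validate_password%';"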

Allow remote login

1. Run the grant statement

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'Mysql123!' WITH GRANT OPTION;

2. Exit the MySQL console

exit

3. Open port 3306

Start the firewall

sudo systemctl start firewalld.service

Permanently open port 3306

sudo firewall-cmd --add-port=3306/tcp --permanent

Reload the firewall rules

sudo firewall-cmd --reload

Alternatively, simply keep the firewall stopped (as configured earlier)

sudo systemctl stop firewalld.service

Set the default character set to utf8

Check the character set before the change

show variables like '%chara%';

Edit /etc/my.cnf and add the following two lines (in the [mysqld] section)

vim /etc/my.cnf
character_set_server=utf8
init_connect='SET NAMES utf8'

After the change, restart MySQL

sudo systemctl restart mysqld
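
After the restart, re-running the earlier query should show character_set_server set to utf8 (a quick confirmation):

mysql -uroot -p -e "show variables like '%chara%';"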

Hive installation

https://blog.csdn.net/qq_39315740/article/details/98626518 # recommended

https://blog.csdn.net/weixin_43207025/article/details/101073351

Hive download

http://mirror.bit.edu.cn/apache/hive/

Install Hive

tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/

Configure the environment variables

vim /etc/profile
# hive
export HIVE_HOME=/opt/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin

source /etc/profile

Create the hive-site.xml file

cd /opt/apache-hive-3.1.2-bin/conf
cp hive-default.xml.template hive-site.xml

Because hive-site.xml contains the HDFS-related settings below, we first need to create the corresponding directories in HDFS.

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>

  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>

Create the HDFS directories

hadoop fs -mkdir -p /user/hive/warehouse   # create the warehouse directory
hadoop fs -mkdir -p /tmp/hive              # create the scratch directory
hadoop fs -chmod -R 777 /user/hive/warehouse   # grant permissions
hadoop fs -chmod -R 777 /tmp/hive              # grant permissions

# check that they were created
hadoop fs -ls /

Hive-related configuration

In hive-site.xml, replace ${system:java.io.tmpdir} with Hive's local temp directory and ${system:user.name} with the user name.
Create the temp directory:

cd /opt/apache-hive-3.1.2-bin
mkdir temp
chmod -R 777 temp
# replace ${system:java.io.tmpdir} with /opt/apache-hive-3.1.2-bin/temp
# replace ${system:user.name} with root
vim conf/hive-site.xml
# run these substitutions inside vim:
:%s/${system:java.io.tmpdir}/\/opt\/apache-hive-3.1.2-bin\/temp/g
:%s/${system:user.name}/root/g
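
If you prefer to do the replacement outside vim, the same two substitutions can be made with sed (a sketch; it edits the file in place, so keep a backup):

cd /opt/apache-hive-3.1.2-bin/conf
cp hive-site.xml hive-site.xml.bak
sed -i 's#${system:java.io.tmpdir}#/opt/apache-hive-3.1.2-bin/temp#g' hive-site.xml
sed -i 's#${system:user.name}#root#g' hive-site.xml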

Database configuration (in hive-site.xml)

# JDBC URL of the metastore database; replace localhost in the value with the host's IP if MySQL runs on another machine
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
  </property>
  
# JDBC driver class name
# the 8.0 connector uses com.mysql.cj.jdbc.Driver
# the 5.x connector uses com.mysql.jdbc.Driver
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  
# database user name
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
 
# database password
   <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>Mysql123!</value> <!-- change to your own MySQL password -->
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

Configure hive-log4j2.properties

cd /opt/apache-hive-3.1.2-bin/conf
cp hive-log4j2.properties.template hive-log4j2.properties

vim hive-log4j2.properties
# change the log directory
property.hive.log.dir = /opt/apache-hive-3.1.2-bin/temp/root

Configure hive-env.sh

cd /opt/apache-hive-3.1.2-bin/conf
cp hive-env.sh.template hive-env.sh
vim hive-env.sh

Add the following:

export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop-3.2.0
export HIVE_CONF_DIR=/opt/apache-hive-3.1.2-bin/conf
export HIVE_AUX_JARS_PATH=/opt/apache-hive-3.1.2-bin/lib

Start Hive

Download the MySQL 5.7 JDBC driver
https://blog.csdn.net/qq_41950447/article/details/90085170

wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.48.tar.gz

# extract the archive, then copy the driver jar into Hive's lib directory
tar -zxvf mysql-connector-java-5.1.48.tar.gz
cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48-bin.jar /opt/apache-hive-3.1.2-bin/lib/

Initialize the metastore schema
schematool -dbType mysql -initSchema
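
If the initialization succeeds, the hive database referenced in the JDBC URL should now contain the metastore tables. A quick way to confirm (a sketch; it assumes MySQL is reachable as root with the password set above):

mysql -uroot -p -e "USE hive; SHOW TABLES;" | head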
Troubleshooting
# hive schema initialization fails
http://www.lzhpo.com/article/98
# compare the guava versions shipped with Hadoop and Hive
cd /opt/hadoop-3.2.0/share/hadoop/common/lib
ll | grep guava*

cd /opt/apache-hive-3.1.2-bin/lib
ll | grep guava*

Replace Hive's older guava jar (guava-19.0) with the newer guava-27.0-jre.jar from Hadoop
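
A concrete way to do the swap (a sketch; confirm the exact jar file names with ls first, as they can differ between releases):

cd /opt/apache-hive-3.1.2-bin/lib
mv guava-19.0.jar guava-19.0.jar.bak   # keep a backup of Hive's older guava
cp /opt/hadoop-3.2.0/share/hadoop/common/lib/guava-27.0-jre.jar .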

# for other issues, see
https://blog.csdn.net/qq_39315740/article/details/98626518

Hadoop 3.1.2 + Hive 3.1.1 installation (reference)

https://www.cnblogs.com/weavepub/p/11130869.html

Other

Change the vim comment color

Create a .vimrc file in the user's home directory
vim ~/.vimrc
# add this line and save
hi Comment ctermfg=blue

vim substitutions (as used for hive-site.xml above)

:%s/${system:java.io.tmpdir}/\/opt\/apache-hive-3.1.2-bin\/temp/g
:%s/${system:user.name}/root/g