The previous article covered Hive's principles and implementation mechanism. Starting with this post, I will walk through the full process of setting up the Hive data warehouse tool.
1. Installing Hive
(1) Download Hive and prepare the environment
Hive official site: http://hive.apache.org/index.html
Hive download page: http://www.apache.org/dyn/closer.cgi/hive/
Note: before installing Hive, make sure your Hadoop cluster is already up and running. Hive only needs to be installed on the NameNode host of the cluster; the DataNode hosts do not need it.
This article installs apache-hive-2.3.4-bin.tar.gz, downloaded from http://mirrors.shu.edu.cn/apache/hive/hive-2.3.4/
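If the server has direct Internet access, you can also skip the SecureCRT upload below and fetch the tarball on the box itself; a minimal sketch, assuming the mirror above still serves the 2.3.4 release:
# download straight into the hadoop user's home directory
wget -P ~ http://mirrors.shu.edu.cn/apache/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz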
(2) Install
# Upload the tarball with SecureCRT's SFTP session (opened with Alt+p)
Alt+p;
cd ~
put apache-hive-2.3.4-bin.tar.gz
# Extract the downloaded Hive tarball into the user's home directory
tar zxvf apache-hive-2.3.4-bin.tar.gz
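The later steps assume Hive lives under /home/hadoop/apps (that is where HIVE_HOME will point). If you extracted the tarball into the home directory as above, move it there first; a sketch, assuming the apps directory already exists from the Hadoop installation:
# move the extracted directory to where HIVE_HOME will point
mv ~/apache-hive-2.3.4-bin /home/hadoop/apps/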
(3) Configure Hive
# a. Configure environment variables: edit /etc/profile
#set hive env
export HIVE_HOME=/home/hadoop/apps/apache-hive-2.3.4-bin
export PATH=${HIVE_HOME}/bin:$PATH
# Reload the profile so the variables take effect
source /etc/profile
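A quick sanity check that the variables took effect (assuming the paths above):
echo $HIVE_HOME    # should print /home/hadoop/apps/apache-hive-2.3.4-bin
which hive         # should resolve to the bin/hive script under HIVE_HOME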
# Create the hive-site.xml configuration file
# Before configuring Hive, switch to the account that operates Hive; mine is hadoop
su - hadoop
cd /home/hadoop/apps/apache-hive-2.3.4-bin/conf
# Create hive-site.xml from the hive-default.xml.template template
cp hive-default.xml.template hive-site.xml
(4) Create the directories Hive needs in HDFS
hive-site.xml contains the following settings:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
so the corresponding directories must be created in HDFS first. The commands are:
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -mkdir -p /user/hive/warehouse
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -chmod -R 777 /user/hive/warehouse
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -mkdir -p /tmp/hive
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -chmod -R 777 /tmp/hive
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -ls /
Found 16 items
drwxr-xr-x - hadoop supergroup 0 2018-12-30 07:45 /combinefile
drwxr-xr-x - hadoop supergroup 0 2018-12-24 00:51 /crw
drwxr-xr-x - hadoop supergroup 0 2018-12-24 00:51 /en
drwxr-xr-x - hadoop supergroup 0 2018-12-19 07:11 /index
drwxr-xr-x - hadoop supergroup 0 2018-12-09 06:57 /localwccombineroutput
drwxr-xr-x - hadoop supergroup 0 2018-12-24 00:51 /loge
drwxr-xr-x - hadoop supergroup 0 2018-12-23 08:12 /ordergp
drwxr-xr-x - hadoop supergroup 0 2018-12-19 05:48 /rjoin
drwxr-xr-x - hadoop supergroup 0 2018-12-23 05:12 /shared
drwx------ - hadoop supergroup 0 2019-01-20 23:34 /tmp
drwxr-xr-x - hadoop supergroup 0 2019-01-20 23:33 /user
drwxr-xr-x - hadoop supergroup 0 2018-12-05 08:32 /wccombineroutput
drwxr-xr-x - hadoop supergroup 0 2018-12-05 08:39 /wccombineroutputs
drwxr-xr-x - hadoop supergroup 0 2019-01-19 22:38 /webloginput
drwxr-xr-x - hadoop supergroup 0 2019-01-20 02:13 /weblogout
drwxr-xr-x - hadoop supergroup 0 2018-12-23 07:46 /weblogwash
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -ls /tmp/
Found 2 items
drwx------ - hadoop supergroup 0 2018-12-05 08:30 /tmp/hadoop-yarn
drwxrwxrwx - hadoop supergroup 0 2019-01-20 23:34 /tmp/hive
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -ls /user/hive
Found 1 items
drwxrwxrwx - hadoop supergroup 0 2019-01-20 23:33 /user/hive/warehouse
(5) Configure hive-site.xml
a. Configure Hive's local temporary directory
# Replace every occurrence of ${system:java.io.tmpdir} in hive-site.xml with a local temporary directory for Hive; I use /home/hadoop/apps/apache-hive-2.3.4-bin/tmp. If that directory does not exist, create it first and grant read/write permissions:
[hadoop@centos-aaron-h1 apache-hive-2.3.4-bin]$ cd /home/hadoop/apps/apache-hive-2.3.4-bin
[hadoop@centos-aaron-h1 apache-hive-2.3.4-bin]$ mkdir tmp/
[hadoop@centos-aaron-h1 apache-hive-2.3.4-bin]$ chmod -R 777 tmp/
[hadoop@centos-aaron-h1 apache-hive-2.3.4-bin]$ cd conf
# Run the following substitution in vim last-line (command) mode
%s#${system:java.io.tmpdir}#/home/hadoop/apps/apache-hive-2.3.4-bin/tmp#g
# For example, this turns
<property>
<name>hive.exec.local.scratchdir</name>
<value>${system:java.io.tmpdir}/${system:user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
# into
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/hadoop/apps/apache-hive-2.3.4-bin/tmp/${system:user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
# Configure the Hive user name
# Replace ${system:user.name} in hive-site.xml with the account that operates Hive (mine is hadoop). Run the following substitution in vim last-line mode:
%s#${system:user.name}#hadoop#g
# For example, this turns
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/hadoop/apps/apache-hive-2.3.4-bin/tmp/${system:user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
# into
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/hadoop/apps/apache-hive-2.3.4-bin/tmp/hadoop</value>
<description>Local scratch space for Hive jobs</description>
</property>
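If you prefer a non-interactive alternative to the two vim substitutions above, an equivalent GNU sed sketch (same temp directory and user name; back up hive-site.xml first) is:
cd /home/hadoop/apps/apache-hive-2.3.4-bin/conf
cp hive-site.xml hive-site.xml.bak
# replace ${system:java.io.tmpdir} with Hive's local tmp directory
sed -i 's#${system:java.io.tmpdir}#/home/hadoop/apps/apache-hive-2.3.4-bin/tmp#g' hive-site.xml
# replace ${system:user.name} with the hadoop account
sed -i 's#${system:user.name}#hadoop#g' hive-site.xml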
b. Configure the Hive metastore database
Property | Description
javax.jdo.option.ConnectionDriverName | JDBC driver class name for the metastore database
javax.jdo.option.ConnectionURL | JDBC connection URL of the metastore database
javax.jdo.option.ConnectionUserName | User name for connecting to the database
javax.jdo.option.ConnectionPassword | Password for connecting to the database
By default Hive stores its metadata in an embedded Derby database, configured as follows:
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>APP</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mine</value>
<description>password to use against metastore database</description>
</property>
To switch the metastore from Derby to MySQL, only these four properties need to change, for example:
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
When setting javax.jdo.option.ConnectionURL, append useSSL=false to suppress MySQL's SSL connection warning; without it the warning is printed and, in some cases, Hive's metastore initialization against MySQL can fail. Note that inside the XML file the & separator must be written as &amp;.
You also need to copy the MySQL JDBC driver into Hive's lib directory:
# the driver class configured above is com.mysql.jdbc.Driver
cp ~/mysql-connector-java-5.1.28.jar $HIVE_HOME/lib/
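A quick check that the connector is in place (the 5.1.28 jar is simply the version I had on hand; any MySQL Connector/J 5.1.x provides the com.mysql.jdbc.Driver class):
ls $HIVE_HOME/lib/ | grep mysql-connector
# expected output: mysql-connector-java-5.1.28.jar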
c. Configure hive-env.sh
[hadoop@centos-aaron-h1 conf]$ cd ~/apps/apache-hive-2.3.4-bin/conf
[hadoop@centos-aaron-h1 conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@centos-aaron-h1 conf]$ vi hive-env.sh
# Append the following lines
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.9.1
export HIVE_CONF_DIR=/home/hadoop/apps/apache-hive-2.3.4-bin/conf
export HIVE_AUX_JARS_PATH=/home/hadoop/apps/apache-hive-2.3.4-bin/lib
(6) Initialize, start, and test Hive
[hadoop@centos-aaron-h1 apache-hive-2.3.4-bin]$ cd bin
[hadoop@centos-aaron-h1 bin]$ schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apps/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/apps/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://192.168.29.131:3306/hive?createDatabaseIfNotExist=true&useSSL=false
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
[hadoop@centos-aaron-h1 bin]$
After the schema initialization completes, a set of metadata tables is created in the MySQL database to store Hive's metadata.
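To confirm, you can list the tables schematool created; a sketch, assuming MySQL runs locally with the root account configured above and the metastore database is named hive:
mysql -uroot -p123456 -e 'use hive; show tables;'
# expect several dozen tables, including DBS, TBLS, COLUMNS_V2, SDS and VERSION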
(7) Start Hive
[hadoop@centos-aaron-h1 bin]$ ./hive
which: no hbase in (/home/hadoop/apps/apache-hive-2.3.4-bin/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/jdk1.7.0_45/bin:/home/hadoop/apps/hadoop-2.9.1/bin:/home/hadoop/apps/hadoop-2.9.1/sbin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apps/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/apps/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/home/hadoop/apps/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive>
After typing a few commands I found that the Backspace key did not work in the Hive CLI. (A web search shows that adjusting SecureCRT as follows fixes it.)
Terminal > Emulation
Change Terminal to Linux
List the Hive databases:
hive> show databases;
OK
default
Time taken: 3.88 seconds, Fetched: 1 row(s)
hive>
Create a database and a table in Hive:
hive> create database wcc_log;
OK
Time taken: 0.234 seconds
hive> use wcc_log;
OK
Time taken: 0.019 seconds
hive> create table test_log(id int,name string);
OK
Time taken: 0.715 seconds
hive> show tables;
OK
test_log
Time taken: 0.026 seconds, Fetched: 1 row(s)
hive> select * from test_log;
OK
Time taken: 1.814 seconds
hive>
We can also see the database and table we just created in HDFS.
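For example (assuming the default warehouse location configured earlier):
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -ls -R /user/hive/warehouse
# a wcc_log.db directory containing a test_log subdirectory should be listed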
We can likewise inspect Hive's metadata in MySQL (three of the metastore tables hold the metadata for Hive databases, tables, and columns, respectively).
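A sketch of how to peek at them, assuming the standard metastore schema in which DBS, TBLS and COLUMNS_V2 hold the database, table and column records:
mysql -uroot -p123456 -e 'use hive; select * from DBS; select * from TBLS; select * from COLUMNS_V2;'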
Truncate and drop table data in Hive:
hive> use wcc_log;
OK
Time taken: 0.179 seconds
hive> truncate table test_log;
OK
Time taken: 0.399 seconds
hive> drop table test_log;
OK
Time taken: 1.195 seconds
hive> show tables;
OK
Time taken: 0.045 seconds
hive>
Create a properly delimited table in Hive:
hive> create table t_web_log01(id int,name string)
> row format delimited
> fields terminated by ',';
OK
Time taken: 0.602 seconds
Create a file bbb_hive.txt on Linux and upload it to the HDFS directory /user/hive/warehouse/wcc_log.db/t_web_log01:
[hadoop@centos-aaron-h1 ~]$ cat bbb_hive.txt
1,张三
2,李四
3,王二
4,麻子
5,隔壁老王
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -put bbb_hive.txt /user/hive/warehouse/wcc_log.db/t_web_log01
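Putting the file straight into the table's HDFS directory works because a Hive table is just a directory of files; the more idiomatic route is Hive's LOAD DATA statement, which performs the same copy (a sketch, assuming the file sits in hadoop's home directory):
hive> load data local inpath '/home/hadoop/bbb_hive.txt' into table wcc_log.t_web_log01;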
Query the table's data in Hive:
[hadoop@centos-aaron-h1 bin]$ ./hive
which: no hbase in (/home/hadoop/apps/apache-hive-2.3.4-bin/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/jdk1.7.0_45/bin:/home/hadoop/apps/hadoop-2.9.1/bin:/home/hadoop/apps/hadoop-2.9.1/sbin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apps/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/apps/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/home/hadoop/apps/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive> show databases;
OK
default
wcc_log
Time taken: 3.859 seconds, Fetched: 2 row(s)
hive> use wcc_log
> ;
OK
Time taken: 0.024 seconds
hive> show tables;
OK
t_web_log01
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive> select * from t_web_log01;
OK
1 张三
2 李四
3 王二
4 麻子
5 隔壁老王
Time taken: 1.353 seconds, Fetched: 5 row(s)
hive>
[Unresolved issue] Aggregation queries such as select count(*) from t_web_log01 fail; the error log is recorded below:
hive> select count(id) from t_web_log01;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = hadoop_20190121041134_962cf495-4474-4c91-98ea-8a96bc548b20
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.getClient(RpcClientFactoryPBImpl.java:81)
at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getProxy(HadoopYarnProtoRPC.java:48)
at org.apache.hadoop.mapred.ClientCache$1.run(ClientCache.java:95)
at org.apache.hadoop.mapred.ClientCache$1.run(ClientCache.java:92)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:356)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapred.ClientCache.instantiateHistoryProxy(ClientCache.java:92)
at org.apache.hadoop.mapred.ClientCache.getInitializedHSProxy(ClientCache.java:77)
at org.apache.hadoop.mapred.YARNRunner.addHistoryToken(YARNRunner.java:219)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:253)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:411)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.getClient(RpcClientFactoryPBImpl.java:78)
... 45 more
Caused by: java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
at java.lang.Class.privateGetPublicMethods(Class.java:2651)
at java.lang.Class.privateGetPublicMethods(Class.java:2661)
at java.lang.Class.privateGetPublicMethods(Class.java:2661)
at java.lang.Class.privateGetPublicMethods(Class.java:2661)
at java.lang.Class.getMethods(Class.java:1467)
at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:426)
at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:323)
at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:636)
at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:722)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:101)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:583)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:549)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:496)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:461)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:647)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:604)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.HSClientProtocolPBClientImpl.<init>(HSClientProtocolPBClientImpl.java:38)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Job Submission failed with exception 'org.apache.hadoop.yarn.exceptions.YarnRuntimeException(java.lang.reflect.InvocationTargetException)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.lang.reflect.InvocationTargetException
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.hadoop.hive.common.FileUtils.deleteDirectory(FileUtils.java:778)
at org.apache.hadoop.hive.ql.session.SessionState.close(SessionState.java:1560)
at org.apache.hadoop.hive.cli.CliSessionState.close(CliSessionState.java:66)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:762)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Exception in thread "Thread-1" java.lang.OutOfMemoryError: PermGen space
hive> set mapred.reduce.tasks = 1;
hive> select count(1) from t_web_log01;
OK
0
Time taken: 1.962 seconds, Fetched: 1 row(s)
hive> select count(id) from t_web_log01;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = hadoop_20190121041942_3f55a0ac-c478-43f5-abf3-0a7aace5c334
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. PermGen space
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.hadoop.hive.common.FileUtils.deleteDirectory(FileUtils.java:778)
at org.apache.hadoop.hive.ql.session.SessionState.close(SessionState.java:1560)
at org.apache.hadoop.hive.cli.CliSessionState.close(CliSessionState.java:66)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:762)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Cause: the stack traces above end in java.lang.OutOfMemoryError: PermGen space, so the Hive client JVM (this cluster runs JDK 1.7) runs out of permanent-generation memory while submitting the MapReduce job; on top of that, Hive 2.3.4 already recommends running queries on another engine (Apache Tez or Apache Spark) instead of MapReduce. In the next article I will rebuild the environment with Hive 1.2.2.
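One possible mitigation, which I have not verified on this cluster, is to give the Hive client JVM a larger permanent generation before starting the CLI (the hive launcher goes through the hadoop script, which honors HADOOP_CLIENT_OPTS):
# untested sketch: enlarge PermGen for the client JVM, then retry the query
export HADOOP_CLIENT_OPTS="-XX:MaxPermSize=512m $HADOOP_CLIENT_OPTS"
hive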
Final words: that is all for this post. If you found it useful, please give it a like; if you are interested in my other posts on servers and big data, follow my blog, and feel free to reach out any time.