Question
I am trying to insert into a Hive table with dynamic partitions. The same query has been running fine for the last few days, but it is now failing with the error below.
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0x46x234x240x192x148x1x68x69x86x50x0x1x128x0x104x118x1x128x0x0x46x234x240x192x148x1x128x0x0x25x1x128x0x0x46x1x128x0x0x72x1x127x255x255x255x0x0x0x0x1x71x66x80x0x255 with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2,reducesinkkey3,reducesinkkey4,reducesinkkey5,reducesinkkey6,reducesinkkey7,reducesinkkey8,reducesinkkey9,reducesinkkey10,reducesinkkey11,reducesinkkey12, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+++++++++++++, columns.types=bigint,string,int,bigint,int,int,int,string,int,string,string,string,string}
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
    at org.apache.hadoop.mapred.Child$4.run(Child.java
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 3.33 sec  HDFS Read: 889  HDFS Write: 314  SUCCESS
Stage-Stage-2: Map: 1  Reduce: 1  Cumulative CPU: 1.42 sec  HDFS Read: 675  HDFS Write: 0  FAIL
When I use the setting below, the query runs fine:
set hive.optimize.sort.dynamic.partition=false;
When I set this value to true, it gives the same error.
The source table is stored in SequenceFile format and the destination table is stored in RCFile format. Can anyone explain what difference this setting makes internally?
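For context, here is a minimal sketch of the kind of statement involved; the table and column names are hypothetical, since the original query is not shown in the post:

    -- Hypothetical dynamic-partition insert; names are placeholders.
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    -- The workaround from the question:
    SET hive.optimize.sort.dynamic.partition=false;

    INSERT OVERWRITE TABLE dest_rc_table PARTITION (dt)
    SELECT col1, col2, col3, dt   -- partition column must come last
    FROM source_seq_table;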
Answer 1:
Sometimes when we try to do an insert into a table with dynamic partitioning enabled, we get this error. It happens because, when hive.optimize.sort.dynamic.partition is enabled, Hive passes some internal columns to the reduce phase that are not part of the data. This setting is not stable, which is why it is disabled by default in Hive 0.14.0 and later versions but enabled by default in Hive 0.13.0. Hope this helps.
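One way to see the difference yourself is to compare the query plans with the setting on and off. A sketch, reusing the hypothetical names from above (the exact plan output varies by Hive version):

    SET hive.optimize.sort.dynamic.partition=true;
    EXPLAIN INSERT OVERWRITE TABLE dest_rc_table PARTITION (dt)
    SELECT col1, col2, col3, dt FROM source_seq_table;
    -- With the setting enabled, the plan contains an extra Reduce Sink
    -- stage keyed on the partition column(s); with it disabled, that
    -- sort stage disappears and partitions are written directly.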
Answer 2:
The error occurs because the RC file stripe buffers run out of memory when too many record writers are open simultaneously (a mitigation sketch follows the configuration description below).
Hive configuration property: hive.optimize.sort.dynamic.partition
When enabled, dynamic partitioning column will be globally sorted. This way we can keep only one record writer open for each partition value in the reducer thereby reducing the memory pressure on reducers.
- Default Value: true in Hive 0.13.0 and 0.13.1; false in Hive 0.14.0 and later (HIVE-8151)
- Added In: Hive 0.13.0 with HIVE-6455
Source: Hive Configuration Properties
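If you keep the optimization disabled, as in the question's workaround, each reducer may hold one open record writer, with its file buffer, per partition value, so the usual mitigations are to give reducers more memory or to bound the number of dynamic partitions. A sketch; the values are illustrative only, not recommendations:

    SET hive.optimize.sort.dynamic.partition=false;
    -- More headroom for per-partition write buffers (MRv2 property
    -- names; values are illustrative):
    SET mapreduce.reduce.memory.mb=4096;
    SET mapreduce.reduce.java.opts=-Xmx3276m;
    -- Or cap how many partitions a single statement may create:
    SET hive.exec.max.dynamic.partitions=1000;
    SET hive.exec.max.dynamic.partitions.pernode=100;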
Source: https://stackoverflow.com/questions/33147764/hive-setting-hive-optimize-sort-dynamic-partition