HIVE very long field gives OOM Heap

Submitted by 冷暖自知 on 2019-12-11 15:27:12

Question


We are storing string fields whose length varies from small (a few kB) to very long (up to ~400 MB) in a Hive table. We are now hitting OOM errors when copying data from one table to another (without any conditions or joins). This is not exactly what we run in production, but it is the simplest use case where the problem occurs. The HQL is basically just:

INSERT INTO new_table
SELECT * FROM old_table;

The container and the Java heap were set to 16 GB. We tried different file formats (RCFile, ORC), with and without compression, and different execution engines (MR, Tez), but nothing helped and we always run into OOM.
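For reference, the 16 GB container and heap limits mentioned above would typically be set with properties like the following; the property names and values are illustrative assumptions for a Hive-on-Tez / MapReduce setup and are not taken from the question:

-- Tez container size (MB) and the JVM heap inside it (illustrative values)
set hive.tez.container.size=16384;
set hive.tez.java.opts=-Xmx13312m;

-- MapReduce equivalents
set mapreduce.map.memory.mb=16384;
set mapreduce.map.java.opts=-Xmx13312m;

The -Xmx value is usually kept somewhat below the container size to leave headroom for off-heap memory.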

We are not sure what exactly is happening there. We expected the Java process to need only a few times the memory of the longest single record (~400 MB), not the whole 16 GB heap.

Can you give us something we should try or focus on?

Version used: HDP 2.4.2

Sample log when using TEZ+ORC+8G of RAM: https://pastebin.com/uza84t6F


Answer 1:


  1. Try to use TEXTFILE instead of ORC. Writing an ORC file requires much more memory (a minimal TEXTFILE sketch follows the link below).

  2. Try to increase parallelism by adding more mappers. Play with these Tez grouping parameters to increase the number of mappers:

-- min and max split size (16 MB and 1 GB respectively);
-- lowering the max split size produces more splits and therefore more mappers
set tez.grouping.min-size=16777216;
set tez.grouping.max-size=1073741824;

See here: https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html
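For the first suggestion, a minimal sketch of what the target table might look like; the table and column definitions here are assumed for illustration and must match the actual schema of old_table:

-- hypothetical schema: an id plus the very long string field
CREATE TABLE new_table (
  id BIGINT,
  payload STRING
)
STORED AS TEXTFILE;

INSERT INTO new_table
SELECT * FROM old_table;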



Source: https://stackoverflow.com/questions/44881129/hive-very-long-field-gives-oom-heap
