Question
We are storing string fields that vary in length from small (a few kB) to very long (up to ~400 MB) in a Hive table. We are now hitting OOM errors when copying data from one table to another (without any conditions or joins). This is not exactly what we run in production, but it is the simplest use case where the problem occurs. The HQL is basically just:
INSERT INTO new_table
SELECT * FROM old_table;
The container and Java heap were set to 16 GB. We have tried different file formats (RCFile, ORC), with and without compression, different engines (MR, Tez), etc., but nothing helps and we always run into OOM.
We are not sure what exactly is happening there. We expected the Java process to need only a few times the memory of the longest single record (~400 MB), not the whole 16 GB heap.
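For reference, the memory knobs we have been adjusting look roughly like the following for the 16 GB setup; the property values here are illustrative, not an exact dump of our config:
-- Hive on Tez container and heap (values in MB; heap is normally set somewhat below the container size)
set hive.tez.container.size=16384;
set hive.tez.java.opts=-Xmx16384m;
-- MapReduce engine equivalents
set mapreduce.map.memory.mb=16384;
set mapreduce.map.java.opts=-Xmx16384m;
-- switching engines
set hive.execution.engine=tez;   -- or mr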
Can you suggest anything we should try or focus on?
Version used: HDP 2.4.2
Sample log when using Tez + ORC + 8 GB of RAM: https://pastebin.com/uza84t6F
Answer 1:
Try to use TEXTFILE instead of ORC. Writing an ORC file requires much more memory.
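For example (just a sketch; the new table name is a placeholder), the copy can target a TEXTFILE table via CTAS:
CREATE TABLE new_table_text
STORED AS TEXTFILE
AS SELECT * FROM old_table;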
Try to increase parallelism by adding more mappers. Play with these Tez parameters to get more mappers (a combined sketch follows the link below):
-- min and max split size:
set tez.grouping.min-size=16777216;
set tez.grouping.max-size=1073741824;
See here: https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html
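Putting it together, a tuning run might look like the sketch below; the split sizes are only illustrative starting points (a smaller max split size generally produces more, smaller mappers) and need to be tuned for your data:
-- smaller max split size => more splits => more mappers
set tez.grouping.min-size=16777216;    -- 16 MB
set tez.grouping.max-size=268435456;   -- 256 MB, reduced from 1 GB above to force more mappers
INSERT INTO new_table
SELECT * FROM old_table;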
Source: https://stackoverflow.com/questions/44881129/hive-very-long-field-gives-oom-heap