Hive version: 1.2.1
Configuration:
set hive.execution.engine=tez;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=256000000;
set hive.merge.tezfiles=true;
HQL:
ALTER TABLE `table_name` PARTITION (partion_name1 = 'val1', partion_name2='val2', partion_name3='val3', partion_name4='val4') CONCATENATE;
I use the HQL above to merge the files of a specific table partition. However, after execution there are still many files in the output directory, and their sizes are far less than 256000000 bytes. How can I decrease the number of output files?
BTW, using MapReduce instead of Tez didn't work either.
You can set the number of reducers to 1, so that only one output file is created.
You can do that with the following:
set mapred.reduce.tasks=1;
Maybe you can try insert overwrite table ... partition ( ... ) select * from ...
This approach can use the merge setting for Tez files (hive.merge.tezfiles).
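A minimal sketch of that rewrite, reusing the table and partition names from the question. The data columns col_a and col_b are hypothetical placeholders for the table's non-partition columns; they are listed explicitly because, when reading from the same partitioned table, select * would also return the partition columns. The hive.merge.size.per.task value is an assumption chosen to match the 256000000 target, not something stated in the question.

```sql
-- Enable the post-job merge step on Tez (settings taken from the question).
set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=256000000;
-- Assumed target size for each merged file; adjust as needed.
set hive.merge.size.per.task=256000000;

-- Rewrite the partition onto itself so the merge settings are applied.
-- col_a and col_b are hypothetical; replace them with the real data columns.
INSERT OVERWRITE TABLE `table_name`
PARTITION (partion_name1 = 'val1', partion_name2 = 'val2', partion_name3 = 'val3', partion_name4 = 'val4')
SELECT col_a, col_b
FROM `table_name`
WHERE partion_name1 = 'val1'
  AND partion_name2 = 'val2'
  AND partion_name3 = 'val3'
  AND partion_name4 = 'val4';
```

Since this overwrites the partition it reads from, it may be worth trying on a copy of the data first and comparing the file count in the partition directory before and after.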
Source: https://stackoverflow.com/questions/33166387/how-to-reduce-generating-files-of-sql-alter-table-partition-concatenate-in-hiv