Specify minimum number of generated files from Hive insert
Question: I am using Hive on AWS EMR to insert the results of a query into a Hive table partitioned by date. Although the total output size each day is similar, the number of generated files varies: usually between 6 and 8, but on some days it creates just a single big file. I reran the query a couple of times, in case the number of files is influenced by the availability of nodes in the cluster, but the count seems to be consistent. So my questions are (a) what determines how many files are