问题
I am using Hive v1.2.1 with Tez. I have an external partitioned table. The partitions are hourly and of the form p=yyyy_mm_dd_hh. The situation is that these partition directories in hdfs are likely to be deleted sometime. After they are deleted, hive still contains the metadata for that partition, and a command 'show partitions ' would still list the partition whose directory was deleted from hdfs. Normally, this is not likely to cause any problem, and a select query for the partition(whose directory was deleted) would simply result an empty resultset:
hive> select * from test_tab where p='2015_01_01_01';
OK
Time taken: 2.168 seconds
However, on running any aggregate query against the same partition, I get an error:
hive> select count(*) from test_tab where p='2015_01_01_01';
FAILED: SemanticException java.io.FileNotFoundException: File hdfs://localhost:8020/user/root/data/test_db/test_tab/p=2015_01_01_01 does not exist.
I need to have the same behavior in aggregate queries as that in other select queries. This is probably a bug in hive. Any workaround-hints for this issue would be appreciated. Best Regards.
回答1:
run below command
msck repair table test_tab;
and then run your query
来源:https://stackoverflow.com/questions/37456932/aggregate-queries-fail-in-hive-if-partition-directory-doesnt-exist