Question
I recently ran into the following issue. I had files at an HDFS (Hadoop Distributed File System) path and a Hive table defined over them. The table had 30 partitions, both in HDFS and in the metastore.
I deleted 5 partitions directly from HDFS and then executed "msck repair table <db.tablename>;"
on the Hive table. It completed without error but printed
"Partitions missing from filesystem:"
I then tried running "select count(*) from <db.tablename>;"
on Tez, and it failed with the following error:
Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException:
But when I set hive.execution.engine to "mr"
and executed "select count(*) from <db.tablename>;" again,
it ran without any issue.
I have two questions:
How is this possible?
How can I sync the Hive metastore with the HDFS partitions in the case above? (My Hive version is "Hive 1.2.1000.2.6.5.0-292".)
Thanks in advance for your help.
Answer 1:
MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];
This command updates the Hive metastore with partition metadata for partitions that do not already have it. The default option for the MSCK command is ADD PARTITIONS: it adds to the metastore any partitions that exist on HDFS but not in the metastore. The DROP PARTITIONS option removes from the metastore the information for partitions that have already been removed from HDFS. The SYNC PARTITIONS option is equivalent to running both ADD PARTITIONS and DROP PARTITIONS.
However, the DROP and SYNC options are available only from Hive 3.0 onward. See HIVE-17824.
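To make the three options concrete, here is a minimal Python sketch that models them as set operations. It is an illustration only, not Hive code: the function name msck_plan and the partition names are hypothetical.

```python
def msck_plan(metastore, filesystem, mode="ADD"):
    """Return (to_add, to_drop) partition sets for a given MSCK option.

    Illustrative model only: `metastore` and `filesystem` are collections
    of partition names such as "dt=2019-08-01".
    """
    metastore, filesystem = set(metastore), set(filesystem)
    mode = mode.upper()
    if mode not in {"ADD", "DROP", "SYNC"}:
        raise ValueError(f"unknown mode: {mode}")
    # Present on the filesystem but unknown to the metastore.
    to_add = filesystem - metastore if mode in {"ADD", "SYNC"} else set()
    # Known to the metastore but gone from the filesystem.
    to_drop = metastore - filesystem if mode in {"DROP", "SYNC"} else set()
    return to_add, to_drop
```

In the question's situation, the five deleted partitions fall into the "metastore minus filesystem" set, which the default ADD option never touches.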
In your case the version is Hive 1.2, so follow the steps below to sync the HDFS partitions with the table partitions in the metastore.
- Drop the 5 partitions that you removed directly from HDFS, using the ALTER statement below:
ALTER TABLE <db.table_name> DROP PARTITION (<partition_column=value>);
- Run SHOW PARTITIONS <table_name>; and check that the list of partitions has been refreshed.
This should bring the partitions in the Hive metastore (HMS) in line with those in HDFS.
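When several partitions have to be dropped, the ALTER statements can be generated instead of typed by hand. The following Python sketch assumes a table with a single, string-typed partition column whose values contain no quotes. The function name, the table name db.t and the partition values are hypothetical. The two input lists would come from SHOW PARTITIONS output and from a listing of the table directory in HDFS.

```python
def drop_partition_statements(table, metastore_partitions, hdfs_partitions):
    """Build ALTER TABLE ... DROP PARTITION statements for stale partitions.

    Partitions are given as "column=value" strings, the form printed by
    SHOW PARTITIONS for a table with a single partition column.
    """
    # Stale = registered in the metastore but no longer present in HDFS.
    stale = sorted(set(metastore_partitions) - set(hdfs_partitions))
    statements = []
    for spec in stale:
        column, _, value = spec.partition("=")
        # Assumption: the partition column is a string, so the value is quoted.
        statements.append(
            f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({column}='{value}');"
        )
    return statements


if __name__ == "__main__":
    in_metastore = ["dt=2019-08-01", "dt=2019-08-02", "dt=2019-08-03"]
    in_hdfs = ["dt=2019-08-02"]
    for stmt in drop_partition_statements("db.t", in_metastore, in_hdfs):
        print(stmt)
```

The printed statements can be saved to a file and executed with hive -f or beeline -f.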
Alternatively, if it is an EXTERNAL table, you can drop and recreate the table and then run MSCK REPAIR on the newly created table, because dropping an external table does not delete the underlying data.
Note: By default, MSCK REPAIR only adds to the Hive metastore partitions that are new in HDFS; it does not remove from the metastore partitions that were deleted manually from HDFS.
====
To avoid these steps in the future, it is better to drop partitions from Hive using ALTER TABLE <table_name> DROP PARTITION (<partition_column=value>) instead of deleting them directly from HDFS.
Source: https://stackoverflow.com/questions/57679143/diffrence-in-behaviour-while-running-count-in-tez-and-map-reduce