Difference in behaviour when running "count(*)" in Tez and MapReduce

Submitted by ﹥>﹥吖頭↗ on 2019-12-11 08:04:07

Question


I recently came across this issue. I had data at an HDFS (Hadoop Distributed File System) path and a related Hive table. The table had 30 partitions, both in HDFS and in the Hive metastore.

I deleted 5 partitions from HDFS and then executed "msck repair table <db.tablename>;" on the Hive table. It completed without errors but printed

"Partitions missing from filesystem:"

I then tried running "select count(*) from <db.tablename>;" on Tez, and it failed with the following error:

Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException:

But when I set hive.execution.engine to "mr" and executed "select count(*) from <db.tablename>;", it worked without any issue.

I have two questions now:

  1. How is this possible?

  2. How can I sync the Hive metastore with the HDFS partitions in the above case? (My Hive version is "Hive 1.2.1000.2.6.5.0-292".)

Thanks in advance for help.


Answer 1:


MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];

This updates the Hive metastore with partition metadata for partitions that do not already have it. The default option for the MSCK command is ADD PARTITIONS, which adds to the metastore any partitions that exist on HDFS but not in the metastore. The DROP PARTITIONS option removes from the metastore the partition information for partitions that have already been removed from HDFS. The SYNC PARTITIONS option is equivalent to calling both ADD and DROP PARTITIONS.
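The three options can be pictured as set operations on partition names. The following is only a toy model in Python, not Hive's implementation; the function name `msck_repair` and the partition names are made up for illustration.

```python
# Toy model of the MSCK REPAIR options using plain sets.
# This is NOT Hive's code; names and partitions are hypothetical.

def msck_repair(metastore, hdfs, option="ADD"):
    """Return the metastore partition set after a repair with the given option."""
    metastore, hdfs = set(metastore), set(hdfs)
    if option not in {"ADD", "DROP", "SYNC"}:
        raise ValueError(f"unknown option: {option}")
    result = set(metastore)
    if option in ("ADD", "SYNC"):
        # Register partitions present on HDFS but unknown to the metastore.
        result |= hdfs - metastore
    if option in ("DROP", "SYNC"):
        # Forget partitions the metastore lists but HDFS no longer has.
        result -= metastore - hdfs
    return result


metastore = {"dt=1", "dt=2", "dt=3"}
hdfs = {"dt=2", "dt=3", "dt=4"}
for option in ("ADD", "DROP", "SYNC"):
    print(option, sorted(msck_repair(metastore, hdfs, option)))
```

In this model, the default ADD behaviour leaves `dt=1` in the metastore even though it is gone from HDFS, which matches the situation described in the question.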

However, the ADD/DROP/SYNC PARTITIONS options are available only from Hive version 3.0 onwards. See HIVE-17824.

In your case the version is Hive 1.2, so follow the steps below to sync the HDFS partitions with the table partitions in the metastore.

  1. Drop the 5 partitions that you removed directly from HDFS, using the ALTER statement below.

ALTER TABLE <db.table_name> DROP PARTITION (<partition_column=value>);

  2. Run SHOW PARTITIONS <table_name>; and check that the list of partitions has been refreshed.

This should bring the partitions in the Hive metastore (HMS) in sync with those in HDFS.
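If many partitions are affected, the DROP statements can be generated rather than typed by hand. Below is a minimal Python sketch under several assumptions: the helper names `partition_spec` and `drop_statements` are hypothetical, partition names are given in the `column=value/column=value` form that SHOW PARTITIONS prints, the partition columns accept quoted string values, and the values contain no quotes or escaped characters. The table and partition names in the usage lines are invented.

```python
# Sketch: build ALTER TABLE ... DROP PARTITION statements for partitions
# that the metastore lists but that no longer exist on HDFS.
# Helper names and sample data are hypothetical.

def partition_spec(name):
    """Turn 'year=2019/month=08' into "year='2019', month='08'"."""
    parts = []
    for piece in name.strip("/").split("/"):
        column, _, value = piece.partition("=")
        parts.append(f"{column}='{value}'")
    return ", ".join(parts)


def drop_statements(table, metastore_partitions, hdfs_partitions):
    """Return one DROP statement per partition missing from HDFS."""
    missing = sorted(set(metastore_partitions) - set(hdfs_partitions))
    return [
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({partition_spec(p)});"
        for p in missing
    ]


in_metastore = ["dt=2019-01-01", "dt=2019-01-02", "dt=2019-01-03"]
on_hdfs = ["dt=2019-01-02"]
for statement in drop_statements("db.table_name", in_metastore, on_hdfs):
    print(statement)
```

The generated statements should be reviewed before running them in Hive, since the quoting here is deliberately simple.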

Alternatively, if it is an EXTERNAL table, you can drop and recreate the table and then run MSCK REPAIR on the newly created table. This works because dropping an external table does not delete the underlying data.

Note: By default, MSCK REPAIR only adds partitions newly created in HDFS to the Hive metastore; it does not remove from the metastore partitions that have been deleted manually from HDFS.

====

To avoid these steps in the future, drop partitions from Hive using ALTER TABLE <table_name> DROP PARTITION (<partition_column=value>) rather than deleting them directly from HDFS.



Source: https://stackoverflow.com/questions/57679143/diffrence-in-behaviour-while-running-count-in-tez-and-map-reduce
