Drop Hive Table & msck repair fails with Table stored in google cloud bucket

Posted by 你离开我真会死。 on 2020-08-10 03:38:00

Question


I am creating a Hive table in a Google Cloud Storage bucket using the SQL statement below.

CREATE TABLE schema_name.table_name (column1 decimal(10,0), column2 int, column3 date) 
   PARTITIONED BY(column7 date) STORED AS ORC
   LOCATION 'gs://crazybucketstring/' 
   TBLPROPERTIES('ORC.COMPRESS'='SNAPPY');

Then I loaded data into this table using the distcp command. Now when I try to drop the table, it fails with the error message below. It fails even when the table is empty.

hive>>DROP TABLE schema_name.table_name; 

**Error:** Error while processing statement: 
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask.MetaException
(message:java.lang.IllegalArgumentException: `hadoopPath must not be null`)
(state=08S01,code=1)

I also removed the files from the Google Cloud Storage bucket using the gsutil rm -r gs:// command, but I still cannot drop the table and get the same error.

Running msck repair table also gives the following error.

FAILED: 
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) 

Any idea what could be wrong?


Answer 1:


The problem is related to the bucket location. I will explain step by step how to reproduce it and how to solve it. The same issue is also what makes the msck repair command fail.

How to reproduce it:

  1. First I created a table (T1) with its location pointing to the bare bucket:
    LOCATION 'gs://crazybucketstring/'

  2. Then I created another table (T2) in a subfolder inside the same bucket, with the location given below:
    LOCATION 'gs://crazybucketstring/schemname/tableaname/'

  3. Now when I try to drop the first table (T1), it throws an error, because the entire bucket is acting as the table directory and Hive cannot delete the bucket itself, only the files in it.

  4. When I try to drop table T2, the drop succeeds, and the files in the bucket subdirectory are deleted as well because it is a managed table. Table T1 is still a headache.
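
The setup described in these steps could look roughly like the following HiveQL. The table names t1 and t2 and the single column are placeholders invented for this sketch; only the two LOCATION values come from the answer.

```sql
-- T1: location is the bare bucket (placeholder table name and column).
CREATE TABLE t1 (column1 int)
   LOCATION 'gs://crazybucketstring/';

-- T2: location is a subdirectory inside the same bucket.
CREATE TABLE t2 (column1 int)
   LOCATION 'gs://crazybucketstring/schemname/tableaname/';

-- Reported behavior: dropping T2 works and removes the files in its subdirectory.
DROP TABLE t2;

-- Reported behavior: dropping T1 fails with the error shown in the question.
DROP TABLE t1;
```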

In a desperate bid to delete table T1, I emptied the bucket using the gsutil rm -r command and then ran msck repair table tablename. Strangely, the msck repair command failed with the error message below:

>>  msck repair table tablename
Error: Error while processing statement: FAILED: 
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)

As before, the DROP command still did not work.

Solution:

Eventually I came up with this idea, which worked.

  1. I altered table T1 and set its location to a subdirectory inside the bucket instead of the bare bucket:
    ALTER TABLE TABLENAME SET LOCATION 'gs://crazybucketstring/schemname/tableaname/';
  2. I then ran msck repair, and it did not throw any error.
  3. I issued the DROP TABLE command, and it worked.
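
The steps above can be sketched as a single HiveQL sequence. This is a sketch under assumptions: schema_name.table_name stands in for T1, the subdirectory path is the one used in the answer, and the DESCRIBE FORMATTED checks are an addition that is not part of the original steps, included only to confirm the table location before and after the change.

```sql
-- Check the current location of T1 (expected here: the bare bucket).
DESCRIBE FORMATTED schema_name.table_name;

-- Point the table at a subdirectory instead of the bare bucket.
ALTER TABLE schema_name.table_name
  SET LOCATION 'gs://crazybucketstring/schemname/tableaname/';

-- Confirm that the Location field now shows the subdirectory.
DESCRIBE FORMATTED schema_name.table_name;

-- Per the answer, both statements succeed once the location is a subdirectory.
MSCK REPAIR TABLE schema_name.table_name;
DROP TABLE schema_name.table_name;
```

Because T1 is a managed table, DROP TABLE also deletes any files under the new location, so the chosen subdirectory should not hold data that another table still needs.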

This issue comes down to the table location, which needs care when creating more than one table in the same bucket. The best practice is to give each table its own subdirectory inside the bucket and to avoid using the bare bucket path as a table location, especially if you have to create multiple tables in the same bucket. Thank you, and feel free to reach out to me about Big Data issues.
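
As an illustration of that practice, the table from the question could be declared with its own subdirectory. The path gs://crazybucketstring/schema_name/table_name/ is a hypothetical layout chosen for this sketch; any per-table subdirectory should serve the same purpose.

```sql
-- Same columns and partitioning as in the question, but with a dedicated
-- subdirectory (hypothetical path) as LOCATION instead of the bare bucket.
-- 'orc.compress' is the key spelling used in the Hive ORC documentation.
CREATE TABLE schema_name.table_name (column1 decimal(10,0), column2 int, column3 date)
   PARTITIONED BY (column7 date) STORED AS ORC
   LOCATION 'gs://crazybucketstring/schema_name/table_name/'
   TBLPROPERTIES ('orc.compress'='SNAPPY');
```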



Source: https://stackoverflow.com/questions/63146214/drop-hive-table-msck-repair-fails-with-table-stored-in-google-cloud-bucket
