Apache hive MSCK REPAIR TABLE new partition not added

送分小仙女□ 提交于 2019-12-07 02:04:01

问题


I am new for Apache Hive. While working on external table partition, if I add new partition directly to HDFS, the new partition is not added after running MSCK REPAIR table. Below are the codes I tried,

-- creating external table

hive> create external table factory(name string, empid int, age int) partitioned by(region string)  
    > row format delimited fields terminated by ','; 

--Detailed Table Information

Location:  hdfs://localhost.localdomain:8020/user/hive/warehouse/factory     
Table Type:             EXTERNAL_TABLE           
Table Parameters:        
    EXTERNAL                TRUE                
    transient_lastDdlTime   1438579844  

-- creating directory in HDFS to load data for table factory

[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'

-- Table data

cat factory1.txt
emp1,500,40
emp2,501,45
emp3,502,50

cat factory2.txt
EMP10,200,25
EMP11,201,27
EMP12,202,30

-- copying from local to HDFS

[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory1.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory2.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'

-- Altering table to update in the metastore

hive> alter table factory add partition(region='southregion') location '/user/hive/testing/testing1/factory2';
hive> alter table factory add partition(region='northregion') location '/user/hive/testing/testing1/factory1';            
hive> select * from factory;                                                                      
OK
emp1    500 40  northregion
emp2    501 45  northregion
emp3    502 50  northregion
EMP10   200 25  southregion
EMP11   201 27  southregion
EMP12   202 30  southregion

Now I created new file factory3.txt to add as new partition for the table factory

cat factory3.txt
user1,100,25
user2,101,27
user3,102,30

-- creating the path and copying table data

[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'

now I executed the below query to update the metastore for the new partition added

MSCK REPAIR TABLE factory;

Now the table is not giving the new partition content of factory3 file. Can I know where I am doing mistake while adding partition for table factory?

whereas, if I run the alter command then it is showing the new partition data.

hive> alter table factory add partition(region='eastregion') location '/user/hive/testing/testing1/factory3';

Can I know why the MSCK REPAIR TABLE command is not working?


回答1:


For the MSCK to work, naming convention /partition_name=partition_value/ should be used.




回答2:


You have to put data in directory named 'region=eastregio' in table location directory:

$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregio'
$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregio'


来源:https://stackoverflow.com/questions/31781979/apache-hive-msck-repair-table-new-partition-not-added

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!