Automated List of Hive External tables

走远了吗. 提交于 2019-12-11 08:48:58

问题


I have to create an automated process to list all external tables in Hive and do a record count on those tables.

I should do this as a daily job. I tried this by hard coding all the external table names, but this is not accepted as the tables keep on changing once in a month.

I have gone through different approaches like [show tables] and executing query in metastore DB. But these will not help me in automating the process.

Is there a better approach to implement this in Hive.


回答1:


Something like this, using shell.

#Create external table list for a schema
SCHEMA=your_schema_name 

#define filenames   
alltableslist=tables_$SCHEMA
exttablelist=ext_tables_$SCHEMA

#Get all tables
 hive -S -e " set hive.cli.print.header=false; use $SCHEMA; show tables;" 1> $alltableslist


#For each table check its type:
for table in $(cat $alltableslist)
 do 

 echo Processing table $table ...

     #Describe table
     describe=$(hive client -S -e "use $SCHEMA; DESCRIBE FORMATTED $table")

     #Get type
     table_type=$(echo "${describe}" | egrep -o 'Table Type:[^,]+' | cut -f2)

     #Check table type, get count and write table name with count
      if [ $table_type == EXTERNAL_TABLE ]; then 
         #get count
          cnt=$(hive client -S -e "select count(*) from $SCHEMA.table ")
         #save result
          echo "$table $cnt" > $exttablelist 
      fi

done; #tables loop

Just replace your_schema_name at the beginning with your schema name. External tables with counts in this example will be saved in the file ext_tables_[your_schema_name]

Counts can be processed in parallel and even in single SQL statement and many other things can be improved, but hope you have caught the idea.




回答2:


As per my understanding first your requirement is to find out list of Hive external tables in all hive databases .

Database name,Table name,Table Type(External) and HDFS Location of hive External tables.

Login to hive metastore and use Hive meta database.

Use 3 tables TBLS,DBS and SDS tables , on top of this 3 tables we can apply joins on DB_ID and SD_ID

By using above format you can get databases names along with respected hive external tables list and HDFS path location also .

For Querey and output information please read this link

https://askdoubts.com/question/how-to-find-out-list-of-all-hive-external-tables-and-hdfs-paths-from-hive-metastore/#comment-19



来源:https://stackoverflow.com/questions/50341323/automated-list-of-hive-external-tables

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!