问题
I have to create an automated process to list all external tables in Hive and do a record count on those tables.
I should do this as a daily job. I tried this by hard coding all the external table names, but this is not accepted as the tables keep on changing once in a month.
I have gone through different approaches like [show tables]
and executing query in metastore DB. But these will not help me in automating the process.
Is there a better approach to implement this in Hive.
回答1:
Something like this, using shell.
#Create external table list for a schema
SCHEMA=your_schema_name
#define filenames
alltableslist=tables_$SCHEMA
exttablelist=ext_tables_$SCHEMA
#Get all tables
hive -S -e " set hive.cli.print.header=false; use $SCHEMA; show tables;" 1> $alltableslist
#For each table check its type:
for table in $(cat $alltableslist)
do
echo Processing table $table ...
#Describe table
describe=$(hive client -S -e "use $SCHEMA; DESCRIBE FORMATTED $table")
#Get type
table_type=$(echo "${describe}" | egrep -o 'Table Type:[^,]+' | cut -f2)
#Check table type, get count and write table name with count
if [ $table_type == EXTERNAL_TABLE ]; then
#get count
cnt=$(hive client -S -e "select count(*) from $SCHEMA.table ")
#save result
echo "$table $cnt" > $exttablelist
fi
done; #tables loop
Just replace your_schema_name
at the beginning with your schema name. External tables with counts in this example will be saved in the file ext_tables_[your_schema_name]
Counts can be processed in parallel and even in single SQL statement and many other things can be improved, but hope you have caught the idea.
回答2:
As per my understanding first your requirement is to find out list of Hive external tables in all hive databases .
Database name,Table name,Table Type(External) and HDFS Location of hive External tables.
Login to hive metastore and use Hive meta database.
Use 3 tables TBLS,DBS and SDS tables , on top of this 3 tables we can apply joins on DB_ID and SD_ID
By using above format you can get databases names along with respected hive external tables list and HDFS path location also .
For Querey and output information please read this link
https://askdoubts.com/question/how-to-find-out-list-of-all-hive-external-tables-and-hdfs-paths-from-hive-metastore/#comment-19
来源:https://stackoverflow.com/questions/50341323/automated-list-of-hive-external-tables