I am using Hadoop version 2.7 and its FileSystem API. The question is about "how to count partitions with the API?" but, to put it into a software problem, I am copin
Hive structures its metadata as database > tables > partitions > files. This typically translates into a filesystem directory structure like /database/table/partition/.../file, where /partition/.../ signifies that tables can be partitioned by multiple columns, creating nested levels of subdirectories. (By convention, a partition is a directory named .../partition_column=value.)
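If the goal is to count partitions through the FileSystem API, one way is to walk the table directory and count the leaf directories. Here is a minimal sketch against the Hadoop 2.7 API; the warehouse path and table name are hypothetical placeholders for your own layout:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionCounter {

    // Count leaf directories under a table directory: with nested partitioning
    // (several partition columns) only the deepest partition_column=value level
    // holds the data files, so that is the level we count.
    static long countPartitions(FileSystem fs, Path dir) throws Exception {
        long count = 0;
        boolean hasSubDirs = false;
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDirectory()) {
                hasSubDirs = true;
                count += countPartitions(fs, status.getPath());
            }
        }
        // A directory with no subdirectories is a leaf partition
        // (or the table root itself, for an unpartitioned table).
        return hasSubDirs ? count : 1;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical warehouse layout; adjust to your own paths.
        Path table = new Path("/user/hive/warehouse/mydb.db/mytable");
        FileSystem fs = FileSystem.get(table.toUri(), new Configuration());
        System.out.println("partitions: " + countPartitions(fs, table));
    }
}
```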
So it seems like your script will be printing the number of files (parts) and their total length (size) for each single-column-partitioned table in each of your databases, if I'm not mistaken.
As an alternative, I'd suggest you look at the hdfs dfs -count command to see if it suits your needs, and maybe wrap it in a simple shell script to loop through the databases and tables.
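If you'd rather stay within the FileSystem API, getContentSummary returns the same figures that hdfs dfs -count prints (directory count, file count, content size) for an entire subtree in one call. A minimal sketch, again with a hypothetical table path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TableSummary {
    public static void main(String[] args) throws Exception {
        // Hypothetical table path; in practice you would loop over databases and tables.
        Path table = new Path("/user/hive/warehouse/mydb.db/mytable");
        FileSystem fs = FileSystem.get(table.toUri(), new Configuration());

        // getContentSummary aggregates the whole subtree, so nested partition
        // levels are included automatically.
        ContentSummary summary = fs.getContentSummary(table);
        System.out.println("dirs: " + summary.getDirectoryCount()
                + ", parts: " + summary.getFileCount()
                + ", size: " + summary.getLength());
    }
}
```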