How to count partitions with FileSystem API?


情话喂你 2021-01-24 11:44

I am using Hadoop version 2.7 and its FileSystem API. The question is about "how to count partitions with the API?" but, to put it into a software problem, I am copin

1 answer

粉色の甜心 (original poster)

2021-01-24 12:17

Hive structures its metadata as database > tables > partitions > files. This typically translates into the filesystem directory structure /database.db/table/partition/.../files, where /partition/.../ signifies that a table can be partitioned by multiple columns, which creates nested levels of subdirectories. (By convention, a partition is a directory named .../partition_column=value.)
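To make the layout concrete, here is a minimal sketch of how partitions could be counted once you have the file paths under a table directory (for example, collected with FileSystem.listFiles(path, true)). It uses plain Java strings so it runs without a cluster. The class name PartitionPaths, its methods, and the /warehouse/sales.db/orders paths are made up for illustration. It assumes every partition directory follows the column=value convention, and it will miss partitions that contain no files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: derives Hive-style partition specs from file paths.
class PartitionPaths {

    // Returns the partition spec of a file under tableDir, e.g. "year=2020/month=01".
    // Returns an empty string when the file sits directly in the table directory.
    static String partitionSpec(String tableDir, String filePath) {
        String base = tableDir.endsWith("/") ? tableDir : tableDir + "/";
        if (!filePath.startsWith(base)) {
            throw new IllegalArgumentException(filePath + " is not under " + tableDir);
        }
        String[] parts = filePath.substring(base.length()).split("/");
        List<String> spec = new ArrayList<>();
        // Every component except the last one (the file name) is a directory;
        // only directories named column=value are treated as partition levels.
        for (int i = 0; i < parts.length - 1; i++) {
            if (parts[i].contains("=")) {
                spec.add(parts[i]);
            }
        }
        return String.join("/", spec);
    }

    // Counts the distinct partition specs seen among the given file paths.
    static int countPartitions(String tableDir, List<String> filePaths) {
        Set<String> specs = new LinkedHashSet<>();
        for (String path : filePaths) {
            String spec = partitionSpec(tableDir, path);
            if (!spec.isEmpty()) {
                specs.add(spec);
            }
        }
        return specs.size();
    }

    public static void main(String[] args) {
        // Made-up example paths for a table partitioned by year and month.
        List<String> files = Arrays.asList(
            "/warehouse/sales.db/orders/year=2020/month=01/part-00000",
            "/warehouse/sales.db/orders/year=2020/month=01/part-00001",
            "/warehouse/sales.db/orders/year=2020/month=02/part-00000");
        System.out.println(countPartitions("/warehouse/sales.db/orders", files)); // prints 2
    }
}
```

Counting from file paths rather than directories keeps the logic independent of how deep the partitioning goes, at the cost of ignoring empty partitions.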

So it seems your script will print the number of files (parts) and their total length (size) for each single-column partitioned table in each of your databases, if I'm not mistaken.

As an alternative, I'd suggest you look at the hdfs dfs -count command to see if it suits your needs, and maybe wrap it in a simple shell script that loops through the databases and tables.
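If you would rather stay in Java than write a shell script, a rough sketch of reading that command's output could look like the following. Without -q, hdfs dfs -count prints four columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. The class CountLine and the sample line are made up for illustration. As far as I know, DIR_COUNT includes the queried directory itself, so for a single-column partitioned table the number of partition directories would be DIR_COUNT minus one; please verify that on your cluster, and note that any non-partition subdirectories would be counted too.

```java
// Hypothetical holder for one line of `hdfs dfs -count <path>` output.
class CountLine {
    final long dirCount;
    final long fileCount;
    final long contentSize;
    final String path;

    CountLine(long dirCount, long fileCount, long contentSize, String path) {
        this.dirCount = dirCount;
        this.fileCount = fileCount;
        this.contentSize = contentSize;
        this.path = path;
    }

    // Parses "DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME" (whitespace separated).
    static CountLine parse(String line) {
        // Limit the split to 4 fields so a path containing spaces stays intact.
        String[] fields = line.trim().split("\\s+", 4);
        if (fields.length != 4) {
            throw new IllegalArgumentException("Unexpected -count output: " + line);
        }
        return new CountLine(
            Long.parseLong(fields[0]),
            Long.parseLong(fields[1]),
            Long.parseLong(fields[2]),
            fields[3]);
    }

    // Assumption: DIR_COUNT includes the queried directory itself,
    // so the subdirectories below it number one fewer.
    long subdirectories() {
        return Math.max(0, dirCount - 1);
    }

    public static void main(String[] args) {
        // Made-up sample line, not real cluster output.
        CountLine c = parse("           4           12          104857600 /warehouse/sales.db/orders");
        System.out.println(c.path + ": " + c.subdirectories() + " subdirectories, "
            + c.fileCount + " files, " + c.contentSize + " bytes");
    }
}
```

The same three numbers are also available through the API via FileSystem.getContentSummary(path), which returns a ContentSummary with getDirectoryCount(), getFileCount() and getLength().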
