How to count partitions with FileSystem API?

情话喂你 2021-01-24 11:44

I am using Hadoop version 2.7 and its FileSystem API. The question is about "how to count partitions with the API?" but, to put it into a software problem, I am copin

1 answer
  •  粉色の甜心
    2021-01-24 12:17

    Hive structures its metadata as database > tables > partitions > files. This typically maps to the filesystem directory structure /database.db/table/partition/.../files, where /partition/.../ indicates that a table can be partitioned by multiple columns, which creates nested levels of subdirectories. (By convention, a partition is a directory named .../partition_column=value.)
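To make the directory convention concrete, here is a minimal sketch in Python that counts partitions from a list of file paths. It only manipulates path strings and does not call the Hadoop FileSystem API; the function names `partition_spec` and `count_partitions` and the `/warehouse/sales.db/orders` paths are hypothetical, chosen for illustration. It assumes every partition directory follows the `partition_column=value` naming convention, and it only sees partitions that contain at least one file.

```python
from pathlib import PurePosixPath


def partition_spec(file_path, table_root):
    """Return the (column, value) pairs encoded in the directories
    between the table directory and the file."""
    relative = PurePosixPath(file_path).relative_to(table_root)
    spec = []
    for segment in relative.parts[:-1]:  # skip the file name itself
        column, sep, value = segment.partition("=")
        if sep and column:  # keep only "column=value" directories
            spec.append((column, value))
    return tuple(spec)


def count_partitions(file_paths, table_root):
    """Count distinct partitions (full column=value chains) that hold files."""
    specs = {partition_spec(p, table_root) for p in file_paths}
    specs.discard(())  # files directly under the table belong to no partition
    return len(specs)


# Hypothetical listing of one table partitioned by year and month.
paths = [
    "/warehouse/sales.db/orders/year=2020/month=12/part-00000",
    "/warehouse/sales.db/orders/year=2020/month=12/part-00001",
    "/warehouse/sales.db/orders/year=2021/month=01/part-00000",
]
print(count_partitions(paths, "/warehouse/sales.db/orders"))  # prints 2
```

With a real cluster, the path list would come from a recursive listing of the table directory; the counting logic stays the same.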

    So, if I'm not mistaken, your script prints the number of files (parts) and their total length (size) for each single-column partitioned table in each of your databases.

    As an alternative, I'd suggest looking at the hdfs dfs -count command to see whether it suits your needs, and perhaps wrapping it in a simple shell script that loops through the databases and tables.
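As a rough sketch of such a wrapper, the Python snippet below parses the output of `hdfs dfs -count`, which prints the columns DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME for each path. The helper names `parse_count_line` and `hdfs_count` and the sample line are hypothetical; `hdfs_count` assumes the `hdfs` command is available on the PATH and is not exercised here.

```python
import subprocess


def parse_count_line(line):
    """Split one line of `hdfs dfs -count` output into its four columns."""
    dir_count, file_count, content_size, path = line.split(None, 3)
    return {
        "path": path.strip(),
        "dirs": int(dir_count),
        "files": int(file_count),
        "bytes": int(content_size),
    }


def hdfs_count(path):
    """Run `hdfs dfs -count <path>` and parse every output line.
    Assumes the hdfs CLI is installed and configured for the cluster."""
    result = subprocess.run(
        ["hdfs", "dfs", "-count", path],
        capture_output=True, text=True, check=True,
    )
    return [parse_count_line(l) for l in result.stdout.splitlines() if l.strip()]


# Hypothetical output line for a table directory.
sample = "           3            3              12345 /warehouse/sales.db/orders"
print(parse_count_line(sample))
```

Looping `hdfs_count` over each database and table directory would give per-table directory, file, and byte totals. To my understanding the directory count includes the queried directory itself, so it is worth checking against a known table before deriving partition counts from it.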
