How do I check an HDFS directory's size?

轻奢々 2021-01-30 12:14

I know du -sh works on ordinary Linux filesystems. How do I do the same thing with HDFS?

10 answers
  •  终归单人心
    2021-01-30 12:55

    When you need the total size of a particular group of files within a directory, the -s option alone does not give a single total (in Hadoop 2.7.1). For example:

    Directory structure:

    some_dir
    ├abc.txt    
    ├count1.txt 
    ├count2.txt 
    └def.txt    
    

    Assume each file is 1 KB in size. You can summarize the entire directory with:

    hdfs dfs -du -s some_dir
    4096 some_dir
    

    However, if I want the combined size of all files whose names contain "count", the command falls short: -s prints one line per matched path rather than a single total.

    hdfs dfs -du -s some_dir/count*
    1024 some_dir/count1.txt
    1024 some_dir/count2.txt
    

    To get around this, I usually pipe the output through awk and sum the first column (the size in bytes).

    hdfs dfs -du some_dir/count* | awk '{ total+=$1 } END { print total }'
    2048 
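
The awk step can be wrapped in a small shell function so it is easy to reuse. This is only a sketch: sum_du_bytes is a made-up helper name, not part of the Hadoop CLI, and it assumes the first column of the du output is the size in bytes, as in the listings in this answer.

```shell
# Hypothetical helper (not part of Hadoop): reads `hdfs dfs -du`-style
# lines from stdin and prints the sum of the first column (bytes).
sum_du_bytes() {
    # %.0f avoids integer-width limits in some awk implementations;
    # an empty input prints 0.
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Intended use on a cluster (assumes the hdfs client is on PATH):
#   hdfs dfs -du 'some_dir/count*' | sum_du_bytes
```

Quoting the glob in the usage line keeps the local shell from expanding it, so the pattern is matched against HDFS paths rather than local files.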
    
