How to check an HDFS directory's size?

轻奢々 2021-01-30 12:14

I know du -sh for common Linux filesystems, but how do I do the same on HDFS?

10 Answers
  • 2021-01-30 12:39

    hadoop fs -du -s -h /path/to/dir displays a directory's size in readable form.
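
    For illustration, a hypothetical run (the path and sizes are made up; on recent releases a second column showing the space consumed across all replicas is printed as well):

    hadoop fs -du -s -h /user/hadoop/logs
    2.1 G  6.3 G  /user/hadoop/logs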

  • 2021-01-30 12:45

    Extending Matt D's and other answers, as of Apache Hadoop 3.0.0 the command is

    hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]

    It displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.

    Options:

    • The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, the calculation is done by going 1-level deep from the given path.
    • The -h option will format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).
    • The -v option will display the names of columns as a header line.
    • The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.

    du returns three columns with the following format:

     +-------------------------------------------------------------------+ 
     | size  |  disk_space_consumed_with_all_replicas  |  full_path_name | 
     +-------------------------------------------------------------------+ 
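
    For instance, a single hypothetical output line matching this format, assuming a replication factor of 3 (the numbers are made up for illustration):

    987654  2962962  /user/hadoop/file1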
    

    Example command:

    hadoop fs -du /user/hadoop/dir1 \
        /user/hadoop/file1 \
        hdfs://nn.example.com/user/hadoop/dir1 
    

    Exit Code: Returns 0 on success and -1 on error.

    source: Apache doc

  • 2021-01-30 12:46

    Prior to 0.20.203, and officially deprecated in 2.6.0:

    hadoop fs -dus [directory]
    

    Since 0.20.203 (and 1.0.4), and still compatible through 2.6.0:

    hdfs dfs -du [-s] [-h] URI [URI …]
    

    You can also run hadoop fs -help for more info and specifics.
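
    For example, help for a single command can be requested by name (the exact wording of the output varies by release):

    hadoop fs -help du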

  • 2021-01-30 12:48

    % of used space on the Hadoop cluster:
    sudo -u hdfs hadoop fs -df

    Capacity under specific folder:
    sudo -u hdfs hadoop fs -du -h /user
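
    A sketch combining the two, assuming an HDFS superuser named hdfs (adding -h makes the -df output human-readable as well):

    sudo -u hdfs hadoop fs -df -h /
    sudo -u hdfs hadoop fs -du -h /user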

  • 2021-01-30 12:55

    When trying to calculate the total size of a particular group of files within a directory, the -s option does not work (in Hadoop 2.7.1). For example:

    Directory structure:

    some_dir
    ├abc.txt    
    ├count1.txt 
    ├count2.txt 
    └def.txt    
    

    Assume each file is 1 KB in size. You can summarize the entire directory with:

    hdfs dfs -du -s some_dir
    4096 some_dir
    

    However, if I want the sum of all files containing "count", the command falls short.

    hdfs dfs -du -s some_dir/count*
    1024 some_dir/count1.txt
    1024 some_dir/count2.txt
    

    To get around this I usually pass the output through awk.

    hdfs dfs -du some_dir/count* | awk '{ total+=$1 } END { print total }'
    2048 
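
    The same trick can also total the second column (space consumed with all replicas) on releases that print it; this is just a sketch along the same lines:

    hdfs dfs -du some_dir/count* | awk '{ logical+=$1; raw+=$2 } END { print logical, raw }'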
    
  • 2021-01-30 12:55

    hdfs dfs -count <dir>

    info from man page:

    -count [-q] [-h] [-v] [-t [<storage type>]] [-u] <path> ... :
      Count the number of directories, files and bytes under the paths
      that match the specified file pattern.  The output columns are:
      DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
      or, with the -q option:
      QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA
            DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
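
    A hypothetical run with the header line enabled via -v (the counts, size, and path are made up; column widths differ by release):

    hdfs dfs -count -v /user/hadoop/dir1
       DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
               3           12            4096000 /user/hadoop/dir1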
    