How do I check an HDFS directory's size?

轻奢々

I know `du -sh` on common Linux filesystems. How do I do the same with HDFS?

10 Answers

野的像风

2021-01-30 12:39

`hadoop fs -du -s -h /path/to/dir` displays a directory's size in human-readable form.

情深已故

2021-01-30 12:45
Extending Matt D's and others' answers: up to Apache Hadoop 3.0.0, the full command syntax is

hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]

It displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.

Options:
- The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, the calculation is done by going 1-level deep from the given path.
- The -h option will format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864)
- The -v option will display the names of columns as a header line.
- The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.

du returns three columns with the following format:
```
 +-------------------------------------------------------------------+ 
 | size  |  disk_space_consumed_with_all_replicas  |  full_path_name | 
 +-------------------------------------------------------------------+ 
```
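As a rough sketch of how those three columns might be consumed in a script: the function names and sample lines below are hypothetical stand-ins for real `hadoop fs -du` output (no cluster is contacted), and the layout assumed is the three-column one described above. Dividing the second column by the first gives the effective replication factor per path.

```shell
#!/bin/sh
# Hypothetical sample of three-column du output:
# size, disk space consumed with all replicas, full path name.
sample_du_output() {
cat <<'EOF'
1024 3072 /user/hadoop/dir1/a.txt
2048 6144 /user/hadoop/dir1/b.txt
EOF
}

# Print "path size_bytes replication" for every non-empty entry.
# Assumes paths contain no whitespace (awk splits fields on it).
du_replication() {
    awk '$1 > 0 { printf "%s %s %d\n", $3, $1, $2 / $1 }'
}

sample_du_output | du_replication
```

On a real cluster the same filter would presumably be fed with something like `hadoop fs -du /user/hadoop/dir1 | du_replication`, without `-h`, since human-readable sizes cannot be divided directly.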
Example command:
```
hadoop fs -du /user/hadoop/dir1 \
    /user/hadoop/file1 \
    hdfs://nn.example.com/user/hadoop/dir1 
```
Exit Code: Returns 0 on success and -1 on error.

source: Apache doc
迷失自我

2021-01-30 12:46
Prior to 0.20.203, and officially deprecated in 2.6.0:
```
hadoop fs -dus [directory]
```
Since 1.0.4 (the original reference to 0.20.203 is a dead link) and still compatible through 2.6.0:
```
hdfs dfs -du [-s] [-h] URI [URI ...]
```
You can also run hadoop fs -help for more info and specifics.
臣服心动

2021-01-30 12:48

Percentage of used space on the Hadoop cluster:
sudo -u hdfs hadoop fs -df

Space used under a specific folder:
sudo -u hdfs hadoop fs -du -h /user
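
If the percentage is needed in a script, it can be recomputed from the size and used columns. This is only a sketch: it assumes the df output has one header line followed by the columns Filesystem, Size, Used, Available, Use%, and the function name and sample numbers are made up for illustration.

```shell
#!/bin/sh
# Compute the used percentage from df-style output, assuming the columns
# Filesystem, Size, Used, Available, Use% and a single header line.
df_used_percent() {
    awk 'NR > 1 && $2 > 0 { printf "%s %.1f%%\n", $1, 100 * $3 / $2 }'
}

# Stand-in for: sudo -u hdfs hadoop fs -df
printf '%s\n' \
    'Filesystem                 Size   Used  Available  Use%' \
    'hdfs://nn.example.com  1000000  250000     750000   25%' | df_used_percent
```

Run without `-h` so that the size columns stay in plain bytes and can be divided.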

终归单人心

2021-01-30 12:55
When you try to calculate the total size of a particular group of files within a directory, the -s option does not give a single sum (in Hadoop 2.7.1). For example:

Directory structure:
```
some_dir
├abc.txt    
├count1.txt 
├count2.txt 
└def.txt    
```
Assume each file is 1 KB in size. You can summarize the entire directory with:
```
hdfs dfs -du -s some_dir
4096 some_dir
```
However, if I want the sum of all files whose names contain "count", the command falls short: it prints one line per file instead of a total.
```
hdfs dfs -du -s some_dir/count*
1024 some_dir/count1.txt
1024 some_dir/count2.txt
```
To get around this I usually pass the output through awk.
```
hdfs dfs -du some_dir/count* | awk '{ total+=$1 } END { print total }'
2048 
```
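The awk total comes out in raw bytes, because `-h` output cannot be summed directly. A minimal sketch that also scales the sum to a binary unit follows; the function name `sum_du_sizes` is hypothetical, and the piped-in lines just mimic the two-column output shown in this answer.

```shell
#!/bin/sh
# Sum the first column (bytes) of du-style output and print the total
# both raw and scaled to a binary unit (B, K, M, G, T).
sum_du_sizes() {
    awk '
        { total += $1 }
        END {
            split("B K M G T", unit, " ")
            scaled = total; i = 1
            while (scaled >= 1024 && i < 5) { scaled /= 1024; i++ }
            printf "%.0f %.1f %s\n", total, scaled, unit[i]
        }'
}

# Stand-in for: hdfs dfs -du some_dir/count*
printf '1024 some_dir/count1.txt\n1024 some_dir/count2.txt\n' | sum_du_sizes
```

With the two 1 KB files above this prints `2048 2.0 K`.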

感动是毒

2021-01-30 12:55

hdfs dfs -count <dir>

info from man page:

-count [-q] [-h] [-v] [-t [<storage type>]] [-u] <path> ... :
  Count the number of directories, files and bytes under the paths
  that match the specified file pattern.  The output columns are:
  DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  or, with the -q option:
  QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA
        DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
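
To pick the size out of that output in a script, something along these lines may work. It is a sketch only: it assumes the four-column layout (no -q), the function name `count_summary` is made up, and the sample line stands in for real command output.

```shell
#!/bin/sh
# Report DIR_COUNT, FILE_COUNT and CONTENT_SIZE per path from
# count-style output in the four-column layout (without -q).
count_summary() {
    awk 'NF >= 4 { printf "%s: %s dirs, %s files, %s bytes\n", $4, $1, $2, $3 }'
}

# Stand-in for: hdfs dfs -count /user/hadoop/some_dir
printf '           1            4               4096 /user/hadoop/some_dir\n' | count_summary
```

CONTENT_SIZE is the third field here; with -q the extra quota columns come first, so the field numbers would shift by four.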
