File count in an HDFS directory

Asked 2021-01-31 10:36 by 攒了一身酷 · 6 answers · 1989 views

In Java code, I want to connect to a directory in HDFS, learn the number of files in that directory, get their names, and read them. I can already read the files, but I could not figure out how to count the files in a directory and list their file names.

6 Answers
  • 2021-01-31 10:48

    hadoop fs -du [-s] [-h] [-x] URI [URI ...]

    Displays the sizes of files and directories contained in the given directory or, if the path is just a file, the length of that file.

    Options:

    The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, calculation is done by going 1-level deep from the given path.
    The -h option will format file sizes in a “human-readable” fashion (e.g. 64.0m instead of 67108864).
    The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.
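The “human-readable” conversion done by -h can be illustrated without a cluster. As a sketch (this is only an analogy, not part of Hadoop), GNU coreutils' numfmt applies the same binary-prefix formatting to a byte count:

```shell
# Illustration only: numfmt formats 67108864 bytes with a binary (IEC) prefix,
# the same 64M that `hadoop fs -du -h` would display as 64.0m.
numfmt --to=iec 67108864   # prints 64M
```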
    
  • 2021-01-31 10:57

    You can use the following to check the file count in that particular directory

    hadoop fs -count /directoryPath/* | awk '{print $2}' | wc -l

    count : counts the number of directories, files, and bytes under the path

    awk '{print $2}' : prints the second column (FILE_COUNT) from the output

    wc -l : counts the lines
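One caveat with that pipeline: `wc -l` counts the matched entries, including subdirectories, not the files themselves. A sketch against hypothetical `-count` output (columns DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME; the paths and sizes below are made up) shows that summing the second column gives the actual file count:

```shell
# Hypothetical output of `hadoop fs -count /directoryPath/*`:
# one line per matched entry, FILE_COUNT in column 2.
printf '%s\n' \
  "           1           0              0 /directoryPath/sub" \
  "           0           1           1024 /directoryPath/a.txt" \
  "           0           1           2048 /directoryPath/b.txt" |
  awk '{ files += $2 } END { print files }'   # prints 2 (files only)
```

Piping the same sample through `wc -l` would print 3, because the subdirectory line is counted too.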

  • 2021-01-31 10:58

    On the command line, you can do it as below.

     hdfs dfs -ls $parentdirectory | awk '{system("hdfs dfs -count " $8) }'

    Note: in the default `hdfs dfs -ls` output the path is the 8th field ($8); $6 is the modification date.
    
  • 2021-01-31 11:03
    // Requires org.apache.hadoop.fs.{FileSystem, Path, ContentSummary}
    // and an org.apache.hadoop.conf.Configuration instance `conf`.
    FileSystem fs = FileSystem.get(conf);
    Path pt = new Path("/path");
    ContentSummary cs = fs.getContentSummary(pt);
    long fileCount = cs.getFileCount(); // counts files only, not directories
    
  • 2021-01-31 11:04

    count

    Usage: hadoop fs -count [-q] <paths>
    

    Count the number of directories, files, and bytes under the paths that match the specified file pattern. The output columns are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.

    The output columns with -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.

    Example:

    hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
    hadoop fs -count -q hdfs://nn1.example.com/file1
    

    Exit Code:

    Returns 0 on success and -1 on error.
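As a sketch of how those four columns line up, here is a hypothetical `-count` output line (the numbers and path are made up) labeled with awk:

```shell
# Hypothetical `hadoop fs -count` output line, split into its named columns:
echo "           3          42        1048576 /user/data" |
  awk '{ printf "DIR_COUNT=%s FILE_COUNT=%s CONTENT_SIZE=%s FILE_NAME=%s\n", $1, $2, $3, $4 }'
# prints DIR_COUNT=3 FILE_COUNT=42 CONTENT_SIZE=1048576 FILE_NAME=/user/data
```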

    You can also just use the FileSystem API and iterate over the files under the path. Here is some example code:

    // Requires org.apache.hadoop.fs.{FileSystem, Path, LocatedFileStatus, RemoteIterator}
    // and an org.apache.hadoop.conf.Configuration instance `conf`.
    int count = 0;
    FileSystem fs = FileSystem.get(conf);
    boolean recursive = false; // set true to descend into subdirectories
    RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
    while (ri.hasNext()) {
        ri.next();
        count++;
    }
    
  • 2021-01-31 11:10

    To do a quick and simple count, you can also try the following one-liner:

    hdfs dfs -ls -R /path/to/your/directory/ | grep -E '^-' | wc -l
    

    Quick explanation:

    grep -E '^-' (or egrep '^-'): matches only the file lines, since file entries start with '-' whereas directories start with 'd';

    wc -l: line count.
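A quick sketch of why this works, run against hypothetical `hdfs dfs -ls -R` output (the permissions column comes first, so file lines begin with '-' and directory lines with 'd'; the paths and sizes below are made up):

```shell
# Hypothetical recursive listing: 1 directory and 2 files.
printf '%s\n' \
  "drwxr-xr-x   - user group          0 2021-01-31 10:36 /data/sub" \
  "-rw-r--r--   3 user group       1024 2021-01-31 10:36 /data/a.txt" \
  "-rw-r--r--   3 user group       2048 2021-01-31 10:36 /data/b.txt" |
  grep -E '^-' | wc -l   # counts the two file lines only
```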
