How to find the size of an HDFS file

抹茶落季 2020-12-23 21:10

How do I find the size of an HDFS file? What command should be used to find the size of a file in HDFS?

6 answers
  • 2020-12-23 21:28

If you want to do it through the API, you can use the getFileStatus() method; the FileStatus object it returns reports the file length in bytes via getLen().
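
    A minimal, self-contained sketch of that approach (the class name and path here are only examples, not part of the original answer):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FileSizeExample
    {
        public static void main(String[] args) throws IOException
        {
            // Example path; replace with the file you want to inspect.
            Path path = new Path("/user/hduser/input/sou");
            FileSystem fs = path.getFileSystem(new Configuration());
            // getFileStatus() returns the metadata for a single file;
            // getLen() is its length in bytes.
            FileStatus status = fs.getFileStatus(path);
            System.out.println(status.getLen() + " bytes");
        }
    }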

  • 2020-12-23 21:29

The commands below pipe hadoop fs -du -s through an awk script to report the total size (in GB) of filtered paths in HDFS:

    hadoop fs -du -s /data/ClientDataNew/**A*** | awk '{s+=$1} END {printf "%.3fGB\n", s/1000000000}'

    output ---> 2.089GB

    hadoop fs -du -s /data/ClientDataNew/**B*** | awk '{s+=$1} END {printf "%.3fGB\n", s/1000000000}'

    output ---> 1.724GB

    hadoop fs -du -s /data/ClientDataNew/**C*** | awk '{s+=$1} END {printf "%.3fGB\n", s/1000000000}'

    output ---> 0.986GB

  • 2020-12-23 21:36

    You can use the hadoop fs -ls command to list the files in a directory along with their details. The fifth column of the output is the file size in bytes.

    For example, the command hadoop fs -ls input gives the following output:

    Found 1 items
    -rw-r--r--   1 hduser supergroup      45956 2012-07-19 20:57 /user/hduser/input/sou
    

    Here the size of the file sou is 45956 bytes.

  • 2020-12-23 21:36
    hdfs dfs -du -s -h /directory
    

    The -h flag prints the size in a human-readable format (KB, MB, GB, and so on); without it, the size is reported in plain bytes, which is much harder to read at a glance.

  • 2020-12-23 21:42

    I used the function below to get the file size. Note that getContentSummary() works for both files and directories; for a directory it returns the total size of everything beneath it.

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class GetflStatus
    {
        // Returns the size in bytes of the file (or the total size of the
        // directory tree) at the given HDFS path.
        public long getflSize(String args) throws IOException, FileNotFoundException
        {
            Configuration config = new Configuration();
            Path path = new Path(args);
            FileSystem hdfs = path.getFileSystem(config);
            ContentSummary cSummary = hdfs.getContentSummary(path);
            long length = cSummary.getLength();
            return length;
        }
    }
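
    A hypothetical call (the path is only an example):

    GetflStatus status = new GetflStatus();
    long size = status.getflSize("/user/hduser/input/sou");
    System.out.println(size + " bytes");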
    
  • 2020-12-23 21:48

    I also find myself using hadoop fs -dus <path> a great deal (on recent Hadoop releases the equivalent is hadoop fs -du -s). For example, if an HDFS directory named "/user/frylock/input" contains 100 files and you need the total size of all of them, you could run:

    hadoop fs -dus /user/frylock/input
    

    and you would get back the total size (in bytes) of all of the files in the "/user/frylock/input" directory.

    Also, keep in mind that HDFS stores data redundantly, so the actual physical storage used by a file may be 3x or more (the default replication factor is 3) than what hadoop fs -ls and hadoop fs -dus report.
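
    If you need the replicated (physical) figure programmatically, ContentSummary exposes it as well; a sketch under the same assumptions as above (the class name is hypothetical, the path just reuses the example directory):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DiskUsageExample
    {
        public static void main(String[] args) throws IOException
        {
            Path path = new Path("/user/frylock/input");  // example path from above
            FileSystem fs = path.getFileSystem(new Configuration());
            ContentSummary summary = fs.getContentSummary(path);
            // getLength() is the logical size of the data;
            // getSpaceConsumed() includes the replicas actually stored on disk.
            System.out.println("logical:  " + summary.getLength() + " bytes");
            System.out.println("physical: " + summary.getSpaceConsumed() + " bytes");
        }
    }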
