Programmatically reading the output of a Hadoop MapReduce program

忘掉有多难    2021-02-09 02:26

This may be a basic question, but I could not find an answer for it on Google.
I have a map-reduce job that creates multiple output files in its output directory. My Java application needs to read those files from the output directory. How can I do this programmatically?

3 Answers
  •  盖世英雄少女心
     2021-02-09 03:05

    The method you are looking for is called listStatus(Path). It returns all files inside a Path as a FileStatus array. You can then loop over them, build a Path object for each file, and read it.

        // A FileSystem handle is needed first (assuming the default configuration):
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // List every file in the directory and read each one as a SequenceFile
        FileStatus[] fss = fs.listStatus(new Path("/"));
        for (FileStatus status : fss) {
            Path path = status.getPath();
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            IntWritable key = new IntWritable();
            IntWritable value = new IntWritable();
            while (reader.next(key, value)) {
                System.out.println(key.get() + " | " + value.get());
            }
            reader.close();
        }
    

    For Hadoop 2.x, the (fs, path, conf) constructor is deprecated; you can set up the reader like this instead:

        SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
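    A fuller sketch of the Hadoop 2.x variant: a real job output directory also contains the _SUCCESS marker and hidden .crc files, which are not sequence files, so it is worth skipping those before opening a reader. The output path and the IntWritable key/value types below are assumptions carried over from the snippet above, not fixed by the question.

    ```java
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;

    public class ReadMapReduceOutput {

        // Keep only real data files: skip the _SUCCESS marker and
        // hidden files such as .part-r-00000.crc
        static boolean isDataFile(String name) {
            return !name.startsWith("_") && !name.startsWith(".");
        }

        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical output directory; adjust to your job's path
            FileStatus[] fss = fs.listStatus(new Path("/user/me/output"));
            for (FileStatus status : fss) {
                Path path = status.getPath();
                if (!isDataFile(path.getName())) {
                    continue;
                }
                // Hadoop 2.x style: reader options instead of the
                // deprecated (fs, path, conf) constructor
                SequenceFile.Reader reader =
                        new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
                try {
                    IntWritable key = new IntWritable();
                    IntWritable value = new IntWritable();
                    while (reader.next(key, value)) {
                        System.out.println(key.get() + " | " + value.get());
                    }
                } finally {
                    reader.close();
                }
            }
        }
    }
    ```

    Run with the Hadoop classpath available; the name filter here is a simplified stand-in for the hidden-file filtering Hadoop itself applies when reading job input.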
