Programmatically reading the output of a Hadoop MapReduce program

忘掉有多难    2021-02-09 02:26

This may be a basic question, but I could not find an answer for it on Google.
I have a map-reduce job that creates multiple output files in its output directory. My Java application needs to read those files from the output directory. How can I do this programmatically?

3 Answers
  •  盖世英雄少女心
     2021-02-09 03:05

    The method you are looking for is called listStatus(Path). It returns all files inside a Path as a FileStatus array. You can then loop over them, build a Path object for each file, and read it.

        // A FileSystem handle is needed first (assuming the default configuration):
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // List every file in the directory and read each one as a SequenceFile
        FileStatus[] fss = fs.listStatus(new Path("/"));
        for (FileStatus status : fss) {
            Path path = status.getPath();
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            IntWritable key = new IntWritable();
            IntWritable value = new IntWritable();
            while (reader.next(key, value)) {
                System.out.println(key.get() + " | " + value.get());
            }
            reader.close();
        }
    

    For Hadoop 2.x, the (fs, path, conf) constructor is deprecated; you can set up the reader like this instead:

        SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
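    A fuller sketch of the Hadoop 2.x variant: a real job output directory also contains the _SUCCESS marker and hidden .crc files, which are not sequence files, so it is worth skipping those before opening a reader. The output path and the IntWritable key/value types below are assumptions carried over from the snippet above, not fixed by the question.

    ```java
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;

    public class ReadMapReduceOutput {

        // Keep only real data files: skip the _SUCCESS marker and
        // hidden files such as .part-r-00000.crc
        static boolean isDataFile(String name) {
            return !name.startsWith("_") && !name.startsWith(".");
        }

        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical output directory; adjust to your job's path
            FileStatus[] fss = fs.listStatus(new Path("/user/me/output"));
            for (FileStatus status : fss) {
                Path path = status.getPath();
                if (!isDataFile(path.getName())) {
                    continue;
                }
                // Hadoop 2.x style: reader options instead of the
                // deprecated (fs, path, conf) constructor
                SequenceFile.Reader reader =
                        new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
                try {
                    IntWritable key = new IntWritable();
                    IntWritable value = new IntWritable();
                    while (reader.next(key, value)) {
                        System.out.println(key.get() + " | " + value.get());
                    }
                } finally {
                    reader.close();
                }
            }
        }
    }
    ```

    Run with the Hadoop classpath available; the name filter here is a simplified stand-in for the hidden-file filtering Hadoop itself applies when reading job input.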
