Getting Filename/FileData as key/value input for Map when running a Hadoop MapReduce Job

眼角桃花 2020-12-20 01:14

I went through the question "How to get Filename/File Contents as key/value input for MAP when running a Hadoop MapReduce Job?" here. Though it explains the concept, I am unab…

1 Answer
  • 2020-12-20 02:08

    Put this code in your CustomRecordReader class (it uses the old org.apache.hadoop.mapred API):

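    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.LineRecordReader;
    import org.apache.hadoop.mapred.RecordReader;
    
    // Emits (file name, line) pairs by wrapping the old-API LineRecordReader.
    public class CustomRecordReader implements RecordReader<Text, Text> {
        private final LineRecordReader lineReader;
        private final String fileName;
        // LineRecordReader yields (LongWritable offset, Text line), so it needs its own holders
        private final LongWritable offset;
        private final Text line;
    
        public CustomRecordReader(JobConf job, FileSplit split) throws IOException {
            lineReader = new LineRecordReader(job, split);
            fileName = split.getPath().getName();
            offset = lineReader.createKey();
            line = lineReader.createValue();
        }
    
        public boolean next(Text key, Text value) throws IOException {
            // get the next line; its byte offset is discarded
            if (!lineReader.next(offset, line)) {
                return false;
            }
            key.set(fileName); // key: name of the file this split came from
            value.set(line);   // value: contents of the current line
            return true;
        }
    
        public Text createKey() {
            return new Text("");
        }
    
        public Text createValue() {
            return new Text("");
        }
    
        // Delegate position, progress and cleanup to the wrapped reader
        public long getPos() throws IOException { return lineReader.getPos(); }
    
        public float getProgress() throws IOException { return lineReader.getProgress(); }
    
        public void close() throws IOException { lineReader.close(); }
    }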
    

    Remove the SPDRecordReader constructor (it is an error).

    And put this code in your CustomFileInputFormat class:

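    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*; // old API: FileInputFormat, FileSplit, InputSplit, JobConf, RecordReader, Reporter
    
    public class CustomFileInputFormat extends FileInputFormat<Text, Text> {
        @Override
        public RecordReader<Text, Text> getRecordReader(
          InputSplit input, JobConf job, Reporter reporter)
          throws IOException {
            reporter.setStatus(input.toString()); // show the current split in the task status
            return new CustomRecordReader(job, (FileSplit) input);
        }
    }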
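    The CustomRecordReader above turns every line of a split into one record whose key is the file name and whose value is the line. As a rough way to check what a mapper will receive, here is a plain-Java model with no Hadoop dependency. It is only a sketch: `FileNameLinePairs` and `pairs` are hypothetical names, it assumes the whole file is read as a single split, and its line splitting is simplified compared to Hadoop's.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Hadoop) of the records CustomRecordReader emits for one
// file read as a single split: one (file name, line) pair per line.
class FileNameLinePairs {

    // Hypothetical helper. Simplified line splitting: breaks on '\n' and strips a
    // trailing '\r' (Hadoop's reader also treats a lone '\r' as a line end).
    static List<Map.Entry<String, String>> pairs(String path, String content) {
        // Last path component, like Path.getName()
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        List<Map.Entry<String, String>> records = new ArrayList<>();
        if (content.isEmpty()) {
            return records;
        }
        String[] lines = content.split("\n", -1);
        // A trailing newline ends the last line; it does not start a new record
        int count = content.endsWith("\n") ? lines.length - 1 : lines.length;
        for (int i = 0; i < count; i++) {
            String line = lines[i];
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1);
            }
            // Key = file name (the byte offset LineRecordReader reports is dropped)
            records.add(new SimpleEntry<>(fileName, line));
        }
        return records;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> r : pairs("hdfs:///data/a.txt", "foo\nbar\n")) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

    Running `main` prints `a.txt`, a tab, `foo`, then `a.txt`, a tab, `bar`. Every line of a file shares the same key, which is what lets the reduce phase group records by file name.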
    