Question
Currently I am able to change the output file name from part-00000 to a custom file name in the mapper. I do this by taking the name from the input split. I tried the same in the reducer, but the reducer's context has no input split (`getInputSplit()` is not available there). So, what is the best way to rename the reducer's output to the input file name? Below is how I achieved it in the mapper.
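```java
@Override
public void setup(Context con) throws IOException, InterruptedException {
    // Name of the file this map task reads; only a mapper's context has an input split.
    fileName = ((FileSplit) con.getInputSplit()).getPath().getName();
    // Keep at most the first 36 characters (guarded so shorter names don't throw).
    fileName = fileName.substring(0, Math.min(36, fileName.length()));
    outputName = new Text(fileName);

    // Target <output dir>/<fileName> instead of the default part-m-NNNNN file.
    final Path baseOutputPath = FileOutputFormat.getOutputPath(con);
    final Path outputFilePath = new Path(baseOutputPath, fileName);
    TextOutputFormat<IntWritable, Text> write = new TextOutputFormat<IntWritable, Text>() {
        @Override
        public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException {
            return outputFilePath;
        }
    };
    // ... (rest of setup() not shown in the question)
}
```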
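The naming step in `setup()` can be tried out on its own. This is a plain-Java sketch, not Hadoop code: `outputNameFor` is a hypothetical helper, and the 36-character cut is taken from the question's `substring(0,36)`. The `lastIndexOf('/')` step stands in for `Path.getName()`, which returns the final path component.

```java
final class OutputNames {
    // Hypothetical helper mirroring the question's setup(): keep the last
    // path segment and cut it to at most maxLen characters. Math.min avoids
    // the StringIndexOutOfBoundsException that substring(0, 36) throws for
    // names shorter than 36 characters.
    static String outputNameFor(String inputPath, int maxLen) {
        String name = inputPath.substring(inputPath.lastIndexOf('/') + 1);
        return name.substring(0, Math.min(maxLen, name.length()));
    }

    public static void main(String[] args) {
        System.out.println(outputNameFor("/data/in/short.csv", 36)); // short.csv
    }
}
```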
Answer 1:
This is what the Hadoop wiki says:
You can subclass the OutputFormat.java class and write your own. You can locate and browse the code of TextOutputFormat, MultipleOutputFormat.java, etc. for reference. It might be the case that you only need to do minor changes to any of the existing Output Format classes. To do that you can just subclass that class and override the methods you need to change.
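Subclassing TextOutputFormat and overriding `getDefaultWorkFile` is what the question already does in the mapper. In a reducer there is no input split, so the base name has to come from somewhere else (for example the job configuration), and with several reducers each one needs a distinct file or they will clash. A minimal plain-Java sketch of a collision-free name, assuming you keep Hadoop's `-r-NNNNN` suffix: `uniqueName` is a hypothetical helper that reproduces the `<name>-<m|r>-<5-digit partition><extension>` pattern that `FileOutputFormat.getUniqueFile` uses to build `part-r-00000`.

```java
import java.util.Locale;

final class WorkFileNames {
    // Hypothetical helper: builds "<base>-<m|r>-<partition as 5+ digits><extension>",
    // the same shape FileOutputFormat.getUniqueFile produces ("part-r-00000").
    // Keeping the partition number means no two reducers pick the same name.
    static String uniqueName(String base, boolean isReduce, int partition, String extension) {
        return base + '-' + (isReduce ? 'r' : 'm') + '-'
                + String.format(Locale.ROOT, "%05d", partition) + extension;
    }

    public static void main(String[] args) {
        System.out.println(uniqueName("part", true, 0, ""));       // part-r-00000
        System.out.println(uniqueName("report", true, 3, ".txt")); // report-r-00003.txt
    }
}
```

Inside the actual override, the partition would come from `context.getTaskAttemptID().getTaskID().getId()`, and the returned path would typically be `new Path(((FileOutputCommitter) getOutputCommitter(context)).getWorkPath(), name)`. Writing under the committer's work path, instead of directly into the final output directory as the question's code does, means only successful task attempts have their files promoted on commit.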
If the file name needs to depend on the key and the input file, you could create a subclass of MultipleOutputFormat to control the output file name.
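MultipleOutputFormat belongs to the older `org.apache.hadoop.mapred` API; its subclass `MultipleTextOutputFormat` lets you override `generateFileNameForKeyValue(K key, V value, String name)`, where `name` is the default leaf name such as part-00000. For the reducer to know the input file, the mapper has to put it into the key (it can read it from its FileSplit, as in the question). The plain-Java sketch below only models that routing and is not Hadoop code; `fileNameFor` and `route` are hypothetical names. Because each key is sent to exactly one reducer, naming files after the key does not make two reducers write the same file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class KeyBasedRouting {
    // Stand-in for an overridden generateFileNameForKeyValue: use the key
    // (the input file name forwarded by the mapper) as the output file name,
    // falling back to the default leaf name when the key is empty.
    static String fileNameFor(String key, String value, String defaultName) {
        return key.isEmpty() ? defaultName : key;
    }

    // Groups reducer output by target file, the way MultipleOutputFormat
    // keeps one RecordWriter per distinct file name.
    static Map<String, List<String>> route(List<Map.Entry<String, String>> records, String defaultName) {
        Map<String, List<String>> files = new TreeMap<>();
        for (Map.Entry<String, String> r : records) {
            String file = fileNameFor(r.getKey(), r.getValue(), defaultName);
            files.computeIfAbsent(file, f -> new ArrayList<>()).add(r.getValue());
        }
        return files;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = List.of(
                Map.entry("inputA.txt", "1"),
                Map.entry("inputB.txt", "2"),
                Map.entry("inputA.txt", "3"),
                Map.entry("", "4"));
        // {inputA.txt=[1, 3], inputB.txt=[2], part-00000=[4]}
        System.out.println(route(records, "part-00000"));
    }
}
```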
Source: https://stackoverflow.com/questions/27488624/how-to-change-the-output-file-name-from-part-00000-in-reducer-to-inputfile-name