问题
I am able to rename my reducer output file correctly but r-00000 is still persisting . I have used MultipleOutputs in my reducer class . Here is details of the that .Not sure what am i missing or what extra i have to do?
public class MyReducer extends Reducer<NullWritable, Text, NullWritable, Text> {
private Logger logger = Logger.getLogger(MyReducer.class);
private MultipleOutputs<NullWritable, Text> multipleOutputs;
String strName = "";
public void setup(Context context) {
logger.info("Inside Reducer.");
multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
}
@Override
public void reduce(NullWritable Key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
for (Text value : values) {
final String valueStr = value.toString();
StringBuilder sb = new StringBuilder();
sb.append(strArrvalueStr[0] + "|!|");
multipleOutputs.write(NullWritable.get(), new Text(sb.toString()),strName);
}
}
public void cleanup(Context context) throws IOException,
InterruptedException {
multipleOutputs.close();
}
}
回答1:
I was able to do it explicitly after my job finishes and thats ok for me.No delay in the job
if (b){
DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd-HHmm");
Calendar cal = Calendar.getInstance();
String strDate=dateFormat.format(cal.getTime());
FileSystem hdfs = FileSystem.get(getConf());
FileStatus fs[] = hdfs.listStatus(new Path(args[1]));
if (fs != null){
for (FileStatus aFile : fs) {
if (!aFile.isDir()) {
hdfs.rename(aFile.getPath(), new Path(aFile.getPath().toString()+".txt"));
}
}
}
}
回答2:
A more suitable approach to the problem would be changing the OutputFormat.
For eg :- If you are using TextOutputFormatClass, just get the source code of the TextOutputFormat class and modify the below method to get the proper filename (without r-00000). We need to then set the modified output format in the driver.
public synchronized static String getUniqueFile(TaskAttemptContext context, String name, String extension) {
/*TaskID taskId = context.getTaskAttemptID().getTaskID();
int partition = taskId.getId();*/
StringBuilder result = new StringBuilder();
result.append(name);
/*
* result.append('-');
* result.append(TaskID.getRepresentingCharacter(taskId.getTaskType()));
* result.append('-'); result.append(NUMBER_FORMAT.format(partition));
* result.append(extension);
*/
return result.toString();
}
So whatever name is passed through the multiple outputs, filename will be created according to it.
来源:https://stackoverflow.com/questions/42547065/how-to-remove-r-00000-extention-from-reducer-output-in-mapreduce