How to remove the r-00000 extension from reducer output in MapReduce

Submitted by 谁说我不能喝 on 2019-12-06 08:15:17

I was able to do this explicitly after my job finishes, and that is fine for me. It adds no delay to the job itself.

// Needs org.apache.hadoop.fs.FileSystem, FileStatus and Path.
// b is the job's success flag; args[1] is the job output directory.
if (b) {
    FileSystem hdfs = FileSystem.get(getConf());
    FileStatus[] statuses = hdfs.listStatus(new Path(args[1]));
    if (statuses != null) {
        for (FileStatus aFile : statuses) {
            // Rename regular files only; leave subdirectories alone.
            // Note: marker files such as _SUCCESS are renamed too.
            if (!aFile.isDirectory()) {
                Path src = aFile.getPath();
                hdfs.rename(src, new Path(src.toString() + ".txt"));
            }
        }
    }
}
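The snippet above appends ".txt" to every file name and leaves the "-r-00000" part in place. If the goal is to drop that part, the rename target could be computed along the following lines. This is only a sketch: the class name PartNameCleaner and the method stripTaskSuffix are made up for illustration, and the pattern assumes the usual "<name>-m-NNNNN" or "<name>-r-NNNNN" layout, optionally followed by an extension.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop.
class PartNameCleaner {
    // "<base>-m-00000" or "<base>-r-00000", optionally followed by ".ext".
    private static final Pattern TASK_SUFFIX =
            Pattern.compile("^(.*)-[mr]-\\d{5,}(\\..*)?$");

    // Returns the file name without the task suffix, or the name unchanged
    // if it does not follow the assumed layout (for example "_SUCCESS").
    static String stripTaskSuffix(String fileName) {
        Matcher m = TASK_SUFFIX.matcher(fileName);
        if (!m.matches()) {
            return fileName;
        }
        String extension = m.group(2) == null ? "" : m.group(2);
        return m.group(1) + extension;
    }

    public static void main(String[] args) {
        System.out.println(stripTaskSuffix("part-r-00000"));
        System.out.println(stripTaskSuffix("report-r-00000.gz"));
        System.out.println(stripTaskSuffix("_SUCCESS"));
    }
}
```

In the rename loop, the target would then be built from aFile.getPath().getParent() and stripTaskSuffix(aFile.getPath().getName()). With more than one reducer, several files map to the same stripped name, so this only makes sense when each base name occurs once in the directory.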

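A more suitable approach to the problem is to change the OutputFormat.

For example, if you are using TextOutputFormat, take the source code of that class and modify the method below so that it produces the file name without the r-00000 part. The method, getUniqueFile, is a static method that TextOutputFormat inherits from FileOutputFormat, so the modified copy has to be the one that actually gets called when the output file is created. Then set the modified output format in the driver.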

public synchronized static String getUniqueFile(TaskAttemptContext context, String name, String extension) {
    // The stock version also appends the task type and partition number.
    // Those lines are disabled here so that only the base name is returned:
    //
    //   TaskID taskId = context.getTaskAttemptID().getTaskID();
    //   int partition = taskId.getId();
    //   result.append('-');
    //   result.append(TaskID.getRepresentingCharacter(taskId.getTaskType()));
    //   result.append('-');
    //   result.append(NUMBER_FORMAT.format(partition));
    //   result.append(extension);
    //
    // Without the partition number, two tasks that use the same name
    // produce the same file name, so names must be unique per task.
    return name;
}

With this change, whatever name is passed through MultipleOutputs is used as the file name as-is.
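To make the effect of the change concrete, here is a small stand-alone sketch that mimics how the name is put together, based on the disabled lines in the method above. It does not use Hadoop at all: the class UniqueFileNameDemo and both methods are hypothetical, and String.format with five digits is an assumed stand-in for NUMBER_FORMAT.

```java
import java.util.Locale;

// Hypothetical demo class; it only imitates the naming logic shown above.
class UniqueFileNameDemo {
    // Mimics the unmodified method: name, '-', task type, '-', padded partition, extension.
    static String defaultName(String name, char taskType, int partition, String extension) {
        // Assumption: five-digit zero padding stands in for NUMBER_FORMAT.
        String padded = String.format(Locale.ROOT, "%05d", partition);
        return name + '-' + taskType + '-' + padded + extension;
    }

    // Mimics the modified method: only the base name is returned.
    static String modifiedName(String name) {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(defaultName("part", 'r', 0, ""));
        System.out.println(defaultName("part", 'r', 12, ".gz"));
        System.out.println(modifiedName("part"));
    }
}
```

Under these assumptions the first two calls give names of the form "part-r-00000" and "part-r-00012.gz", while the modified version yields just "part" regardless of which reducer writes the file.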
