How to overwrite/reuse the existing output path for Hadoop jobs again and again

既然无缘 2021-02-12 10:29

I want to overwrite/reuse the existing output directory when I run my Hadoop job daily. The output directory will store the summarized output of each day's job run.

10 answers

旧巷少年郎 (OP)

2021-02-12 11:19
I encountered this exact problem. It stems from the exception raised in checkOutputSpecs in the class FileOutputFormat. In my case, I wanted many jobs to add files to directories that already exist, and I guaranteed that the files would have unique names.

I solved it by creating an output format class that overrides only the checkOutputSpecs method and swallows (ignores) the FileAlreadyExistsException thrown when it finds that the output directory already exists.
```
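import java.io.IOException;

import org.apache.hadoop.mapred.FileAlreadyExistsException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Text output format that tolerates an output directory that already exists.
public class OverwriteTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
    @Override
    public void checkOutputSpecs(JobContext job) throws IOException {
        try {
            super.checkOutputSpecs(job);
        } catch (FileAlreadyExistsException ignored) {
            // Swallow the exception so the job can write into the existing directory.
        }
    }
}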
```
Then, in the job configuration, I used LazyOutputFormat together with MultipleOutputs.
```
LazyOutputFormat.setOutputFormatClass(job, OverwriteTextOutputFormat.class);
```
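The approach above only works if file names never collide between runs. The answer does not say how the names were kept unique. For a job that runs once per day, one possibility is to put the run date into the base name. The following is a minimal sketch under that assumption; the class DailyOutputNames, its method baseName, and the prefix are hypothetical and not part of Hadoop or the original answer.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not a Hadoop class): builds a per-day base name
// so that each daily run writes differently named files.
final class DailyOutputNames {
    private DailyOutputNames() {}

    // Joins the prefix and the ISO date (yyyy-MM-dd) of the run with a hyphen.
    static String baseName(String prefix, LocalDate runDate) {
        return prefix + "-" + runDate.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }
}
```

A string built this way could be passed as the baseOutputPath argument of MultipleOutputs.write(key, value, baseOutputPath) in the reducer. Note that a second run on the same date would produce the same names, so this sketch assumes at most one run per day.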