How to overwrite/reuse the existing output path for Hadoop jobs again and again

既然无缘 2021-02-12 10:29

I want to overwrite/reuse the existing output directory when I run my Hadoop job daily. The output directory stores the summarized output of each day's job run.

10 answers
  •  旧巷少年郎
    2021-02-12 11:19

    I encountered this exact problem. It stems from the FileAlreadyExistsException thrown by checkOutputSpecs in the class FileOutputFormat. In my case, I wanted many jobs to add files to directories that already exist, and I guaranteed that the files would have unique names.

    I solved it by creating an output format class that overrides only the checkOutputSpecs method and suppresses (ignores) the FileAlreadyExistsException thrown when it finds that the output directory already exists.

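    import java.io.IOException;

    import org.apache.hadoop.mapred.FileAlreadyExistsException;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class OverwriteTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
        @Override
        public void checkOutputSpecs(JobContext job) throws IOException {
            try {
                // The parent still verifies that an output path is set.
                super.checkOutputSpecs(job);
            } catch (FileAlreadyExistsException ignored) {
                // The output directory already exists: suppress the error and reuse it.
            }
        }
    }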
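The approach depends on every job writing uniquely named files into the shared directory. One way to guarantee that, sketched here under the assumption that each job has a distinct name, is to derive the MultipleOutputs `baseOutputPath` from the run date and the job name. `UniqueOutputNames` and the job name `clicks` are hypothetical; the helper itself is plain Java:

```java
import java.time.LocalDate;
import java.util.regex.Pattern;

/** Hypothetical helper: one MultipleOutputs base path per (day, job) pair. */
public final class UniqueOutputNames {
    // Restrict job names so they cannot introduce extra path segments such as "../".
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9_]+");

    private UniqueOutputNames() {}

    /**
     * Returns e.g. "2021-02-12/clicks". For a reduce task, MultipleOutputs appends a
     * suffix such as "-r-00000", giving files like <output dir>/2021-02-12/clicks-r-00000.
     */
    public static String baseOutputPath(LocalDate day, String jobName) {
        if (!SAFE_NAME.matcher(jobName).matches()) {
            throw new IllegalArgumentException("unsafe job name: " + jobName);
        }
        // LocalDate.toString() yields ISO-8601 (yyyy-MM-dd), so names sort by date.
        return day + "/" + jobName;
    }
}
```

Because the date and the job name are both part of the path, two different jobs, or the same job on two different days, never produce the same file name.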
    

    Then, in the job configuration, I used LazyOutputFormat together with MultipleOutputs:

    LazyOutputFormat.setOutputFormatClass(job, OverwriteTextOutputFormat.class);
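The answer does not show how MultipleOutputs is wired up. Below is a minimal sketch of one way to do it with the `org.apache.hadoop.mapreduce` API, assuming the `OverwriteTextOutputFormat` class above is compiled in the same package. The line-counting job itself (`DailySummaryJob`, `LineMapper`, `SumReducer`) and the `daily.summary.date` property are hypothetical names for illustration, not part of the original answer:

```java
import java.io.IOException;
import java.time.LocalDate;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class DailySummaryJob {

    // Hypothetical configuration key carrying the run date to the reducers.
    static final String DATE_KEY = "daily.summary.date";

    /** Emits (line, 1) for every input line. */
    public static class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(line, ONE);
        }
    }

    /** Sums the counts and writes them through MultipleOutputs into a per-day subdirectory. */
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private MultipleOutputs<Text, LongWritable> outputs;
        private String baseOutputPath;

        @Override
        protected void setup(Context context) {
            outputs = new MultipleOutputs<>(context);
            // e.g. "2021-02-12/part" -> <output dir>/2021-02-12/part-r-00000
            baseOutputPath = context.getConfiguration().get(DATE_KEY) + "/part";
        }

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable value : values) {
                sum += value.get();
            }
            outputs.write(key, new LongWritable(sum), baseOutputPath);
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            outputs.close(); // closes every writer MultipleOutputs opened
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set(DATE_KEY, LocalDate.now().toString());

        Job job = Job.getInstance(conf, "daily summary");
        job.setJarByClass(DailySummaryJob.class);
        job.setMapperClass(LineMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        // The same output directory is reused on every run.
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Lazy output: no empty default part-r-* files, since all records go through MultipleOutputs.
        LazyOutputFormat.setOutputFormatClass(job, OverwriteTextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Run it with something like `hadoop jar <your-jar> DailySummaryJob <input dir> <output dir>`, passing the same output directory every day. Each run adds a `<output dir>/<date>/` subdirectory; running twice on the same date reuses the same file names, which this sketch does not guard against.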
    
