I want to overwrite/reuse the existing output directory when I run my Hadoop job daily. The output directory stores the summarized output of each day's job run.
I encountered this exact problem; it stems from the exception raised in checkOutputSpecs in the class FileOutputFormat. In my case, I wanted to have many jobs adding files to directories that already exist, and I guaranteed that the files would have unique names.

I solved it by creating an output format class which overrides only the checkOutputSpecs method and swallows (ignores) the FileAlreadyExistsException that is thrown where it checks whether the directory already exists.
import java.io.IOException;

import org.apache.hadoop.mapred.FileAlreadyExistsException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class OverwriteTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
    @Override
    public void checkOutputSpecs(JobContext job) throws IOException {
        try {
            // Run the normal checks (output path set, delegation tokens, etc.)
            super.checkOutputSpecs(job);
        } catch (FileAlreadyExistsException ignored) {
            // Swallow the exception so an existing output directory is reused
        }
    }
}
Then, in the job configuration, I used LazyOutputFormat and also MultipleOutputs.
LazyOutputFormat.setOutputFormatClass(job, OverwriteTextOutputFormat.class);
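For completeness, here is a minimal sketch of how the MultipleOutputs side might look in a reducer; the class name DailySummaryReducer, the key/value types, and the date-stamped base path are my own assumptions for illustration, not part of the original setup.

import java.io.IOException;
import java.time.LocalDate;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer: sums counts per key and writes the results through
// MultipleOutputs with a date-stamped base path, so each daily run produces
// uniquely named files inside the shared output directory.
public class DailySummaryReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {

    private MultipleOutputs<Text, LongWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable value : values) {
            sum += value.get();
        }
        // The ISO date suffix keeps the base path, and thus the file names,
        // unique across daily runs into the same directory.
        mos.write(key, new LongWritable(sum), "summary-" + LocalDate.now());
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}

With LazyOutputFormat in place, the empty default part-r-xxxxx files are not created, so only the MultipleOutputs files with the date-based names end up in the reused directory.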