Question
I have a job that creates an Avro file in HDFS and appends data to it. Occasionally, however, there won't be any data to append. In that case I don't want the application to flush and close the file; instead, it should check whether the file is empty (although I assume the Avro schema will be written into the header, so technically the file is not empty) and delete the file if it is.
Is this feasible with the Avro and HDFS libraries?
Answer 1:
Try using LazyOutputFormat when specifying the output format for your job. It creates output lazily, meaning that an output file is only created once there is a record to write to it.
So instead of writing something like: job.setOutputFormatClass(TextOutputFormat.class);
You can use LazyOutputFormat like this instead: LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
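If the file is written directly rather than through a MapReduce output format, the same lazy idea can be applied by hand: postpone opening the file (and writing its header) until the first record arrives. The following is only a minimal sketch of that pattern in plain Java. The class name LazyFileWriter is hypothetical, the local filesystem stands in for HDFS, and a plain text line stands in for the Avro header; none of this comes from the Avro or Hadoop APIs.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of Avro or Hadoop): the target file is
 * opened only when the first record is written, so a run that produces
 * no records leaves no file behind.
 */
final class LazyFileWriter implements AutoCloseable {
    private final Path target;
    private final String header; // stand-in for the schema header
    private BufferedWriter out;  // stays null until the first write

    LazyFileWriter(Path target, String header) {
        this.target = target;
        this.header = header;
    }

    /** Opens the file and writes the header on first use, then appends the record. */
    void write(String record) throws IOException {
        if (out == null) {
            out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
            out.write(header);
            out.newLine();
        }
        out.write(record);
        out.newLine();
    }

    /** Closes the underlying writer only if a file was ever opened. */
    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }
}
```

With this approach there is nothing to check or delete afterwards: when no record is written, the file never exists. Whether the same deferral is practical with an Avro writer on HDFS depends on how the job opens its output, so treat this as an illustration of the idea rather than a drop-in solution.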
Source: https://stackoverflow.com/questions/26408517/how-to-prevent-committing-of-an-empty-avro-file-into-hdfs