How to prevent committing of an empty Avro file into HDFS?

Posted by 风流意气都作罢 on 2020-01-06 09:02:59

Question


I have a job that creates an Avro file in HDFS and appends data to it. Occasionally, however, there won't be any data to append. In that case I don't want the application to flush and close the file; instead, it should check whether the file is empty (I assume the Avro schema is written into the header, so technically the file is not empty) and delete it if so.

Is this feasible with the Avro and HDFS libraries?


Answer 1:


Try using LazyOutputFormat when specifying the output format for your job. It creates output lazily, meaning that an output file will only be created if output exists.

So instead of writing something like:

    job.setOutputFormatClass(TextOutputFormat.class);

you can use LazyOutputFormat like this:

    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
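To make the "created only if output exists" behaviour concrete, here is a minimal plain-Java sketch of the same idea. It is not Hadoop's implementation and does not touch HDFS or Avro: `LazyFileWriter` is a hypothetical helper that writes to the local file system and simply defers creating the file until the first record arrives.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Hadoop or Avro): it only illustrates
// the lazy-creation idea by deferring file creation to the first write.
class LazyFileWriter implements AutoCloseable {
    private final Path path;
    private BufferedWriter out; // stays null until the first record is written

    LazyFileWriter(Path path) {
        this.path = path;
    }

    void write(String record) throws IOException {
        if (out == null) {
            // First record: only now is the file actually created.
            out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        }
        out.write(record);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        // Nothing was ever opened if no record was written, so no file exists.
        if (out != null) {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-demo");
        Path empty = dir.resolve("empty.txt");
        Path full = dir.resolve("full.txt");

        try (LazyFileWriter w = new LazyFileWriter(empty)) {
            // No records written, so the file is never created.
        }
        try (LazyFileWriter w = new LazyFileWriter(full)) {
            w.write("record-1");
        }

        System.out.println(Files.exists(empty)); // false
        System.out.println(Files.exists(full));  // true
    }
}
```

In an actual MapReduce job you would not write such a class yourself; the LazyOutputFormat call shown above wraps whichever output format the job already uses. For Avro output that would presumably be an Avro output format class such as AvroKeyOutputFormat rather than TextOutputFormat, which is an assumption here, since the original answer only shows the text case.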



Source: https://stackoverflow.com/questions/26408517/how-to-prevent-committing-of-an-empty-avro-file-into-hdfs
