How to overwrite/reuse the existing output path for Hadoop jobs again and again

既然无缘 2021-02-12 10:29

I want to overwrite/reuse the existing output directory when I run my Hadoop job daily. The output directory will store the summarized output of each day's job run.

10 answers

一整个雨季 (original poster)

2021-02-12 11:00

Jungblut's answer is your direct solution. Personally, I never trust automated processes to delete things, so I'll suggest an alternative:

Instead of trying to overwrite, I suggest you make the output name of your job dynamic, including the time at which it ran.

Something like "/path/to/your/output-2011-10-09-23-04/". This way you can keep your old job output around in case you ever need to revisit it. In my system, which runs 10+ daily jobs, we structure the output as: /output/job1/2011/10/09/job1out/part-r-xxxxx, /output/job1/2011/10/10/job1out/part-r-xxxxx, etc.
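Here is a rough sketch of how those two naming schemes could be built in the job driver. The class and method names (DatedOutputPath, timestampedPath, partitionedPath) are made up for illustration and are not part of Hadoop; it only uses java.time to build the path strings.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not a Hadoop class: builds dated output paths as plain strings.
public class DatedOutputPath {
    // e.g. 2011-10-09-23-04 (year-month-day-hour-minute)
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm");
    // e.g. 2011/10/09 (one directory level per date component)
    private static final DateTimeFormatter DAY_DIRS =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Appends the run time to a base path: <base>-<timestamp>/
    public static String timestampedPath(String base, LocalDateTime runTime) {
        return base + "-" + runTime.format(STAMP) + "/";
    }

    // Nests the output by job and day: <root>/<job>/<yyyy>/<MM>/<dd>/<job>out
    public static String partitionedPath(String root, String jobName, LocalDate runDay) {
        return root + "/" + jobName + "/" + runDay.format(DAY_DIRS) + "/" + jobName + "out";
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        System.out.println(timestampedPath("/path/to/your/output", now));
        System.out.println(partitionedPath("/output", "job1", now.toLocalDate()));
    }
}
```

In a driver, the resulting string would typically be wrapped in a Path and handed to FileOutputFormat.setOutputPath(job, new Path(...)). Because each run gets a fresh directory, the job never hits the "output directory already exists" check and nothing has to be deleted.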
