How to overwrite/reuse the existing output path for Hadoop jobs again and again

既然无缘 2021-02-12 10:29

I want to overwrite/reuse the existing output directory each time I run my daily Hadoop job. The output directory stores the summarized results of each day's job run.

10 answers
  •  一整个雨季
    2021-02-12 11:00

    Jungblut's answer is your direct solution. Since I personally never trust automated processes to delete things, I'll suggest an alternative:

    Instead of trying to overwrite, I suggest you make the output name of your job dynamic by including the time at which it ran.

    Something like "/path/to/your/output-2011-10-09-23-04/". This way you can keep your old job output around in case you ever need to revisit it. In my system, which runs 10+ daily jobs, we structure the output as: /output/job1/2011/10/09/job1out/part-r-xxxxx, /output/job1/2011/10/10/job1out/part-r-xxxxx, etc.
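
The dated layout described above could be built with a small helper like the one below. This is only a sketch using the Java standard library: the class name `DatedOutputPath`, the method `forRun`, and the base directory `/output` are illustrative assumptions, not part of Hadoop or of the answer author's actual setup.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not a Hadoop class): builds a per-day output directory
// following the /output/job1/2011/10/09/job1out layout from the answer.
final class DatedOutputPath {
    // Year, month and day become nested directories: 2011/10/09
    private static final DateTimeFormatter DAY_DIRS =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // baseDir and jobName are assumed to contain no trailing slash.
    static String forRun(String baseDir, String jobName, LocalDate runDate) {
        return baseDir + "/" + jobName + "/" + runDate.format(DAY_DIRS)
                + "/" + jobName + "out";
    }

    public static void main(String[] args) {
        // A daily job would normally pass LocalDate.now(); a fixed date is
        // used here so the result is reproducible.
        System.out.println(forRun("/output", "job1", LocalDate.of(2011, 10, 9)));
        // prints /output/job1/2011/10/09/job1out
    }
}
```

In a MapReduce driver, the resulting string would typically be wrapped in a `Path` and passed to `FileOutputFormat.setOutputPath`. Because each day's run writes to a directory that does not exist yet, nothing has to be deleted before the job starts.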
