How to tell Hadoop to not delete temporary directory from HDFS when task is killed?

Submitted by 落花浮王杯 on 2019-12-22 01:32:08

Question


By default, Hadoop map tasks write processed records to files in a temporary directory at ${mapred.output.dir}/_temporary/_${taskid}. These files sit there until the FileOutputCommitter moves them to ${mapred.output.dir} (after the task finishes successfully). I have a case where, in the setup() of a map task, I need to create files under the temporary directory above, to which I write some process-related data that is used later elsewhere. However, when Hadoop tasks are killed, the temporary directory is removed from HDFS.

Does anyone know whether it is possible to tell Hadoop not to delete this directory after a task is killed, and how to achieve that? I guess there is some property I can configure.

Regards


Answer 1:


It's not good practice to depend on temporary files, whose location and format can change at any time between releases.

Anyway, setting mapreduce.task.files.preserve.failedtasks to true will keep the temporary files for all failed tasks, and setting mapreduce.task.files.preserve.filepattern to a regex matching the task ID will keep the temporary files for matching tasks, regardless of whether the task succeeds or fails.
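As a minimal sketch, the two properties mentioned above could go into mapred-site.xml like this. The task-ID regex is a made-up example; also verify the property names against your Hadoop version, since older releases used the names keep.failed.task.files and keep.task.files.pattern:

```
<!-- mapred-site.xml (sketch; check property names against your Hadoop version) -->
<configuration>
  <property>
    <!-- preserve temporary/intermediate files for all failed tasks -->
    <name>mapreduce.task.files.preserve.failedtasks</name>
    <value>true</value>
  </property>
  <property>
    <!-- preserve files for tasks whose ID matches this regex,
         irrespective of success or failure (example pattern) -->
    <name>mapreduce.task.files.preserve.filepattern</name>
    <value>.*_m_000001_.*</value>
  </property>
</configuration>
```

The same settings can also be passed per job on the command line, e.g. with -D mapreduce.task.files.preserve.failedtasks=true, if the job's driver uses Tool/ToolRunner.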



Source: https://stackoverflow.com/questions/8328818/how-to-tell-hadoop-to-not-delete-temporary-directory-from-hdfs-when-task-is-kill
