setup and cleanup methods of Mapper/Reducer in Hadoop MapReduce

一生所求 2020-12-24 03:10

Are the setup and cleanup methods called in each mapper and reducer task, or are they called only once at the start of the overall map and reduce jobs?

5 answers
  • 2020-12-24 03:23

    A clarification that may help: setup() and cleanup() perform initialization and cleanup at the task level. Within a task, initialization happens first, with a single call to setup(); then all the calls to map() (or reduce()) are made. After that, a single call is made to cleanup() before the task exits.
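    To make that order concrete, here is a minimal plain-Java sketch that models one map task without depending on Hadoop. The names SimpleMapper, LoggingMapper and runTask are hypothetical stand-ins, and the records are plain strings rather than Hadoop key/value pairs; only the setup, map, cleanup control flow is meant to match Hadoop's Mapper.run().

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the task-level lifecycle (not Hadoop code).
// SimpleMapper, LoggingMapper and runTask are hypothetical names.
public class MapperLifecycleDemo {

    static abstract class SimpleMapper {
        // Records every lifecycle call so the order can be inspected.
        final List<String> log = new ArrayList<>();

        protected void setup() { log.add("setup"); }

        protected abstract void map(String record);

        protected void cleanup() { log.add("cleanup"); }

        // Same control flow as Hadoop's Mapper.run(): one setup(),
        // one map() per input record, then one cleanup().
        public void run(List<String> records) {
            setup();
            try {
                for (String record : records) {
                    map(record);
                }
            } finally {
                cleanup();
            }
        }
    }

    static class LoggingMapper extends SimpleMapper {
        @Override
        protected void map(String record) {
            log.add("map:" + record);
        }
    }

    // Simulates one map task over its input split and returns the call log.
    static List<String> runTask(List<String> split) {
        LoggingMapper mapper = new LoggingMapper();
        mapper.run(split);
        return mapper.log;
    }

    public static void main(String[] args) {
        // prints [setup, map:a, map:b, map:c, cleanup]
        System.out.println(runTask(List.of("a", "b", "c")));
    }
}
```

    However many records the split holds, setup and cleanup appear exactly once in the log.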

  • 2020-12-24 03:27

    According to the MapReduce documentation, setup and cleanup are called once for each Mapper task and each Reducer task.

  • 2020-12-24 03:27

    For the reducer, you can call job.setNumReduceTasks(1); on the job so that there is only one reduce task; that way the reducer's setup and cleanup will run only once.
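    A rough plain-Java model of why this works: each reduce task gets its own reducer instance and its own setup()/cleanup() pair, so the number of setup() calls equals the number of reduce tasks. CountingReducer and simulate() are hypothetical; only the partition formula is the one used by Hadoop's default HashPartitioner. The model ignores retried or speculative task attempts, each of which would run its own setup() and cleanup().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Plain-Java model of how the reduce-task count drives setup()/cleanup() calls.
// CountingReducer and simulate() are hypothetical; only partition() mirrors
// the formula of Hadoop's default HashPartitioner.
public class ReduceTaskCountDemo {
    static int setupCalls = 0;

    static class CountingReducer {
        void setup() { setupCalls++; }
        void reduce(String key) { /* per-key work would go here */ }
        void cleanup() { }
    }

    // Same formula as HashPartitioner.getPartition().
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Simulates a reduce phase with numReduceTasks tasks; returns total setup() calls.
    static int simulate(List<String> keys, int numReduceTasks) {
        setupCalls = 0;
        List<TreeSet<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeSet<>());
        }
        for (String key : keys) {
            partitions.get(partition(key, numReduceTasks)).add(key);
        }
        // One task, with its own reducer instance, per partition (even an empty one).
        for (TreeSet<String> part : partitions) {
            CountingReducer reducer = new CountingReducer();
            reducer.setup();
            try {
                for (String key : part) {
                    reducer.reduce(key);
                }
            } finally {
                reducer.cleanup();
            }
        }
        return setupCalls;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "cherry", "date", "fig");
        System.out.println("4 reduce tasks -> setup calls: " + simulate(keys, 4));
        System.out.println("1 reduce task  -> setup calls: " + simulate(keys, 1));
    }
}
```

    The trade-off is that a single reduce task also processes every key serially, so this is practical only when the reduce-side data is small.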

  • 2020-12-24 03:43

    They are called once per Mapper task or Reducer task. Here is the Hadoop code from Reducer.run():

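    // From org.apache.hadoop.mapreduce.Reducer (simplified); run() drives one reduce task.
    public void run(Context context) throws IOException, InterruptedException {
      setup(context);                  // once, before the first key
      try {
        while (context.nextKey()) {    // advance to the next key group of this task
          reduce(context.getCurrentKey(), context.getValues(), context);
        }
      } finally {
        cleanup(context);              // once, even if reduce() throws
      }
    }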
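    As a rough illustration of that loop, the sketch below runs the same control flow in plain Java. SumReducer and runOn are hypothetical; a sorted TreeMap from key to value list stands in for the grouped, sorted input that context.nextKey(), getCurrentKey() and getValues() iterate over. With two keys, reduce() runs twice while setup() and cleanup() each run once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the Reducer.run() loop (not Hadoop code).
// SumReducer and runOn are hypothetical names.
public class ReducerRunDemo {

    static class SumReducer {
        final List<String> log = new ArrayList<>();
        final Map<String, Integer> output = new TreeMap<>();

        void setup() { log.add("setup"); }

        // Called once per key, with all values for that key.
        void reduce(String key, List<Integer> values) {
            log.add("reduce:" + key);
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            output.put(key, sum);
        }

        void cleanup() { log.add("cleanup"); }

        // Same shape as Reducer.run(): setup, one reduce() per key, cleanup in finally.
        void run(TreeMap<String, List<Integer>> grouped) {
            setup();
            try {
                for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
                    reduce(entry.getKey(), entry.getValue());
                }
            } finally {
                cleanup();
            }
        }
    }

    static SumReducer runOn(TreeMap<String, List<Integer>> grouped) {
        SumReducer reducer = new SumReducer();
        reducer.run(grouped);
        return reducer;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("b", List.of(5));
        grouped.put("a", List.of(1, 2));
        SumReducer reducer = runOn(grouped);
        System.out.println(reducer.log);     // [setup, reduce:a, reduce:b, cleanup]
        System.out.println(reducer.output);  // {a=3, b=5}
    }
}
```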
    
  • 2020-12-24 03:46

    They are called for each task, so if you have 20 map tasks running, setup and cleanup will each be called 20 times, once in every task.
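    A small plain-Java sketch of that counting, assuming each simulated task creates a fresh mapper instance as a Hadoop map task does. CountingMapper and runJob are hypothetical names. With 20 tasks of 1,000 records each, map() runs 20,000 times but setup() and cleanup() run only 20 times each.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of "one setup()/cleanup() pair per task" (not Hadoop code).
// CountingMapper and runJob are hypothetical; each outer loop iteration stands in
// for one map task with its own mapper instance.
public class PerTaskSetupDemo {
    static int setupCalls = 0;
    static int mapCalls = 0;
    static int cleanupCalls = 0;

    static class CountingMapper {
        void setup() { setupCalls++; }
        void map(String record) { mapCalls++; }
        void cleanup() { cleanupCalls++; }

        void run(List<String> split) {
            setup();
            try {
                for (String record : split) {
                    map(record);
                }
            } finally {
                cleanup();
            }
        }
    }

    // Runs numMapTasks simulated tasks, each over its own split of recordsPerSplit records.
    static void runJob(int numMapTasks, int recordsPerSplit) {
        setupCalls = 0;
        mapCalls = 0;
        cleanupCalls = 0;
        for (int task = 0; task < numMapTasks; task++) {
            List<String> split = new ArrayList<>();
            for (int i = 0; i < recordsPerSplit; i++) {
                split.add("task" + task + "-record" + i);
            }
            new CountingMapper().run(split);
        }
    }

    public static void main(String[] args) {
        runJob(20, 1000);
        // prints setup=20, map=20000, cleanup=20
        System.out.println("setup=" + setupCalls + ", map=" + mapCalls + ", cleanup=" + cleanupCalls);
    }
}
```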

    One gotcha: the standard run method of both Mapper and Reducer does not catch exceptions around the map/reduce calls, so if an exception is thrown in those methods, the cleanup method will not be called.

    2020 edit: As noted in the comments, this statement from 2012 (Hadoop 0.20) is no longer true; cleanup is now called in a finally block.
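    A minimal plain-Java sketch contrasting the two shapes when map() throws. FailingMapper and attempt are hypothetical; runWithoutFinally models the pre-finally run() described in the 2012 statement, and runWithFinally models the current one. In both cases the exception still propagates and the task still fails; the only difference is whether cleanup() runs first.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model contrasting run() with and without finally (not Hadoop code).
// FailingMapper and attempt are hypothetical names.
public class CleanupOnFailureDemo {

    static class FailingMapper {
        final List<String> log = new ArrayList<>();

        void setup() { log.add("setup"); }

        void map(String record) {
            if (record.equals("bad")) {
                throw new IllegalStateException("bad record");
            }
            log.add("map:" + record);
        }

        void cleanup() { log.add("cleanup"); }

        // Older shape: cleanup() is skipped when map() throws.
        void runWithoutFinally(List<String> records) {
            setup();
            for (String record : records) {
                map(record);
            }
            cleanup();
        }

        // Current shape: cleanup() is in a finally block, so it runs even when map() throws.
        void runWithFinally(List<String> records) {
            setup();
            try {
                for (String record : records) {
                    map(record);
                }
            } finally {
                cleanup();
            }
        }
    }

    // Runs one variant and returns its call log; the exception is caught here
    // only so the log can be inspected afterwards.
    static List<String> attempt(boolean useFinally) {
        FailingMapper mapper = new FailingMapper();
        List<String> input = List.of("a", "bad", "c");
        try {
            if (useFinally) {
                mapper.runWithFinally(input);
            } else {
                mapper.runWithoutFinally(input);
            }
        } catch (IllegalStateException expected) {
            mapper.log.add("task failed");
        }
        return mapper.log;
    }

    public static void main(String[] args) {
        // prints without finally: [setup, map:a, task failed]
        System.out.println("without finally: " + attempt(false));
        // prints with finally: [setup, map:a, cleanup, task failed]
        System.out.println("with finally: " + attempt(true));
    }
}
```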
