In Hadoop, where does the framework save the output of the Map task in a normal MapReduce application?

爱一瞬间的悲伤 2021-02-06 08:38

I am trying to find out where the output of a Map task is saved on disk before it can be used by a Reduce task.

Note: the version used is Hadoop 0.20.

3 answers
  • 2021-02-06 09:12

    So, I've figured out what is really going on.

    The output of the mapper is collected in an in-memory buffer. When the buffer reaches about 80% of its capacity, the framework begins spilling its contents to the local disk while the mapper continues to add records to the buffer.
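
    The spill behaviour can be illustrated with a small stand-alone model. This is only a sketch, not Hadoop code: the class `SpillBufferSketch` and its methods are hypothetical, the sizes are toy values, and the spill here is synchronous, whereas the real framework spills in the background while the mapper keeps writing. The two constructor arguments play the roles of Hadoop 0.20's `io.sort.mb` and `io.sort.spill.percent` settings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the map-side buffer; not Hadoop code.
class SpillBufferSketch {
    private final int capacityBytes;  // plays the role of io.sort.mb
    private final int spillPercent;   // plays the role of io.sort.spill.percent
    private int usedBytes = 0;
    private final List<Integer> spillSizes = new ArrayList<>();

    SpillBufferSketch(int capacityBytes, int spillPercent) {
        this.capacityBytes = capacityBytes;
        this.spillPercent = spillPercent;
    }

    // Accept one serialized record; spill once the threshold is reached.
    void collect(int recordBytes) {
        usedBytes += recordBytes;
        if (usedBytes * 100 >= capacityBytes * spillPercent) {
            spill();
        }
    }

    // Flush whatever is still buffered when the map task finishes.
    void close() {
        if (usedBytes > 0) {
            spill();
        }
    }

    // Record the size of this spill and empty the buffer.
    private void spill() {
        spillSizes.add(usedBytes);
        usedBytes = 0;
    }

    List<Integer> spillSizes() {
        return spillSizes;
    }

    public static void main(String[] args) {
        SpillBufferSketch buffer = new SpillBufferSketch(100, 80);
        for (int i = 0; i < 25; i++) {
            buffer.collect(10);  // 25 records of 10 bytes each
        }
        buffer.close();
        System.out.println(buffer.spillSizes());  // prints [80, 80, 80, 10]
    }
}
```

    With a 100-byte buffer and an 80% threshold, every eighth 10-byte record triggers a spill, and the last record is flushed when the task closes.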

    I wanted to get the intermediate output of the mapper and use it as input for another job while the mapper was still running. It turns out that this is not possible without heavily modifying the Hadoop 0.20.204 deployment. The way the system works is that, even after everything specified in the map context has run:

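    // Outline of Mapper.run() in the new (org.apache.hadoop.mapreduce) API
    public void run(Context context) throws IOException, InterruptedException {
      setup(context);
      while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
      }
      cleanup(context);
    }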
    

    and cleanup has been called, nothing has yet been written to the temporary folder.

    After the whole map computation finishes, everything is eventually merged and written to disk, and becomes the input for the shuffle and sort stages that precede the reducer.
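
    The final merge can be pictured as a k-way merge over spill files that are each already sorted. The sketch below is an illustration under assumptions, not the framework's implementation: `SpillMergeSketch` and `Cursor` are hypothetical names, spills are modelled as in-memory lists of keys, and the per-partition layout of real spill files is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of merging sorted spills; not Hadoop code.
class SpillMergeSketch {
    // One cursor per spill: the current head key plus the rest of that spill.
    private static final class Cursor {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    // Merge several individually sorted spills into one sorted run.
    static List<String> merge(List<List<String>> sortedSpills) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> spill : sortedSpills) {
            if (!spill.isEmpty()) {
                heap.add(new Cursor(spill.iterator()));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            merged.add(smallest.head);
            if (smallest.rest.hasNext()) {
                // Advance this spill and put it back into the heap.
                smallest.head = smallest.rest.next();
                heap.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> spills = Arrays.asList(
            Arrays.asList("apple", "kiwi"),
            Arrays.asList("banana", "pear"),
            Arrays.asList("cherry"));
        System.out.println(merge(spills));  // prints [apple, banana, cherry, kiwi, pear]
    }
}
```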

    From everything I've read and looked at so far, the temporary folder where the output should eventually end up is the one I had guessed beforehand:

    FileOutputFormat.getWorkOutputPath(context)
    

    I managed to do what I wanted in a different way. Anyway, if there are any questions about this, let me know.

  • 2021-02-06 09:19

    The MapReduce framework stores intermediate output on the local disk rather than in HDFS, because writing it to HDFS would cause unnecessary replication of files.

  • 2021-02-06 09:20

    The TaskTracker starts a separate JVM process for every Map or Reduce task.

    Mapper output (intermediate data) is written to the local file system (NOT HDFS) of each mapper slave node. Once the data has been transferred to the reducer, we can no longer access these temporary files.

    If you want to see your mapper output, I suggest using IdentityReducer.
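
    To show why an identity reducer exposes the mapper output, here is a plain-Java sketch with no Hadoop dependency. The class `IdentityReduceSketch` and its methods are hypothetical stand-ins: `shuffle` imitates the sort and group step, and `identityReduce` emits every pair unchanged, so the job output is simply the map output ordered by key. A real job would plug in Hadoop's own IdentityReducer instead.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; a real job would use Hadoop's IdentityReducer.
class IdentityReduceSketch {
    // Stand-in for shuffle and sort: group values by key, keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // Identity reduce: emit every (key, value) pair unchanged, one line per pair.
    static List<String> identityReduce(Map<String, List<Integer>> grouped) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            for (Integer value : entry.getValue()) {
                lines.add(entry.getKey() + "\t" + value);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Toy map output, in the order the mapper happened to emit it.
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        mapOutput.add(new SimpleEntry<>("b", 1));
        mapOutput.add(new SimpleEntry<>("a", 1));
        mapOutput.add(new SimpleEntry<>("b", 2));
        for (String line : identityReduce(shuffle(mapOutput))) {
            System.out.println(line);
        }
    }
}
```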
