Hadoop MapReduce intermediate output

感动是毒 2020-12-06 07:27

Is there a way to log the intermediate (map phase) output of a MapReduce job without editing the application? (The application is not mine, but the cluster is, and

2 Answers
  • 2020-12-06 07:57

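    The keep.task.files.pattern parameter (renamed mapreduce.task.files.preserve.filepattern in Hadoop 2) can be used to keep the intermediate files of tasks whose names match the given regular expression. The intermediate files have to be cleaned up manually once the job has completed. Since this is a map/reduce task property, it has to be set in the job configuration; if that configuration is bundled inside the jar file, the jar has to be packaged again.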
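As a rough illustration of how such a pattern selects tasks, the sketch below matches a regular expression against task attempt IDs of the form that appears in Hadoop's local directories. It assumes the pattern is applied as a full-string regex match against the attempt ID. The helper name `tasks_to_keep` and the sample IDs are hypothetical, and the real matching happens inside Hadoop, not in this code.

```python
import re

def tasks_to_keep(pattern, attempt_ids):
    """Return the attempt IDs that a keep-files regex would select.

    Assumption: the pattern must match the whole ID (full match).
    """
    compiled = re.compile(pattern)
    return [a for a in attempt_ids if compiled.fullmatch(a)]

# Hypothetical attempt IDs for one job: two map attempts and one reduce attempt.
attempts = [
    "attempt_1525687099554_0008_m_000000_0",
    "attempt_1525687099554_0008_m_000001_0",
    "attempt_1525687099554_0008_r_000000_0",
]

# A pattern that selects map attempts only (the "_m_" marker in the ID).
map_only = tasks_to_keep(r".*_m_\d+_\d+", attempts)
print(map_only)
```

Trying a candidate pattern this way before submitting the job can help avoid keeping files from more tasks than intended.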

  • 2020-12-06 08:18

    I don't think the MR framework provides any configuration to save intermediate map output files. Even if such a flag exists, it is not very useful because:

    The intermediate output produced by the maps can't easily be read or used, because:
    1) Key/value output is serialized before it is written to the intermediate files.
    2) Metadata about the key/value pairs (key length, value length, partition number) is also written to these files, in binary format.
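To make the two points above concrete, here is a deliberately simplified sketch of a length-prefixed binary record. It is not Hadoop's actual on-disk format: the fixed 4-byte length fields and the function names `encode_record` and `decode_records` are assumptions made purely for illustration. It only shows why such a file looks like noise in a text viewer and needs a reader that knows the layout.

```python
import struct

def encode_record(key: bytes, value: bytes) -> bytes:
    """Toy layout: 4-byte big-endian key length, 4-byte value length, then the bytes."""
    return struct.pack(">II", len(key), len(value)) + key + value

def decode_records(blob: bytes):
    """Walk the blob, reading each length header and then the key and value."""
    offset, records = 0, []
    while offset < len(blob):
        key_len, value_len = struct.unpack_from(">II", blob, offset)
        offset += 8  # skip the two 4-byte length fields
        key = blob[offset:offset + key_len]
        offset += key_len
        value = blob[offset:offset + value_len]
        offset += value_len
        records.append((key, value))
    return records

blob = encode_record(b"hadoop", b"1") + encode_record(b"map", b"2")
print(blob)                  # raw bytes: length headers interleaved with data
print(decode_records(blob))  # readable only once the layout is known
```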

    Example locations of these intermediate files:
    a) Intermediate spill file (spill output): /yarn/nm/usercache/root/appcache/application_1525687099554_0008/attempt_1525687099554_0008_m_000000_0_spill_0.out
    b) Final intermediate file (merge output): /yarn/nm/usercache/root/appcache/application_1525687099554_0008/output/attempt_1525687099554_0008_m_000001_0/file.out
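When looking through a NodeManager's local directories for such files, it can help to pull the identifying fields out of each path. The sketch below does this for the two example paths quoted above. The regex group names and the `describe` helper are hypothetical, and the spill/merged distinction is inferred only from the `_spill_` marker visible in these examples.

```python
import re

# attempt_<cluster timestamp>_<job number>_<m|r>_<task number>_<attempt number>
ATTEMPT_RE = re.compile(
    r"attempt_(?P<cluster_ts>\d+)_(?P<job>\d+)_(?P<type>[mr])_(?P<task>\d+)_(?P<attempt>\d+)"
)

def describe(path):
    """Extract attempt fields from a path, or return None if none are found."""
    m = ATTEMPT_RE.search(path)
    if m is None:
        return None
    info = m.groupdict()
    info["application"] = "application_{cluster_ts}_{job}".format(**info)
    # Assumption: spill files carry "_spill_" in their name; anything else is merged output.
    info["kind"] = "spill" if "_spill_" in path else "merged"
    return info

spill_path = ("/yarn/nm/usercache/root/appcache/application_1525687099554_0008/"
              "attempt_1525687099554_0008_m_000000_0_spill_0.out")
merged_path = ("/yarn/nm/usercache/root/appcache/application_1525687099554_0008/"
               "output/attempt_1525687099554_0008_m_000001_0/file.out")

for p in (spill_path, merged_path):
    print(describe(p))
```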
